Visit Heroix at http://www.heroix.com
Charting Life in the IT Environment

>> Verify Web Applications Working, Not Just Available

by Dave Atkins on June 17, 2009

Few things are more frustrating than learning–from customers or senior management–that your web site or service is “down” when you did not know it. It invalidates all the reporting you’ve done in the past about “uptime” and calls your competence into question.

When a company is expanding rapidly, or a new service is being launched, people understand there can be some hiccups, but they need to know that you and your company are working to fix the problems. When you can say, “yes, we are aware of that and addressing it right now,” or “we had a problem for 15 minutes last night at 3:15 am and took the following steps…” it boosts your personal credibility and the credibility of your company. It’s not just “covering yourself”; it’s part of the nature of rapid growth and uncertainty. You cannot eliminate risk, but how you manage it demonstrates what customers can expect from you in the future.

So, if you just have ping monitors set up that tell you all your servers are pingable from another machine on the local network–and then email you an alert if something goes down–ask the following questions:

  • What happens if your monitoring server can’t connect to the mail server to send the pages or email alerts?
  • What happens if your database server becomes overwhelmed with connections and can’t serve the web application?
  • Is there a single point of failure you are not monitoring?
  • If you have load-balanced servers…what happens when a node has a performance problem but does not completely fail?

But those are just questions to start you thinking. What you really need to do is stop thinking about monitoring in terms of point tests and start thinking in terms of functionality. Start from use cases and then develop a testing plan that goes deep enough into the detail to monitor these user transactions with your business:

  • Can a visitor from outside our firewall successfully log in to their account on our service?
  • Can they navigate to particular services that are only visible to logged in users?
  • If we are sending automated updates by email…do those emails actually get sent?
  • Are new visitors able to register new accounts? Are we actually seeing a reasonable number of new members joining every day?

To answer these questions, you need to monitor transactions and construct correlated events.
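As one illustration of a transaction-level check, here is a minimal Python sketch that judges a fetched page by its status, its content, and how long it took, rather than by whether the server merely answered. The function names, the expected text, and the five-second threshold are all assumptions made for the example; they are not part of any particular monitoring product.

```python
import time
import urllib.request
from typing import Optional


def judge_page(status: int, body: str, elapsed: float,
               expected: str, max_seconds: float) -> Optional[str]:
    """Return None if the page looks healthy, else a short reason string."""
    if status != 200:
        return f"unexpected HTTP status {status}"
    if expected not in body:
        # The server answered, but with the wrong content (e.g. an error page).
        return f"expected text {expected!r} not found"
    if elapsed > max_seconds:
        return f"slow response: {elapsed:.1f}s > {max_seconds:.1f}s"
    return None


def check_url(url: str, expected: str, max_seconds: float = 5.0) -> Optional[str]:
    """Fetch url and judge the result; network errors count as failures."""
    start = time.monotonic()
    try:
        # urlopen raises HTTPError for 4xx/5xx responses, handled below.
        with urllib.request.urlopen(url, timeout=max_seconds * 2) as resp:
            status = resp.status
            body = resp.read().decode("utf-8", errors="replace")
    except OSError as exc:  # URLError, HTTPError and socket timeouts are OSError subclasses
        return f"request failed: {exc}"
    return judge_page(status, body, time.monotonic() - start, expected, max_seconds)
```

Run from a machine outside the firewall, a call such as `check_url(url, "Welcome back")` (with your own URL and a phrase only a working page would show) returns `None` when the page is healthy and a reason string otherwise. A real login test would also need to submit credentials and keep the session cookie, which this sketch leaves out.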

Correlated events means you monitor a series of dependent transactions. The simplest example is to set up an external monitor of your site, sitting outside the firewall, that tests the following sequence of transactions and then alerts you only on the first failure:

  • Is the public router pingable?
  • Can I ssh into a particular server?
  • Is the web service responding at all?
  • Is a web page delivering expected content (versus an error message or long response delay)?
  • Can I log in to a particular part of the web site?
  • Does a database query indicate we are recording an expected number of new registrants or tracking database rows based on some minimal user activity?
  • Do the smtp logs contain indications that emails are being sent every day?
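The "alert only on the first failure" idea can be sketched in a few lines of Python. This is an assumption-laden outline, not how any specific product implements correlation: `Check` and `first_failure` are hypothetical names, and each probe stands in for whatever real test (ping, ssh, HTTP, SQL, log search) you have available.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Check:
    name: str                   # human-readable label, e.g. "router ping"
    probe: Callable[[], bool]   # returns True when the check passes


def first_failure(checks: List[Check]) -> Optional[str]:
    """Run dependent checks in order and return the name of the first failure.

    Checks after the failing one are skipped, because their failures would
    most likely be consequences of the earlier one. Returns None if all pass.
    """
    for check in checks:
        try:
            ok = check.probe()
        except Exception:
            ok = False  # a probe that raises is treated as a failed check
        if not ok:
            return check.name
    return None
```

Ordering the list from the lowest layer (router) to the highest (business activity) means a single alert names the deepest thing that is broken, instead of seven alerts arriving at once when the router goes away.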

Transactions are the detail of how you get beyond the simple status of “pingability.” I’ve talked about using web content monitors before, but if you have more granular testing capabilities you can use transaction monitors like the following ones in the Heroix Longitude product to answer those questions:

  • Use the FileContent monitor to inspect a log file. There is a wealth of information you can collect by monitoring syslogs and Windows event logs, but for a specific functionality test, you can focus on searching for a specific phrase in the logs that indicates a nightly process completed successfully.
  • Use the SQLQuery monitor to count rows in a table or report the most recent new row added to a table. How many new shopping carts were created in the past 24 hours? What is the current maximum userid?
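If you do not have such monitors, the same two ideas can be approximated in plain Python. The sketch below is not the Longitude API; `log_contains` and `query_scalar` are hypothetical helpers, and `sqlite3` stands in for whatever database driver your application actually uses.

```python
import sqlite3


def log_contains(path: str, phrase: str) -> bool:
    """Return True if any line of the log file contains the phrase."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return any(phrase in line for line in handle)


def query_scalar(conn: sqlite3.Connection, sql: str, params: tuple = ()):
    """Run a query and return the first column of its first row, or None."""
    row = conn.execute(sql, params).fetchone()
    return None if row is None else row[0]
```

A daily job could then alert when `log_contains(path, "completed successfully")` is false for last night's log, or when a `SELECT COUNT(*)` over recently created shopping carts comes back below a floor you consider normal. The log path, the phrase, the table name, and the floor are all things you would supply from your own application.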

Ideally, these kinds of tests should be a part of your development process–don’t stop at QAing a build; find out what daily tests should be done to ensure the application is really working. Make ongoing testing and verification a part of your deployment process. Recognize that the process needs to be constantly improved: although you can start by monitoring each node in your network based on a template of what should be monitored on that server, you should then step back and design some functionality-based monitors to monitor your applications from the perspective of users who don’t know or care what a ping is.


© 2010 Heroix | Heroix | RSS | Privacy Policy | Email: info@heroix.com