Monitoring the Cloud for End User Experience

June 18, 2014 | Heroix Staff

Using the Cloud for all or part of your computing infrastructure doesn’t mean you can ignore monitoring. If you use Cloud-based SaaS applications, or host web applications in the Cloud, you still need to verify that those resources are available and responsive. You don’t have to do a deep dive into DevOps optimization, but you should verify that the applications are performing for your users.

From a Cloud user’s perspective, it doesn’t matter how often a backend server needs to be migrated or when a noisy neighbor slows an application down. The Cloud hides the details of the problem; the user only cares that your web page took longer to load than they were willing to wait. And, oh, look: another cute kitten video.

At a minimum, you should monitor the following areas of Cloud performance from the user’s perspective:

    • Verifying SLAs
      We’ve discussed Cloud SLAs before and pointed out that the compensation many vendors offer is typically not enough to offset your losses, is only available if you notice the outage yourself, and excludes maintenance windows. If your application needs to be available 24/7, check that you can access it 24/7, and contact the vendor whenever it isn’t available when it should be. And, of course, document your outages so you can claim compensation under the SLA if needed.
    • Application responsiveness
      SLAs typically cover only uptime. If the application is available but too slow for users to wait for, it is effectively unusable. A responsiveness check should exercise whatever functionality your application provides: if your users log in, enter data, search, and update records, those are the operations you should test. You can create macros to automate these checks, then archive the results for trend analysis.
    • Optimizing resource reservations
      One of the draws of the Cloud is that you pay for what you use. However, if you reserve resources in advance, you can pay a lower rate than you would for ad hoc resource requests. If you’re using IaaS to host your applications, keep an eye on the basic server monitoring metrics (CPU, disk, network, and memory) and use your observations to fine-tune the resources you request from the Cloud provider.
    • Pinpointing application problems
      Just because you can’t get to a Cloud application doesn’t mean it’s the vendor’s fault. The internet sits between your users and the Cloud servers. If your DNS provider’s servers go down or are attacked, users won’t be able to find your application. If your ISP has an outage, the application will still be running, but users inside your organization won’t be able to reach it over your network. Or the problem could simply be a dead switch or router, or network bandwidth usage that is too high. Map out the points of failure between your local network and the Cloud, and monitor each of them so you can track the cause of application failures. You can fix problems on your internal network yourself; for external problems, keep track of when and where they occur, and of the vendor’s response.
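The SLA verification described above can be approximated from the log of a periodic availability probe. The sketch below is a minimal illustration, assuming each probe result is recorded as a timestamp plus an up/down flag; the `Check` record and the function names are hypothetical, not part of any vendor’s API. It estimates uptime as the share of successful probes taken outside maintenance windows and groups consecutive failures into outage spans you can document. The estimate is only as fine-grained as your probe interval, and your vendor’s SLA may define availability differently, so check the contract’s wording.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Check:
    """One result from a periodic availability probe (hypothetical log format)."""
    timestamp: datetime
    up: bool


def in_window(ts, windows):
    """True if ts falls inside any half-open (start, end) maintenance window."""
    return any(start <= ts < end for start, end in windows)


def sla_uptime(checks, maintenance_windows=()):
    """Percentage of successful probes, ignoring probes taken during
    maintenance windows (which many SLAs exclude from the uptime figure)."""
    counted = [c for c in checks if not in_window(c.timestamp, maintenance_windows)]
    if not counted:
        return 100.0
    return 100.0 * sum(c.up for c in counted) / len(counted)


def outages(checks):
    """Group consecutive failed probes into (first_failure, last_failure) spans
    so each outage can be documented for an SLA claim."""
    spans, start, last = [], None, None
    for c in sorted(checks, key=lambda c: c.timestamp):
        if not c.up:
            if start is None:
                start = c.timestamp
            last = c.timestamp
        elif start is not None:
            spans.append((start, last))
            start = None
    if start is not None:
        spans.append((start, last))
    return spans
```

Outage spans are kept even when they fall inside a maintenance window, since a complete record of failures is useful when you raise them with the vendor.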
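The macro-style responsiveness check described above can be sketched as a timed sequence of transaction steps. This is a minimal sketch, assuming each step of a user transaction (log in, search, update a record) is wrapped in a Python callable you supply; `time_transaction`, `archive`, and `slow_steps` are hypothetical helper names, and the threshold is whatever delay you decide your users will tolerate. Each run appends timestamped timings to a CSV file so you can analyze trends later.

```python
import csv
import time
from datetime import datetime, timezone


def time_transaction(steps):
    """Run each (name, callable) step in order and return (name, seconds) pairs.
    An exception in any step propagates to the caller, so a broken
    transaction is reported as a failure rather than silently timed."""
    results = []
    for name, step in steps:
        start = time.perf_counter()
        step()
        results.append((name, time.perf_counter() - start))
    return results


def archive(results, path):
    """Append one UTC-timestamped row per step to a CSV file for trend analysis."""
    stamp = datetime.now(timezone.utc).isoformat()
    with open(path, "a", newline="") as f:
        writer = csv.writer(f)
        for name, seconds in results:
            writer.writerow([stamp, name, f"{seconds:.3f}"])


def slow_steps(results, threshold_s):
    """Names of steps that took longer than the delay users will tolerate."""
    return [name for name, seconds in results if seconds > threshold_s]


# Example wiring (hypothetical URL; replace with your application's own steps):
#   import urllib.request
#   steps = [("login page", lambda: urllib.request.urlopen(
#       "https://app.example.com/login", timeout=10).read())]
#   results = time_transaction(steps)
#   archive(results, "response_times.csv")
#   print(slow_steps(results, threshold_s=3.0))
```

Timing each step separately, rather than only the whole transaction, makes it easier to see whether a slowdown is in login, search, or updates when you review the archived data.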
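The resource tuning described under resource reservations can start from simple statistics over your monitoring samples. The sketch below assumes you already collect utilization percentages per metric (CPU, disk, network, memory) and hourly counts of instances in use. The 30%/80% thresholds and the 70% share used to decide what to reserve are illustrative assumptions, not provider guidance; derive real figures from your provider’s reserved and on-demand pricing.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile (0 < p <= 100) of a non-empty list of samples."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(round(p / 100 * len(ordered), 9)))
    return ordered[rank - 1]


def sizing_hints(metrics, low=30.0, high=80.0):
    """Compare each metric's 95th-percentile utilization (percent) with
    assumed thresholds: above `high` suggests requesting more of that
    resource, below `low` suggests paying for capacity you don't use."""
    hints = {}
    for name, samples in metrics.items():
        p95 = percentile(samples, 95)
        if p95 > high:
            hints[name] = "undersized"
        elif p95 < low:
            hints[name] = "oversized"
        else:
            hints[name] = "ok"
    return hints


def reservable_baseline(hourly_instances, min_share=0.7):
    """Largest instance count that was in use for at least `min_share` of the
    observed hours. Capacity above this level is bursty and may be cheaper
    on demand; `min_share` stands in for your provider's break-even ratio."""
    if not hourly_instances:
        return 0
    ordered = sorted(hourly_instances, reverse=True)
    # ordered[k] is a level that was met or exceeded in at least k + 1 hours
    k = math.ceil(round(min_share * len(ordered), 9)) - 1
    return ordered[max(k, 0)]
```

Using the 95th percentile rather than the average keeps a few short spikes from driving the decision, while still reflecting sustained peaks that users would notice.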
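Monitoring the points of failure described under pinpointing application problems can be sketched as a chain of checks ordered from your own network outward. This minimal sketch uses only Python’s standard `socket` module; the layer names, addresses, and ports in the comments are hypothetical placeholders for your own network map (192.0.2.10 is a documentation-only address). The first check that fails suggests which layer to investigate: your internal network, your DNS provider, or the path through your ISP to the Cloud vendor.

```python
import socket


def dns_check(host):
    """Raise OSError (socket.gaierror) if the hostname cannot be resolved."""
    socket.getaddrinfo(host, None)


def tcp_check(host, port, timeout=5.0):
    """Raise OSError if a TCP connection to host:port cannot be opened in time."""
    with socket.create_connection((host, port), timeout=timeout):
        pass


def locate_failure(path):
    """Run (layer, check) pairs in order, nearest layer first, and return the
    name of the first layer whose check raises OSError, or None if all pass."""
    for layer, check in path:
        try:
            check()
        except OSError:
            return layer
    return None


# Example path (all names, addresses and ports are hypothetical placeholders):
#   path = [
#       ("internal network", lambda: tcp_check("192.0.2.10", 80)),
#       ("DNS", lambda: dns_check("app.example.com")),
#       ("ISP / Cloud application", lambda: tcp_check("app.example.com", 443)),
#   ]
#   print(locate_failure(path))  # log the result with a timestamp on each run
```

Checking the internal host by IP address keeps that step independent of DNS, so a DNS outage is reported as a DNS problem rather than as an internal network failure.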