SLA Monitoring for Distributed Applications

August 26, 2015 | Heroix Staff

What are distributed applications?

A distributed application is any application whose components are spread across multiple devices. For example, a web-based application’s components could include multiple web servers, backend databases, and networking devices. For the application to function at an acceptable level, each of its components must function at an acceptable level.

The first step in monitoring a distributed application is to break it down into its components. Components can be entirely on one server or distributed across multiple servers in a network, and can even include cloud or internet-based resources. The performance of each component, and the speed and integrity of the connections between components, affect the application as a whole. You need to be able to differentiate between an application hanging because a SQL query took too long to respond, which you can troubleshoot, and a remotely hosted JavaScript file taking too long to run, which is out of your hands.
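As a rough illustration of breaking an application into components, the sketch below times each component separately and reports the slowest one, along with whether it is something you can troubleshoot yourself. The `Component` and `Timing` types, the `check` probes, and the `under_our_control` flag are all hypothetical names invented for this example; they are not part of any particular monitoring product.

```python
import time
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Component:
    name: str
    check: Callable[[], None]   # hypothetical probe that exercises the component
    under_our_control: bool     # False for remotely hosted resources


@dataclass
class Timing:
    name: str
    seconds: float
    under_our_control: bool


def time_components(components: List[Component]) -> List[Timing]:
    """Run each component's probe once and record how long it took."""
    timings = []
    for component in components:
        start = time.perf_counter()
        component.check()
        elapsed = time.perf_counter() - start
        timings.append(Timing(component.name, elapsed, component.under_our_control))
    return timings


def slowest(timings: List[Timing]) -> Timing:
    """Return the timing of the component that took longest."""
    return max(timings, key=lambda t: t.seconds)
```

In practice each `check` would run a real query or fetch a real resource. The point of the sketch is only that per-component timings let you tell a slow local SQL query apart from a slow remote script.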

Why use distributed applications?

A single computer can be sufficient in some cases, but a distributed system is more practical for larger and more business-critical applications. For example, it might be more cost-effective to share the workload across a cluster of low-end computers than to invest in a single high-end computer. An application built on a cluster of systems is also generally more scalable and fault tolerant.

Distributed applications can also use the Internet and the Cloud for both application features and additional resources that might be impractical to provide locally.  Remotely hosted search engines, mapping utilities, file sharing, etc., can be seamlessly integrated into locally hosted web sites without the cost of developing, hosting, or maintaining these features.  The drawback to using remotely hosted features is that you have no control over their performance and need to monitor them to ensure that they do not create a bottleneck for overall application performance.

What is a distributed application SLA?

The basic concept of a distributed application SLA is monitoring the performance data for each of the application’s components as it pertains to the performance of the overall application. For a web server, there would be a test of how long it takes for a page to load; for a SQL server, how long it takes to respond to a query. The level of detail can range from a basic user perspective (e.g. the web page must load in less than 2 seconds) to more in-depth performance metrics (e.g. send an alert if any SQL Server disk queue length is > 2).
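The two example rules above could be written down as data and checked against a set of measurements. This is a minimal sketch, not how Longitude defines its rules: the `Rule` type, the metric names `page_load_seconds` and `sql_disk_queue_length`, and the `evaluate` function are assumptions made for illustration.

```python
import operator
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Rule:
    metric: str                               # hypothetical metric name
    breached: Callable[[float, float], bool]  # comparison that signals a breach
    limit: float
    message: str


RULES = [
    # "the web page must load in less than 2 seconds": 2.0 or more is a breach
    Rule("page_load_seconds", operator.ge, 2.0,
         "web page took 2 seconds or longer to load"),
    # "disk queue length is > 2": strictly above 2 is a breach
    Rule("sql_disk_queue_length", operator.gt, 2.0,
         "SQL Server disk queue length is above 2"),
]


def evaluate(samples: Dict[str, float], rules: List[Rule] = RULES) -> List[str]:
    """Return one alert message per rule breached by the given samples."""
    alerts = []
    for rule in rules:
        value = samples.get(rule.metric)
        if value is not None and rule.breached(value, rule.limit):
            alerts.append(f"{rule.metric}={value}: {rule.message}")
    return alerts
```

For example, `evaluate({"page_load_seconds": 1.2, "sql_disk_queue_length": 3})` returns a single alert for the disk queue, since the page load is within its limit.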

What are the parameters of distributed SLA monitoring?

Longitude SLAs have three states: Good, Degraded, and Unacceptable. Each performance measure is given thresholds that indicate the transition from “good” to “degraded”, and from “degraded” to “unacceptable” performance. For example, if a ping takes longer than 1500 ms to return, that component may slip from “good” to “degraded”. If it takes longer than 3000 ms, it becomes “unacceptable”. In terms of user experience this can translate to waiting a few seconds for a page to load (good), vs. waiting a minute (degraded), vs. the browser timing out (unacceptable).
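The three-state model can be sketched as a small classification function. The code below uses the ping thresholds from the example (1500 ms and 3000 ms); the `State` enum and the `classify` function are illustrative names, and the sketch assumes that higher values are worse, which will not hold for every metric.

```python
from enum import Enum


class State(Enum):
    GOOD = "good"
    DEGRADED = "degraded"
    UNACCEPTABLE = "unacceptable"


def classify(value: float, degraded_above: float, unacceptable_above: float) -> State:
    """Map a measurement to a state, assuming higher values are worse."""
    if degraded_above > unacceptable_above:
        raise ValueError("degraded threshold must not exceed unacceptable threshold")
    if value > unacceptable_above:
        return State.UNACCEPTABLE
    if value > degraded_above:
        return State.DEGRADED
    return State.GOOD


# Ping round-trip times in milliseconds, using the thresholds from the text.
for ping_ms in (200, 2000, 3500):
    print(ping_ms, classify(ping_ms, 1500, 3000).value)
```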

Each component’s state is then color coded and stacked together in a real-time chart that provides a visual correlation between components. For example, the network congestion that causes pings to slow down will also affect other network-dependent components, so every affected component will show the problem over the same period. This visual correlation can be used to quickly pinpoint the root cause of performance issues in an application.
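The same correlation can be approximated in code by lining up each component's states over time and looking for intervals in which several components are in trouble at once. This is only a sketch of the idea, not a description of how the Longitude chart is built; the function name, the timeline format (one state string per sampling interval), and the `min_components` parameter are assumptions.

```python
from typing import Dict, List


def shared_problem_slots(
    timelines: Dict[str, List[str]], min_components: int = 2
) -> Dict[int, List[str]]:
    """Map each interval index to the components that were not "good" in it.

    Only intervals where at least `min_components` components had a problem
    are kept, since those are the candidates for a shared root cause.
    """
    if not timelines:
        return {}
    # Compare only the intervals that every component has a sample for.
    length = min(len(states) for states in timelines.values())
    result = {}
    for i in range(length):
        affected = sorted(
            name for name, states in timelines.items() if states[i] != "good"
        )
        if len(affected) >= min_components:
            result[i] = affected
    return result
```

If the ping, web server, and file share timelines all leave "good" in the same interval while the database stays "good", the shared interval points toward the network rather than any single server.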

Why do businesses need distributed application SLA monitoring?

Any enterprise with contractual performance obligations involving service level agreements needs to ensure and document compliance, as well as the causes of any noncompliance. Correlating user experience measurements with the infrastructure of the distributed network can provide vital indicators affecting contract fulfillment as well as future business. To name three:

  • Application availability
  • Long-term application performance
  • Root causes of outages or performance degradation affecting end-users
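As a simple illustration of the first indicator, availability over a reporting period can be estimated from a series of recorded states. The definition used here, counting any interval that is not "unacceptable" as available, is an assumption made for this sketch; an actual SLA contract would spell out its own definition.

```python
from typing import List


def availability_percent(states: List[str]) -> float:
    """Percentage of sampled intervals in which the application was usable.

    Assumes "good" and "degraded" both count as available and only
    "unacceptable" counts as an outage.
    """
    if not states:
        raise ValueError("at least one sample is required")
    available = sum(1 for state in states if state != "unacceptable")
    return 100.0 * available / len(states)


samples = ["good", "good", "degraded", "unacceptable"]
print(availability_percent(samples))  # 75.0
```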

How does the Heroix Longitude SLA dashboard work?

Longitude’s SLA Dashboard gives real-time displays of SLA performance as well as:

  • Drilldown for investigating compliance shortfalls
  • Historical reports
  • Ability to comment on extenuating circumstances or actions taken to resolve the problem identified
  • Tools for documenting changes in IT infrastructure and measuring the effects on SLA performance

Is Heroix Longitude right for your business?

If you are looking for affordable, easy-to-use performance monitoring software that works across Windows, Unix, Linux, VMware, Hyper-V and network devices, and can keep your distributed system compliant and working, our product and support may be for you.

Want to learn more?

Download a FREE trial of Longitude - Stand up Longitude in just minutes and immediately start seeing how your environment is performing, receive proactive alerts, reveal root causes, automate corrective actions, and more...

Start Your Free 30-Day Trial of Longitude Today!