Visit Heroix at http://www.heroix.com
Charting Life in the IT Environment

SLAs continued: Define the business service

July 17, 2007

In my previous blog entry, on June 18th, I began talking about defining SLAs. Here I continue that conversation.

Defining an SLA takes two steps: first, break your distributed application into discrete components; second, define acceptable performance levels for each component. The first step is usually straightforward (SQL on one server, IIS on another, SMTP on a third, etc.), but the second step can be more difficult.
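The two steps can be pictured as a small data model: a business service is a list of components, and each component carries its own performance level. The sketch below is purely illustrative. `Level`, `Component`, `BusinessService` and the host names are hypothetical, and rolling the service up to its worst component is an assumption, not necessarily how Longitude combines component states.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class Level(IntEnum):
    """Hypothetical service levels, ordered so a larger value is worse."""
    ACCEPTABLE = 0
    DEGRADED = 1
    UNACCEPTABLE = 2


@dataclass
class Component:
    name: str                        # e.g. "SQL", "IIS", "SMTP"
    host: str                        # hypothetical server name
    level: Level = Level.ACCEPTABLE  # current state of this component


@dataclass
class BusinessService:
    name: str
    components: list[Component] = field(default_factory=list)

    def level(self) -> Level:
        # Assumption: the service is only as healthy as its worst component.
        return max((c.level for c in self.components), default=Level.ACCEPTABLE)


# Step 1: break the distributed application into discrete components.
web_store = BusinessService("Web store", [
    Component("SQL", "db01"),
    Component("IIS", "web01"),
    Component("SMTP", "mail01"),
])

# Step 2 (defining thresholds) would set each component's level; here we
# simply mark IIS as degraded to show how component states roll up.
web_store.components[1].level = Level.DEGRADED
print(web_store.level().name)  # DEGRADED
```

Keeping each component separate is what makes step 2 possible: thresholds are attached per component, not to the application as a whole.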

The Longitude SLA has three levels of performance: acceptable, degraded, and unacceptable. Degraded is actually a subclass of acceptable; while an application is degraded, it is still technically performing acceptably, just not optimally. These three levels are incorporated into an SLA when you define its compliance information, which includes a “Required percent of time in acceptable state” (acceptable plus degraded) and a “Required percent of time in good state” (acceptable only, excluding degraded). So, for each component, you need to determine when it is working but not optimally, and when it is not working acceptably at all.
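To make the two compliance percentages concrete, here is a minimal sketch that scores a series of state samples. It assumes each sample covers an equal-length polling interval; the function name `compliance` and the sample data are hypothetical and do not reflect how Longitude stores or computes compliance internally.

```python
from collections import Counter

ACCEPTABLE = "acceptable"
DEGRADED = "degraded"
UNACCEPTABLE = "unacceptable"


def compliance(samples, required_acceptable_pct, required_good_pct):
    """Score a list of per-interval states against two compliance targets.

    Assumes each sample covers an equal-length polling interval.
    """
    if not samples:
        raise ValueError("need at least one sample")
    counts = Counter(samples)
    total = len(samples)
    # "Acceptable state" includes degraded time, since degraded is a
    # subclass of acceptable.
    acceptable_pct = 100.0 * (counts[ACCEPTABLE] + counts[DEGRADED]) / total
    # "Good state" counts only time that was acceptable and not degraded.
    good_pct = 100.0 * counts[ACCEPTABLE] / total
    return {
        "acceptable_pct": acceptable_pct,
        "good_pct": good_pct,
        "compliant": (acceptable_pct >= required_acceptable_pct
                      and good_pct >= required_good_pct),
    }


# 100 hypothetical polling intervals: 90 acceptable, 8 degraded, 2 unacceptable.
samples = [ACCEPTABLE] * 90 + [DEGRADED] * 8 + [UNACCEPTABLE] * 2
result = compliance(samples, required_acceptable_pct=99.0, required_good_pct=90.0)
print(result)  # {'acceptable_pct': 98.0, 'good_pct': 90.0, 'compliant': False}
```

In this example the service meets the 90% good-state target but misses the 99% acceptable-state target, because it spent 2% of the time unacceptable.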

Choosing the thresholds for degraded and unacceptable performance is sometimes simple. When you define the service conditions, some metrics come with suggested best-practice degraded/unacceptable thresholds (e.g., CPU Busy Time or Free Memory in the Windows or Unix applications). However, many metrics have no suggested SLA thresholds, and you must supply user-defined thresholds for the SLA to make sense.
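A pair of degraded/unacceptable thresholds maps each metric sample to one of the three levels. Note that direction matters: a high CPU Busy Time is bad, while a low Free Memory value is bad. This is a sketch only; `Thresholds`, `classify` and the numeric values are hypothetical placeholders, not Longitude's suggested best-practice settings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    degraded: float
    unacceptable: float
    higher_is_worse: bool = True  # True for CPU busy %, False for free memory

    def __post_init__(self):
        # Guard against thresholds entered in the wrong order.
        if self.higher_is_worse:
            ordered = self.degraded <= self.unacceptable
        else:
            ordered = self.degraded >= self.unacceptable
        if not ordered:
            raise ValueError("degraded threshold must be reached before unacceptable")


def classify(value, t):
    """Map a metric sample to acceptable / degraded / unacceptable."""
    if t.higher_is_worse:
        if value >= t.unacceptable:
            return "unacceptable"
        if value >= t.degraded:
            return "degraded"
    else:
        if value <= t.unacceptable:
            return "unacceptable"
        if value <= t.degraded:
            return "degraded"
    return "acceptable"


# Placeholder numbers for illustration only; they are not Longitude's
# suggested best-practice thresholds.
cpu_busy_pct = Thresholds(degraded=80.0, unacceptable=95.0)
free_memory_mb = Thresholds(degraded=512.0, unacceptable=128.0, higher_is_worse=False)

print(classify(85.0, cpu_busy_pct))     # degraded
print(classify(300.0, free_memory_mb))  # degraded
```

A metric without suggested thresholds simply needs its own user-defined `Thresholds` entry before it can contribute to the SLA.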

Not all metrics require thresholds, though. Transactions have discrete Fail/Succeed states that can be used directly in SLAs: if a ping succeeds, that component is acceptable; if it fails, the component is unacceptable. But transactions also record response times, which can be very valuable in an SLA. For example, the round-trip time recorded by a ping transaction is a good measure of network latency, and the time a SQL Query transaction takes to complete is a good functional measure of database responsiveness. In these cases, an SLA becomes much more detailed if you assign degraded/unacceptable thresholds to transaction response times rather than relying only on the default fail/succeed values.
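The difference between plain fail/succeed and response-time thresholds can be sketched as follows. `TransactionResult`, `transaction_state` and the millisecond values are hypothetical; the only rules taken from the text are that a failed transaction is unacceptable and that, without response-time thresholds, a successful one is acceptable.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TransactionResult:
    succeeded: bool
    response_ms: Optional[float] = None  # e.g. ping round-trip or SQL query time


def transaction_state(result, degraded_ms=None, unacceptable_ms=None):
    """Classify one transaction run.

    With no response-time thresholds this reduces to plain fail/succeed.
    """
    # A failed transaction is unacceptable no matter how fast it failed.
    if not result.succeeded:
        return "unacceptable"
    rt = result.response_ms
    if rt is not None and unacceptable_ms is not None and rt >= unacceptable_ms:
        return "unacceptable"
    if rt is not None and degraded_ms is not None and rt >= degraded_ms:
        return "degraded"
    return "acceptable"


ping = TransactionResult(succeeded=True, response_ms=35.0)
print(transaction_state(ping))                                           # acceptable
print(transaction_state(ping, degraded_ms=20.0, unacceptable_ms=100.0))  # degraded
```

The same 35 ms ping counts as fully acceptable under fail/succeed alone, but as degraded once a (placeholder) 20 ms latency threshold is applied, which is the extra detail response-time thresholds add.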

The question then becomes: how do you determine a good threshold for a transaction component in an SLA? Check back soon to learn more about defining SLAs.

Posted by Susan Bilder, Senior Technical Consultant


© 2009 Heroix | Email: info@heroix.com