As the name may suggest, Service Level Agreements (SLAs) define the level of service that IT is delivering to its customers. More specifically, an SLA definition can incorporate items such as application availability, application response time, the number of users that can be served, help desk response time, network availability, usage statistics and more.
A properly designed SLA aggregates availability and key performance indicators (KPIs) into a single measurable entity. SLAs should clearly define what constitutes acceptable (and, for that matter, unacceptable) behavior in terms of application performance and availability.
Every component of a Service Level Agreement must be measurable. Measurements build the historical record that gives everyone the information needed for an impartial assessment of SLA performance and compliance.
The value of SLAs
SLAs are sometimes viewed as holding IT’s feet to the fire unnecessarily in order to ensure that it is delivering services at agreed-upon performance levels. Nothing could be further from the truth, as properly configured SLAs are an invaluable asset.
SLAs provide the following benefits:
Accurate reporting of SLA compliance provides an objective picture of what happened and when. Nothing is more frustrating than exaggerated or inaccurate complaints that place IT in a bad light and make problem diagnosis and resolution more challenging.
SLAs demonstrate IT’s value to the organization by showing that the services delivered by IT are meeting or even exceeding performance expectations.
SLAs provide advance warning. A properly constructed SLA not only measures application availability and performance for end users and customers, but also provides metrics that IT can use to analyze the performance of the underlying IT infrastructure and application components. Properly collected and evaluated SLA data contains the information needed to recognize and alert on problems before they turn into compliance issues.
Properly defining SLAs
IT organizations should implement multiple SLAs, as services often have their own unique set of requirements. For example, applications running in a hybrid or public cloud operate quite differently than those that are locally hosted; in particular, the network components connecting the application to the user are much more significant.
Ideally each SLA should include any combination of metrics that answer the following questions:
What percentage of the time is the service available?
The amount of time the service is “available” is usually the low-hanging fruit, as it is fairly easy to collect and measure. Availability is typically a heartbeat test: checking to see if the service and its components are alive. Testing can take the form of pings, port checks, launching URLs, or any other test that establishes the service is available.
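As a rough illustration of a heartbeat-style availability measurement, the sketch below performs a TCP port check and turns a series of check results into an availability percentage. The function names, the default timeout, and the idea of storing results as a list of booleans are assumptions made for this example, not part of any particular monitoring product.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Heartbeat test: True if a TCP connection to host:port succeeds in time."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def availability_percent(results: list[bool]) -> float:
    """Percentage of heartbeat checks that succeeded."""
    if not results:
        raise ValueError("no heartbeat results recorded")
    return 100.0 * sum(results) / len(results)


if __name__ == "__main__":
    # Hypothetical history: three successful checks and one failure.
    history = [True, True, True, False]
    print(availability_percent(history))  # prints 75.0
```

In practice the list of results would be filled by calling a check such as port_is_open on a fixed schedule and recording each outcome.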
How is the service performing?
Ultimately, an SLA determines how well a service is supporting business requirements. Measuring performance goes beyond simply testing for availability and is best accomplished by interacting with an application, for example, by using synthetic web transactions. It is important to verify both that the expected content is returned and that the response time is within acceptable limits; together, these confirm that the service is functioning and responsive.
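A synthetic web transaction of the kind described above can be sketched in a few lines. This is a minimal example under stated assumptions: the URL, the expected text, and the time limit are placeholders chosen by whoever defines the SLA, and the names TransactionResult, fetch_body, and run_synthetic_check are hypothetical. A real transaction would usually involve several steps, such as logging in and navigating.

```python
import time
import urllib.request
from dataclasses import dataclass


@dataclass
class TransactionResult:
    content_ok: bool         # expected text was found in the response
    response_seconds: float  # measured wall-clock response time
    within_limit: bool       # response time did not exceed the limit


def fetch_body(url: str, timeout: float = 10.0) -> str:
    """Fetch a page over HTTP(S) and return its body as text."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def run_synthetic_check(url, expected_text, max_seconds, fetch=fetch_body):
    """Time one request and verify both content and response time."""
    start = time.monotonic()
    try:
        body = fetch(url)
    except OSError:
        # urllib's URLError and HTTPError are OSError subclasses;
        # a failed request is treated as returning no content.
        body = ""
    elapsed = time.monotonic() - start
    return TransactionResult(
        content_ok=expected_text in body,
        response_seconds=elapsed,
        within_limit=elapsed <= max_seconds,
    )
```

The fetch parameter exists so the check can be exercised without a network, for example run_synthetic_check("http://portal.example", "Welcome", 3.0, fetch=lambda url: "Welcome back"), where the URL and text are made up for illustration.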
What are the root causes of outages and performance degradation?
When all is said and done, IT organizations strive for 100% SLA compliance. Ideally, IT would have advance notice of issues and be able to correct problems before they could compromise an SLA.
By aggregating the underlying IT infrastructure and application components of a business service into an SLA, IT can readily determine the root causes of performance problems. This information can then be used to provide the advance alerts that allow IT to take targeted actions to keep IT services at peak performance.
Identifying the key metrics associated with a business service's IT infrastructure (physical, virtual, cloud) and with an application's performance (database, web, etc.) is an important first step. Even more critical is establishing proper compliance thresholds for those key metrics. This means using historical context to establish proper baselines for good, degraded, and unacceptable behavior.
The goal is to alert IT that a critical component within an SLA is in a degraded state, giving IT enough time to address the problem before the SLA enters into an unacceptable (non-compliant) state.
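One way to turn historical measurements into good, degraded, and unacceptable bands is to derive thresholds from percentiles of past values. The sketch below is only an illustration: the choice of the 90th and 99th percentiles, the function names, and the assumption that higher values are worse (as with response time) are all made up for the example and should be tuned to each metric.

```python
import statistics


def baseline_thresholds(history, degraded_pct=90, unacceptable_pct=99):
    """Derive (degraded_at, unacceptable_at) from historical samples.

    The percentile choices are assumptions for this sketch, not a standard.
    """
    # quantiles(n=100) returns the 99 cut points between percentiles 1..99.
    cuts = statistics.quantiles(history, n=100)
    return cuts[degraded_pct - 1], cuts[unacceptable_pct - 1]


def classify(value, degraded_at, unacceptable_at):
    """Classify a metric where higher values are worse (e.g. response time)."""
    if value >= unacceptable_at:
        return "unacceptable"
    if value >= degraded_at:
        return "degraded"
    return "good"


if __name__ == "__main__":
    # Hypothetical history of response times in milliseconds.
    history = [float(ms) for ms in range(1, 101)]
    degraded_at, unacceptable_at = baseline_thresholds(history)
    for sample in (50.0, 95.0, 200.0):
        print(sample, classify(sample, degraded_at, unacceptable_at))
```

An alert raised on the "degraded" band is what gives IT time to act before the metric reaches the "unacceptable" band.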
Sample SLA Templates
Service Level Agreement templates should be constructed to provide the level of detail needed by the individuals consuming the information. For example, end users typically neither know nor care about the underlying IT infrastructure and application components.
(Figure 1. Longitude Summary SLA for end users)
End users care about response time and availability. The above summary from Longitude shows the response time of a synthetic web transaction that accesses a web portal every few minutes: "yellow" indicates periods when the response time was degraded, while "orange" indicates unacceptable response time.
(Figure 2. Longitude Summary SLA for the IT department)
The Longitude SLA template in Figure 2 correlates across IT infrastructure and application components to pinpoint what is causing end-user performance issues. We see here that there is a strong correlation between the responsiveness of the web portal and the database response time. IT now has an excellent starting point from which to diagnose the database issue.
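To make the idea of correlating two metrics concrete, the sketch below computes a Pearson correlation coefficient between two series sampled at the same times. It is a generic illustration, not a description of how Longitude performs its correlation, and the sample values are invented for the example.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with 2+ samples")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Invented samples: portal and database response times in milliseconds.
    portal_ms = [210, 220, 640, 900, 230, 215]
    database_ms = [40, 45, 410, 700, 50, 42]
    print(round(pearson(portal_ms, database_ms), 3))
```

A coefficient close to 1 suggests the two metrics rise and fall together, which makes the second component a reasonable place to start diagnosis; it does not by itself prove causation.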
Service Level Agreements are an important asset that should be part of any IT organization’s repertoire. In addition to providing accountability, SLAs are a vital tool that helps IT to:
Speed problem diagnosis and resolution
Separate infrastructure issues from application problems
Objectively measure application health and performance
Show the value of IT
Want to learn more?
Download our Best Practices Guide to Developing and Monitoring SLAs to learn how your organization can minimize the resources needed for SLA management and more readily align IT’s services with the needs of the business.