IT is tasked with resolving a variety of server performance issues; the techniques used to address the problems vary greatly depending on the nature of the issue.
The challenge IT organizations have is ensuring that they have the right methodologies in place to quickly identify and mitigate performance issues.
When there are application issues, server performance is almost always the first place to start. How IT goes about analyzing server performance depends largely on the type of problem and where they are in the problem cycle.
If an IT professional could be granted one superpower they'd likely choose the power to predict the future. Nothing is more frustrating for both IT and their users than to experience avoidable server performance issues.
Trending shows the general behavior of server performance data over time: increasing, decreasing, following a regular pattern, or holding constant.
In its simplest form, trending is about gathering and analyzing server performance metrics and using that information to:
- Identify server performance characteristics that, although not immediately harmful, will lead to server performance problems if left unchecked.
- Predict server usage so that IT can project when and which server resources will reach depletion.
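The depletion forecast described above can be sketched with a simple least-squares trend line. This is an illustrative example, not the method of any particular monitoring product; the sample data, function name, and capacity figure are all hypothetical.

```python
# Sketch: fit a linear trend to resource-usage samples and project when
# usage will cross a capacity limit. All data here is illustrative.

def intervals_until_depletion(samples, capacity):
    """samples: usage values taken at a fixed interval (e.g. daily).
    Returns the estimated number of intervals from the last sample until
    `capacity` is reached, or None if the trend is flat or decreasing."""
    n = len(samples)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(samples) / n
    # Least-squares slope and intercept for y = m*x + b
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, samples))
    den = sum((x - mean_x) ** 2 for x in xs)
    m = num / den
    if m <= 0:
        return None  # usage is flat or shrinking; no depletion forecast
    b = mean_y - m * mean_x
    return (capacity - b) / m - (n - 1)

# Hypothetical disk usage in GB, sampled daily, on a 500 GB volume
usage = [300, 305, 311, 318, 322, 330]
print(intervals_until_depletion(usage, 500))  # roughly a month out
```

At roughly 6 GB of growth per day, the projection gives IT about a month of lead time to provision capacity before the volume fills, which is exactly the budgetary foresight the next paragraph argues for.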
The need to anticipate changes in server requirements is essential, not only because IT needs to provision server resources on a timely basis, but also so that IT's impact on the budget and CAPEX is foreseeable, with no surprises. IT simply cannot afford to fall behind the budgetary curve; ad hoc purchases wreak havoc because they can drain funding from other initiatives.
Consideration must also be given to factors beyond server performance. For example, an incremental increase in users followed by a rapid increase in server resource utilization would indicate that the application warrants closer scrutiny. Application performance issues are easily masked with more, bigger, better, and faster servers; however, it is entirely possible that those same issues can be resolved more cost-effectively with just a bit more analysis.
Here a server’s CPU usage is trending up at a rather alarming rate, so IT would certainly want to get to the bottom of it before things get out of hand.
The volume of data generated about server performance and availability is substantial; ultimately, the question is how to make the best use of that data.
This whitepaper helps IT achieve successful long-term server monitoring by focusing on an approach that is lightweight, efficient, resilient, and automated.
One of the tenets of server performance analysis is that there must be a fundamental understanding of what constitutes a “normal” workload. Servers, be they physical or virtual, are sized differently and have diverse roles. There is usually little or no consistency between servers as to how resources are consumed and where or why performance degradation occurs.
Baselining means understanding a server’s role and tracking when and how it is utilized. For example: is the server a database server that provides services for after-hours batch processing? Baseline characteristics vary depending on how and when the server is most utilized. Some servers experience “usual” changes in workload based on time of day, day of week, or week of month, and the baseline must account for that volatility.
Here is the baseline for a server’s CPU usage over 30 days, showing average CPU usage per hour over the period. This workload demonstrates a consistent pattern: CPU ramps up through the morning, dips slightly at noon, and then declines gradually over the rest of the work day.
If there were high CPU usage outside the normal workday, it would readily be identified as anomalous behavior.
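The baselining and anomaly check described above can be sketched as follows. This is a minimal illustration assuming per-hour CPU samples; the sample history, function names, and three-sigma rule are hypothetical choices, not taken from any specific tool.

```python
# Sketch: build an hourly CPU baseline from history and flag readings
# that fall outside the usual range for that hour. Data is illustrative.
from collections import defaultdict
from statistics import mean, stdev

def hourly_baseline(readings):
    """readings: (hour_of_day, cpu_percent) pairs from ~30 days of
    history. Returns {hour: (mean, stdev)} for each hour of the day."""
    by_hour = defaultdict(list)
    for hour, cpu in readings:
        by_hour[hour].append(cpu)
    return {h: (mean(v), stdev(v)) for h, v in by_hour.items() if len(v) > 1}

def is_anomalous(baseline, hour, cpu, n_sigma=3):
    """Flag a reading more than n_sigma deviations from that hour's norm."""
    mu, sigma = baseline[hour]
    # Floor sigma so near-constant hours don't alert on trivial jitter
    return abs(cpu - mu) > n_sigma * max(sigma, 1.0)

# Hypothetical history: overnight idle (~5%), business hours busy (~60%)
history = [(2, c) for c in (4, 5, 6, 5)] + [(10, c) for c in (58, 62, 60, 59)]
bl = hourly_baseline(history)
print(is_anomalous(bl, 2, 70))   # 70% CPU at 2 AM: anomalous
print(is_anomalous(bl, 10, 62))  # 62% at 10 AM: within the baseline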
Server performance analysis should never be done in a vacuum. Often there is a need to correlate server performance with end-user response time. The challenge for IT is how to best identify where the problems are in the technology stack and how those problems will impact end users.
How many times have we heard, “it’s the server… no, it’s the database… no, it’s the network”?
Service Level Agreement Monitoring and Reporting helps IT to correlate performance and availability data across the entire technology stack. Correlation is accomplished by targeting key performance metrics related to servers, storage, applications, network, as well as end user response time. For each KPI, thresholds for good, degraded, and unacceptable behavior are mapped out.
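The threshold mapping described above can be sketched as a small lookup: each KPI reading is classified as good, degraded, or unacceptable. The metric names and threshold values below are illustrative assumptions, not defaults from any SLA product.

```python
# Sketch: classify each KPI reading against per-metric thresholds into
# the three states named above. Thresholds are illustrative.

THRESHOLDS = {  # metric: (degraded_at, unacceptable_at)
    "web_response_ms": (500, 2000),
    "db_response_ms":  (100, 500),
    "cpu_percent":     (75, 90),
}

def kpi_status(metric, value):
    degraded, unacceptable = THRESHOLDS[metric]
    if value >= unacceptable:
        return "unacceptable"
    if value >= degraded:
        return "degraded"
    return "good"

readings = {"web_response_ms": 850, "db_response_ms": 140, "cpu_percent": 40}
for metric, value in readings.items():
    print(metric, kpi_status(metric, value))
```

With these sample readings, the web application and database both land in the degraded band while CPU stays good, mirroring the correlation pattern the summary report below describes.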
This summary report correlates the time periods where there is a degradation in performance (the yellow). We can see a correlation between the response time of the web application and the database response time, while the targeted resources for server, network, and virtualization are all within their acceptable baselines. The next step in resolving this problem is to investigate what is happening inside SQL, either by digging in on your own or by leveraging a SQL performance monitor.
The value of server performance data is unlocked when it is properly interpreted from its raw form. A good first step is to identify the servers and the key performance metrics to trend. The trends will either reassure IT that everything is OK or alert IT to future problems; either way, it is a win. An IT department with no vision into the future is setting itself up for unnecessary headaches and angst.
Reporting on and baselining server performance is a critical tool in IT's arsenal because it identifies outlier behaviors, which are usually indicative of issues that require closer attention. The data from server performance analysis should also be leveraged beyond reporting. For example, it is good practice to drive alerting off baseline values so that IT doesn't receive alerts for a known pattern. A simple example: it is unnecessary to alert on high disk IO in the evening because a backup is running.
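The backup example above can be sketched as a maintenance-window check in front of the alert logic. The window times, metric name, and threshold are hypothetical assumptions for illustration.

```python
# Sketch: suppress alerts during a known maintenance window (the
# evening-backup example above). All values are illustrative.
from datetime import time

BACKUP_WINDOW = (time(22, 0), time(2, 0))  # 10 PM to 2 AM, wraps midnight

def in_window(now, start, end):
    if start <= end:
        return start <= now <= end
    return now >= start or now <= end  # window crosses midnight

def should_alert(metric, value, threshold, now):
    if metric == "disk_io" and in_window(now, *BACKUP_WINDOW):
        return False  # high disk IO is expected while the backup runs
    return value > threshold

print(should_alert("disk_io", 95, 80, time(23, 30)))  # backup running: False
print(should_alert("disk_io", 95, 80, time(14, 0)))   # midday spike: True
```

The same disk-IO spike that is ignored at 11:30 PM still fires an alert at 2 PM, so the suppression stays narrowly scoped to the known pattern rather than silencing the metric outright.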
Correlation is the analysis of server performance, application performance, and availability data together to form a cohesive picture: a "Service Level." Correlation is an enormous help both for diagnostics and for demonstrating IT's value. Being able to identify where a problem is in the technology stack, how it affects other parts of the stack, and ultimately the end-user experience saves organizations time and money. Service Levels are integral to documenting the value of IT by objectively reporting on performance across the entire technology stack.
Want to learn more?
Download our Best Practices for Server Monitoring Whitepaper and learn how to achieve a successful long-term server monitoring strategy by focusing on an approach that is lightweight, efficient, resilient, and automated.