Windows Server is generally considered a mature and stable operating system platform. That said, it is no secret that the applications that are part and parcel of Windows Server deployments can compromise server performance when there is a mismatch between the resources they require and the resources the server operating system can provide.
In addition, the complexity that accompanies virtualization and cloud computing has made monitoring Windows Server performance an ever more critical component of any successful IT monitoring strategy.
What attributes lead to the effective and efficient monitoring of your Windows Servers? How do you ensure that your monitoring implementation will be fast, easy, accurate, and won’t consume an inordinate amount of IT personnel and server resources?
Agentless monitoring means that you don't have to install software agents on your Windows Servers, resulting in simplicity and savings: faster deployment, lower software license fees, and streamlined operation.
In the past, agent-based technologies were a prerequisite to any successful server monitoring strategy because the mechanisms simply did not exist to remotely collect the metrics required to effectively monitor Windows Server performance. However, with today's agentless technologies like WMI and Windows PowerShell, there are no limitations in terms of the breadth and depth of key performance indicators (KPIs) that can be collected.
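As a rough illustration of what agentless collection looks like under the hood, the sketch below builds the kind of WQL queries a remote collector might issue against a server's WMI service. The `build_wql` helper is hypothetical; the WMI class and property names are standard Windows performance classes, but which ones a given product actually queries is an assumption here.

```python
# Hypothetical sketch: the WQL queries an agentless collector might issue
# remotely (e.g. over WinRM/DCOM) to gather core KPIs without installing
# anything on the target server. build_wql is an illustrative helper.

def build_wql(counter_class, fields):
    """Build a WQL SELECT statement for a WMI performance class."""
    return f"SELECT {', '.join(fields)} FROM {counter_class}"

# Standard WMI classes covering the "lowest common denominator" KPIs.
queries = {
    "cpu":    build_wql("Win32_PerfFormattedData_PerfOS_Processor",
                        ["Name", "PercentProcessorTime"]),
    "memory": build_wql("Win32_OperatingSystem",
                        ["FreePhysicalMemory", "TotalVisibleMemorySize"]),
    "disk":   build_wql("Win32_PerfFormattedData_PerfDisk_LogicalDisk",
                        ["Name", "PercentDiskTime", "FreeMegabytes"]),
}

print(queries["cpu"])
```

The same properties are reachable from PowerShell with `Get-CimInstance`, which is why no agent-side code is needed on the monitored server.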
Agentless deployment is fast, especially when automated server discovery is in place. The ability to readily discover and monitor Windows Servers across multiple domains, whether local or cloud-based, trusted or non-trusted, delivers a huge win for IT in terms of faster implementation time and added simplicity.
Server discovery along with agentless monitoring is especially beneficial in IT environments that exhibit volatile behavior. Take the example of an AWS or Azure environment where auto-scaling is implemented: the server count is constantly changing. The last thing IT wants to do is deploy additional monitoring code on the servers or keep track of the servers coming and going. Performance monitoring is much easier when servers are discovered and immediately monitored.
Intelligent Monitoring and Alerting
Whether your Windows Servers are on-premises or cloud-based, whether they are perpetual servers or launched for a specific workload, whether they are serving applications or supporting the IT infrastructure, when all is said and done there is always going to be a core set of Windows performance and availability metrics that have to be monitored.
The most straightforward examples are the lowest common denominator KPIs related to CPU, Memory, Disk, and Network. These metrics certainly seem simple enough, especially when we monitor them as part of a specific workload. However, things get a bit more complicated when virtualization and cloud computing enter into the equation - where a server's allocated resources can change in an instant.
Server performance monitoring is much easier when there is an automated process in place that intelligently collects and evaluates server performance. Having a built-in knowledge base that knows what KPIs to collect, how often to collect them, and how often to evaluate them goes a long way towards making the monitoring of server performance easy!
The collection and aggregation of Windows Server KPIs is a critical element of any Windows performance monitoring strategy. However, the real value, the difference between simply “performance monitoring” and “performance monitoring made easy” is how the data is processed and evaluated once it is collected.
Let's take the straightforward example of evaluating CPU usage on Windows Servers.
- Longitude collects and calculates Windows CPU consumption every 5 minutes by looking at the total CPU time used over that interval. For example, if 2 minutes 30 seconds of CPU time are used over a 5-minute interval, Longitude calculates a 50% busy value. (Note: Longitude normalizes this value across multiple CPUs.)
- However, Longitude evaluates Windows CPU usage for alerting every 15 minutes.
In other words, the technology evaluates for a CPU consumption problem based on the average CPU usage across the three 5-minute collections gathered over the previous 15 minutes. This methodology collapses transient CPU spikes and avoids false positives.
Keep in mind that different KPIs are collected and evaluated at different intervals based on their criticality.
The goal is to constantly evaluate server behavior, determine if any problems warrant IT's attention, and then alert staff only when necessary. Keep in mind that too many alerts can be worse than no alerts at all, as time, effort, and other valuable IT resources have to be expended to configure alerts only to have them ignored!
An important component of the performance monitoring value proposition, and of "making it easy," is how the Windows Server KPIs are presented and leveraged, especially as it applies to reporting.
- Automatic Report Generation – Windows Performance Monitoring is most useful when the KPIs are made readily available to IT organizations and their constituents. This means automating the generation and delivery of the reports.
Reports that are readily accessible via email, dashboards, or publishing to web portals make consumption of the data quite easy. For example, if operations, development, or even end users need to understand how their IT environment is functioning, it is important to make that information readily accessible. Automated reports are a win/win, as they often show the value of IT and validate the role IT plays in an organization.
- Data Visualization and Manipulation – Server monitoring is much easier when users can "self-serve" report generation. Allowing users to interactively generate their own reports and graphs, as well as tailor the format and summarization, means that IT can focus its efforts on initiatives that are more impactful to the organization.
- Role-Based Security – Windows performance monitoring can also be made easy when reports are generated and delivered based on the recipient's role in an organization. For example, someone who is exclusively consuming on-premises resources doesn't need access to the KPIs associated with cloud-based resources.
- Define and Report Service Level Agreements - SLAs define the level of service that IT is delivering to its customers. SLAs clearly define what constitutes acceptable (and for that matter unacceptable) server behavior. A properly designed SLA aggregates a defined set of KPIs into a single measurable entity. Each KPI is defined with thresholds for good, degraded, and unacceptable behavior.
Excerpt from a Longitude Summary SLA Report
SLAs are an important part of any IT organization's Windows performance monitoring repertoire. They make performance monitoring easier by objectively measuring the health of one or more servers based on multiple KPIs.
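The SLA structure described above, a defined set of KPIs, each with thresholds for good, degraded, and unacceptable behavior, rolled up into a single measurable entity, can be sketched as follows. The KPI names and threshold values are illustrative assumptions, not figures from an actual Longitude SLA.

```python
# Hedged sketch of an SLA: several KPIs, each with "degraded" and
# "unacceptable" thresholds, aggregated into one health rating.
# Threshold values and KPI names are assumptions for illustration.

THRESHOLDS = {                  # (degraded_at, unacceptable_at); higher = worse
    "cpu_busy_pct":  (80, 95),
    "mem_used_pct":  (85, 95),
    "disk_used_pct": (80, 90),
}

def rate(kpi, value):
    """Classify one KPI value as good, degraded, or unacceptable."""
    degraded_at, unacceptable_at = THRESHOLDS[kpi]
    if value >= unacceptable_at:
        return "unacceptable"
    return "degraded" if value >= degraded_at else "good"

def sla_status(metrics):
    """The SLA reports the worst rating across its member KPIs."""
    order = ["good", "degraded", "unacceptable"]
    return max((rate(k, v) for k, v in metrics.items()), key=order.index)

print(sla_status({"cpu_busy_pct": 45, "mem_used_pct": 88, "disk_used_pct": 60}))
```

Taking the worst rating across the member KPIs is one common aggregation choice; a weighted or percentage-of-time-in-compliance rollup would be an equally valid design.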