
5 Tips for Successful Server Monitoring and Alerting

April 03, 2017 | Heroix Staff

As an IT infrastructure grows in size and complexity, implementing server monitoring and alerting becomes increasingly difficult. Mixes of operating systems, network equipment, and software versions, along with virtualization and cloud computing, complicate it further.

Fortunately, regardless of network size or complexity, these tips can help your organization meet its server monitoring objectives.

1) Get Organized: Your server infrastructure addresses the needs of a variety of end users, each with a different narrative. It is a good practice to organize your alerts to IT and the end users based on how the resources are consumed. For example, you can alert by business service, application, on-premises versus cloud, physical versus virtual, operating system, and more. Alerting based on how server resources are “viewed” or “consumed” keeps IT and end users well informed, providing everyone an accurate and objective assessment of what is going on. An ever-present challenge for IT is battling perception versus reality, so communicating accurate and timely alerts is critical to maintaining a positive IT reputation.

2) Cast a wide net: You’ll want to start with monitoring critical server performance metrics, giving close attention to both physical and virtual performance indicators. Lowest-common-denominator metrics related to memory, CPU, disk, and network represent a good start. Remember, server monitoring isn’t just about performance; a number of operational and availability metrics also warrant close attention. For example, is DNS functioning properly? Are there errors in log files? Are the servers not only available (i.e., pingable) but, more importantly, responsive?

Application-specific performance metrics must enter the equation as well, since they both depend on and affect server performance. Metrics related to web, database, and messaging applications deserve close consideration.
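The distinction above between “available” and “responsive” can be sketched in a few lines of Python. This is a minimal illustration, not a production check: the host, port, and `slow_threshold` values are placeholders you would tune for your own environment.

```python
import socket
import time

def check_responsiveness(host, port, timeout=2.0, slow_threshold=0.5):
    """Return (responsive, elapsed_seconds).

    A server can accept a connection (available) yet still take too
    long to do so (not responsive); both cases matter for alerting.
    """
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            elapsed = time.monotonic() - start
            return elapsed <= slow_threshold, elapsed
    except OSError:
        # Connection refused or timed out: not even available.
        return False, time.monotonic() - start
```

A ping-style check would stop at “the host answered”; measuring connection latency against a threshold captures the responsiveness question as well.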

3) Be proactive:  Being proactive means not only being timely in terms of knowing about a problem, but also being timely in terms of correcting or mitigating a problem.

To be proactive, the IT professional charged with monitoring must be alerted to system and application problems in real-time. When possible make sure alerts (whether email, SMS, or a dashboard display) are initiated early in the problem cycle. Be conservative with your thresholds, and don’t be afraid to escalate the notification if a problem accelerates or becomes persistent.
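The conservative-threshold-plus-escalation idea can be expressed as a small state machine: warn early, then escalate only if the condition persists. A minimal sketch, where the threshold and breach count are illustrative values, not recommendations:

```python
from dataclasses import dataclass

@dataclass
class ThresholdAlert:
    """Escalate a warning to critical after repeated breaches."""
    warn_at: float           # conservative early-warning threshold
    escalate_after: int = 3  # consecutive breaches before escalating
    breaches: int = 0

    def observe(self, value: float) -> str:
        if value < self.warn_at:
            self.breaches = 0  # metric recovered; reset the counter
            return "ok"
        self.breaches += 1
        return "critical" if self.breaches >= self.escalate_after else "warning"
```

Firing the warning early gives IT lead time, while the escalation step keeps a persistent problem from lingering at low priority.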

Problems that are identified early can often be corrected or significantly mitigated with automation. Executing scripts or OS commands to fix problems, and then escalating based on their success or failure, greatly improves service levels and customer satisfaction.
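The run-a-fix-then-escalate-on-failure pattern might look like the following. This is a sketch assuming a POSIX environment; `command` and `notify` are placeholders for your own remediation script and alerting hook:

```python
import subprocess

def remediate(command, notify):
    """Run a fix-it command; escalate only if it fails.

    `notify(level, message)` is a stand-in for whatever alerting
    channel you use (email, SMS, dashboard).
    """
    result = subprocess.run(command, capture_output=True, text=True)
    if result.returncode == 0:
        notify("info", "auto-remediation succeeded: " + " ".join(command))
        return True
    notify("critical", "auto-remediation failed: " + result.stderr.strip())
    return False
```

On success the alert stays informational; only a failed remediation reaches a human, which is exactly the escalation behavior described above.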

4) Target the alerting: This is an issue we hear about daily on our forum: end users who are sick of mindless alerting, which eventually develops into a “Boy Who Cried Wolf” situation. Too many false alarms cause those in charge of monitoring to ignore alerts altogether.

Targeted alerting means setting thresholds so that only actionable alarms are triggered. Alerts must be meaningful within the context of the particular system or application. For example, a good practice is to work from a baseline and have the minimum, maximum, and average values readily at hand when setting performance-based thresholds.
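The baseline idea can be sketched with a few lines of Python: summarize a metric’s history, then place the warning level relative to what is normal for that system. The `headroom` margin below is an illustrative assumption, not a recommendation:

```python
from statistics import mean

def baseline(samples):
    """Summarize a metric's history so thresholds reflect reality."""
    return {"min": min(samples), "max": max(samples), "avg": mean(samples)}

def suggest_warn_threshold(samples, headroom=0.1):
    """Place the warning level a little above the observed peak,
    so only genuinely abnormal values fire an alert."""
    return baseline(samples)["max"] * (1 + headroom)
```

A threshold derived from the system’s own min/max/average is far less likely to cry wolf than a one-size-fits-all number applied across every server.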

5) Leverage Alerting to show IT’s value: Ultimately, IT only has value if it has value to the business.  IT can show operational excellence via alerting that integrates with reporting and graphing. The ability to produce IT related documentation quickly helps show the positive impact IT is having on an organization's productivity, on-time delivery, and quality.

Reporting is only useful when the information is disseminated on a timely basis. Automated report generation and sharing gets the right information to the right people when it is most useful to them. For example, reports that show alerts and time to resolution provide an objective view of what is actually happening within the server infrastructure and eliminate subjective perceptions that can be harmful to IT’s reputation.
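A time-to-resolution figure like the one mentioned above reduces to simple arithmetic over alert records. A minimal sketch, assuming each alert is stored as a (raised, resolved) timestamp pair:

```python
from datetime import datetime, timedelta

def mean_time_to_resolution(alerts):
    """alerts: list of (raised_at, resolved_at) datetime pairs.
    Returns the average resolution time as a timedelta."""
    durations = [resolved - raised for raised, resolved in alerts]
    return sum(durations, timedelta()) / len(durations)
```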

Want to learn more?

Download our Best Practices for Server Monitoring Whitepaper and learn how to achieve a successful long-term server monitoring strategy by focusing on an approach that is lightweight, efficient, resilient, and automated.

 

Download the whitepaper: Best Practices for Server Monitoring