Making IT proactive – automated server monitoring and alerts

November 07, 2017 | Ken Leoni

IT must adapt its approach to managing servers as technology continues to advance through virtualization, hyper-converged infrastructure, cloud computing, and other innovations.

No matter the underlying server technology, there is one constant: the need to proactively monitor for server problems and alert IT.

Applications are invariably affected when servers encounter issues, which means IT operations needs to be alerted early in the problem cycle. 

Alerts can take any number of forms, such as:

  • Email
  • SMS
  • Real-time dashboard displays
  • Integration with third-party applications

Problem mitigation is also an important part of the equation. Taking quick action to reverse or alleviate a technology failure can go a long way toward minimizing disruption to end users and improving service levels.
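As a concrete illustration of the alerting step, the sketch below builds an email alert of the kind a monitoring tool might send. It is a minimal example using Python's standard library; the sender, recipient, and SMTP host are placeholders, not addresses from any real deployment.

```python
import smtplib
from email.message import EmailMessage

def build_alert(server: str, metric: str, value: float, threshold: float) -> EmailMessage:
    """Build an email alert; a monitoring tool would send this via SMTP."""
    msg = EmailMessage()
    msg["Subject"] = f"ALERT: {metric} on {server} at {value} (threshold {threshold})"
    msg["From"] = "monitoring@example.com"   # placeholder sender
    msg["To"] = "oncall@example.com"         # placeholder recipient
    msg.set_content(
        f"Server {server} reported {metric}={value}, exceeding the threshold of {threshold}."
    )
    return msg

alert = build_alert("web01", "cpu_percent", 97.5, 90.0)
print(alert["Subject"])
# Actually sending is environment-specific; with a real SMTP relay it would be:
# with smtplib.SMTP("smtp.example.com") as s:
#     s.send_message(alert)
```

The same alert payload could just as easily be handed to an SMS gateway or a third-party ticketing integration; only the transport changes.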

 

The Challenge

The challenge, especially with larger or more volatile IT infrastructures, is determining which IT components to watch, how often to watch them, and how to respond when problems arise.

Over the years, the pendulum has swung back and forth over whether the best IT professionals are generalists or specialists. CIOs commonly disagree about the skill sets IT staff should have to provide the most value to their organization. IT professionals must maintain skill sets that satisfy both management requirements and IT administration needs.

The current trend is for IT personnel to satisfy these many requirements with a more overarching generalist approach. Whether they work with on-premises and/or cloud-based infrastructures, the sheer volume of technology and the number of moving parts mean that IT professionals best serve their organizations with a working knowledge of multiple technologies rather than deep knowledge of a few.

This “generalist” approach makes being proactive especially challenging, because it requires casting a wide net over so many technologies. Take the example of an organization operating a hybrid cloud infrastructure: some servers are perpetual while others have limited lifespans, resulting in a volatile environment that requires monitoring built around hybrid-specific criteria.

When we stack the added complexity of myriad applications - especially cloud-based applications - on top of the IT infrastructure, things become even more interesting. Ultimately, the only way to effectively absorb and leverage technology is through automation.


How do we get to Automated Server Monitoring?

Automated server monitoring is a term that’s often thrown about and can have multiple connotations. What is true automated server monitoring and how does it relate to IT?

Let’s take the example of a hybrid cloud infrastructure where servers are dispersed across multiple physical locations and running from disparate IT infrastructures.  What does IT require in order to effectively monitor this server infrastructure?

True automated server monitoring means:

  • Automatic server discovery – Look up and find the servers

  • Proactive server monitoring - Once a server is discovered, determine automatically what server components need to be monitored

  • Apply a built-in knowledge base - Understand the server components and apply appropriate thresholds for alerting



Automatic Server Discovery

Ideally, IT organizations should automate server discovery so that, from a central console, they can readily monitor and manage servers across multiple domains - local or cloud-based, trusted or non-trusted. In addition, the discovery and monitoring process should be “atomic,” meaning monitoring starts as soon as a Windows or Unix/Linux server is discovered.

The discovery process itself should be agentless and generate little or no system overhead.  In an environment with multiple remote locations, such as data centers, it is a best practice to use a proxy at each location for discovery. Local discovery both minimizes network/system overhead and makes the discovery process itself more resilient to network interruptions.
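The agentless discovery described above can be sketched as a simple TCP probe of management ports. This is only an illustration of the idea, not Longitude's actual discovery mechanism; the assumption that port 135 (Windows RPC/WMI) implies a Windows server and port 22 (SSH) implies Unix/Linux is a common heuristic, and the demo probes a throwaway local listener rather than a real network.

```python
import socket

# Ports commonly associated with agentless monitoring protocols (an
# illustrative assumption): 135 = Windows RPC/WMI, 22 = SSH on Unix/Linux.
PROBE_PORTS = {135: "windows", 22: "unix/linux"}

def probe(host: str, port: int, timeout: float = 0.5) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def discover(hosts, ports=PROBE_PORTS):
    """Yield (host, platform) for hosts answering on a known management port."""
    for host in hosts:
        for port, platform in ports.items():
            if probe(host, port):
                yield host, platform
                break

# Demo against a throwaway local listener instead of a real network:
listener = socket.socket()
listener.bind(("127.0.0.1", 0))   # port 0 = let the OS pick a free port
listener.listen(1)
port = listener.getsockname()[1]
print(probe("127.0.0.1", port))   # the listener accepts, so this is True
listener.close()
```

Running such a probe from a proxy inside each remote location, as recommended above, keeps the scan traffic local and tolerant of WAN interruptions.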

 Automatic Server Discovery in Hybrid Cloud

Figure 1. Configuring Server Discovery in Longitude Enterprise Edition

Above we see Longitude Enterprise Edition configured to perform Windows server discovery inside an AWS EC2 environment. In this configuration a lightweight process called the “Remote Statistics Server” is deployed on a single EC2 instance, where it performs server discovery inside the EC2 environment.

 

Proactive Server Monitoring

Automation not only encompasses discovering which servers need to be monitored, but it also involves proactively determining what needs to be monitored on the servers themselves.  A fundamental goal of any server automation initiative should be to eliminate the need for IT to manually determine what needs to be monitored on the servers.

Running a manual inventory of servers and using that to determine what needs to be monitored is a flawed strategy:

  1. In environments where server counts are constantly changing, IT simply doesn’t have time to track each server. Cloud computing makes this particularly challenging because of the autoscaling features built into platforms like Amazon Web Services, Azure, and Google Cloud.

  2. A server’s attributes vary depending on its role in the organization, rendering a one-size-fits-all monitoring template unworkable. For example:

    • A web server’s hardware configuration is very different from that of a database or application server. The number of NICs, disks, CPUs, etc. will vary, and virtualized servers can be dynamically altered later to accommodate changes in workload.

    • Proactively discovering critical server specific components is also important as components will vary across the life of a server.  For example:

      • Windows services are a basic building block of the Windows infrastructure. The number and names of services will vary depending on the server’s role, and services can be installed or removed as software is updated or newly deployed.

      • Filesystems on Unix servers can be mounted or dismounted as needed, and newly mounted filesystems should be monitored to ensure availability.

Being proactive means constantly looking for any modifications to a server’s hardware or operating system and adapting to those changes without user intervention.
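Detecting those changes boils down to comparing successive inventory snapshots. The sketch below shows the idea with a pure function; the filesystem names are hypothetical, and the same diff applies equally to Windows services or NICs.

```python
def diff_inventory(previous: dict, current: dict) -> dict:
    """Compare two snapshots of server components (e.g. Windows services or
    mounted filesystems) and report what to start or stop monitoring."""
    added = sorted(set(current) - set(previous))
    removed = sorted(set(previous) - set(current))
    return {"monitor": added, "retire": removed}

# Hypothetical snapshots of mounted filesystems from successive sweeps:
yesterday = {"/": "ext4", "/var": "ext4"}
today = {"/": "ext4", "/var": "ext4", "/data": "xfs"}

print(diff_inventory(yesterday, today))
# {'monitor': ['/data'], 'retire': []}
```

Run on every collection cycle, a diff like this is what lets a newly mounted filesystem or freshly installed service enter monitoring without anyone editing a configuration.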


Figure 2.  Longitude Enterprise Edition - Automatic Server Monitoring

Above we see Longitude Enterprise Edition configured to monitor Windows servers:

  • The Collection column on the left represents the categories of operating system components that Longitude automatically discovers and monitors.

  • The Schedule column to the right indicates how often Longitude will inventory the server for changes and collect the key performance and availability data.

Note: for the purposes of this blog the focus is on operating system metrics, but these concepts also apply to monitoring any IT infrastructure or application component (e.g., Active Directory, SQL Server).

 

Apply a built-in knowledge base

OK, so we've reached the point where we know which servers need monitoring, which operating system(s) they're using, and what needs to be monitored on the servers.

If our primary interest is to make IT proactive with automated server monitoring and alerting, then the next step is to apply a built-in knowledge base.  We want to automatically evaluate server behavior, determine if any problems warrant IT’s attention, and then alert staff when necessary.

Whether an environment has dozens of servers or thousands, leveraging a built-in knowledge base enables IT operations to be more effective and efficient. Manually determining what constitutes a problem on a server-by-server basis would take an enormous amount of time and effort, and would in all likelihood miss some issues.
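Conceptually, a knowledge base is a set of per-metric rules applied uniformly to collected data. The toy rule set below is illustrative only; the metric names and limits are assumptions for the example, not Longitude's actual thresholds.

```python
# A toy rule set standing in for a monitoring product's knowledge base;
# metric names and limits are illustrative assumptions.
KNOWLEDGE_BASE = {
    "cpu_percent":  {"warning": 80.0, "critical": 95.0},
    "disk_percent": {"warning": 85.0, "critical": 95.0},
    "mem_percent":  {"warning": 90.0, "critical": 98.0},
}

def evaluate(server: str, metrics: dict, rules: dict = KNOWLEDGE_BASE) -> list:
    """Return (server, metric, severity) for each metric over its threshold."""
    alerts = []
    for metric, value in metrics.items():
        limits = rules.get(metric)
        if limits is None:
            continue   # no rule for this metric; ignore it
        if value >= limits["critical"]:
            alerts.append((server, metric, "critical"))
        elif value >= limits["warning"]:
            alerts.append((server, metric, "warning"))
    return alerts

print(evaluate("app03", {"cpu_percent": 97.0, "disk_percent": 60.0}))
# [('app03', 'cpu_percent', 'critical')]
```

Because the rules live in one place, a thousand servers are evaluated as cheaply and consistently as one, which is precisely what makes the manual, server-by-server approach unnecessary.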

 

Setting up Automatic Server Monitoring

Thus far we’ve broken automated server monitoring into three elements:

  1. Discover the servers that need to be monitored

  2. Determine what data needs to be gathered from the servers and collect it

  3. Apply a knowledge base against the collected data and alert IT to problems


In a production setting you’ll want all three elements to execute “atomically,” meaning that once servers are discovered, the appropriate data is automatically collected and evaluated. We want to minimize the time and effort required from IT to determine where and when problems have occurred.
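The three elements chain together as one sweep, sketched below with stubbed discovery and collection steps so the control flow is runnable end to end. The server names and metrics are hypothetical; a real tool would discover over the network and collect via WMI or SSH.

```python
# Stubs standing in for real network discovery and WMI/SSH collection.
def discover_servers():
    return ["web01", "db01"]

def collect_metrics(server):
    sample = {"web01": {"cpu_percent": 45.0}, "db01": {"cpu_percent": 97.0}}
    return sample[server]

def check_thresholds(server, metrics, critical=95.0):
    return [(server, m, v) for m, v in metrics.items() if v >= critical]

def monitoring_sweep():
    """One 'atomic' pass: every discovered server is immediately collected
    from and evaluated, with no manual step in between."""
    alerts = []
    for server in discover_servers():
        alerts.extend(check_thresholds(server, collect_metrics(server)))
    return alerts

print(monitoring_sweep())
# [('db01', 'cpu_percent', 97.0)]
```

The point of the structure is that adding a server to the environment changes only what discovery returns; collection and evaluation pick it up on the next sweep automatically.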

Longitude Enterprise Edition enables IT to be proactive by automating server monitoring and alerting IT staff with a lightweight and efficient architecture.  Longitude is constructed to agentlessly discover and collect data from servers and works as follows:

  • Users connect to the Longitude console Web UI and run a discovery.

  • Longitude collects performance data from the servers it discovers using queries over a platform-appropriate protocol - WMI for Windows, SSH for Unix/Linux. Servers isolated by firewalls, in remote networks, or in the cloud can be monitored through Longitude's “Remote Statistics Servers,” which collect performance data in the remote network and transmit it back to the Longitude console.

  • The collected data is then written to a central repository.

  • Longitude evaluates the data against its built-in knowledge base as it is written to the central repository:

    • Alerts are generated and are viewable from a customizable dashboard

    • Alerts can also be generated via email, SMS, and 3rd party integrations

    • Scripts can be executed in response to alerts


Figure 3.  Longitude Enterprise Edition – Dashboard of Events

Here we see a Longitude Dashboard with a customized view showing current server problems. Longitude has identified a process consuming excessive CPU, and can be configured through the Web UI to send alerts to the specific IT staff responsible for correcting the problem. The ability to tailor notifications - including escalating notifications for problems that are growing worse and sending notifications to mobile devices - is an important part of any automated server monitoring strategy: it is critical to get the right information to the most appropriate people, be they developers, end users, or IT staff.

Conclusion

Automated server monitoring need not be complicated or time-consuming. The key to keeping it simple is to minimize the intervention required from IT to manage the server monitoring technology. This means the technology must be able to constantly discover and account for server configuration changes, as well as any changes to the server operating system and associated applications.

Server monitoring's most intrinsic value is being proactive, so take special care in how you configure alerts. Alerting should have built-in escalation and handle IT personnel changes with minimal intervention.

Lastly, do not “black box” server monitoring and alerting!  Make sure multiple IT staff are trained on how to configure, maintain, and extend your server monitoring and alerting technology.  There are many cases of failed deployments because the one individual who understood how to set up the technology moved on.  Given the dynamic nature of information technology it is important that there is always a core competency available to handle any required adjustments.

 

Want to learn more?

Download a FREE trial of Longitude  - Stand up Longitude in just minutes and immediately start seeing how your environment is performing, receive proactive alerts, reveal root causes, automate corrective actions, and more...

 Start Your Free 14 Day Trial of  Longitude Today!
 

 
