Lightweight Monitoring

May 01, 2018 | Ken Leoni

Lightweight IT monitoring – what does it really mean and why is it so important? Ultimately, the holy grail for most IT operations professionals is the implementation of a self-monitoring and self-healing IT infrastructure - freeing up their time so that they can advance high value initiatives rather than chase down avoidable problems.

The reality is that there is always some cost associated with IT infrastructure monitoring. One way or another, someone, whether onsite personnel or an outsourced service, is going to have to invest time and effort to deploy and maintain an IT infrastructure and application monitoring solution. The challenge IT faces is keeping those costs under control while still delivering a reliable and resilient IT infrastructure monitoring strategy.

 

What does it mean to be a Lightweight Monitoring Solution?

A central tenet of a lightweight IT monitoring solution is to require a minimal amount of time and effort from IT personnel to install, deploy, manage, and maintain the monitoring technology.

This means no steep learning curve and no need for classes or certifications to attain mastery. The more complex and heavyweight an IT monitoring technology is, the more likely it ends up as an ineffectual deployment, or worse yet, never deployed at all, languishing as expensive shelfware.


A lightweight IT monitoring solution must use little or no system and network overhead. 

An agentless approach that makes use of existing protocols to pull key performance and operational indicators from the IT infrastructure and applications keeps the IT monitoring footprint small and greatly reduces concerns over monitoring overhead.
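To make this concrete, the following is a minimal PowerShell sketch (illustrative only, not part of Longitude) of what agentless collection looks like in practice: built-in interfaces such as Windows performance counters and WMI/CIM are queried over the network, so nothing is installed on the monitored server. The server name and counter paths are assumptions chosen for the example.

# Agentless collection sketch - queries a remote Windows server over
# built-in protocols; no agent is installed on the target.
$target = "WEB01"   # hypothetical server name

# Key performance indicators via remote performance counters
Get-Counter -ComputerName $target -Counter @(
    '\Processor(_Total)\% Processor Time',
    '\Memory\Available MBytes',
    '\LogicalDisk(C:)\% Free Space'
)

# Operational state via WMI/CIM: automatic services that are not running
Get-CimInstance -ComputerName $target -ClassName Win32_Service |
    Where-Object { $_.StartMode -eq 'Auto' -and $_.State -ne 'Running' } |
    Select-Object Name, State, StartMode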

 

 

Lightweight Monitoring - Keep it simple and quick. 


Let’s take the example of a hybrid cloud environment that is dispersed across multiple locations and runs on disparate IT infrastructures. What does IT require in order to monitor that infrastructure quickly? How do we keep the overhead exerted on IT to a minimum?

How to keep IT Monitoring overhead to a minimum

Automate discovery – Look up and find the servers and/or applications


Understand what needs to be monitored - Once a server is discovered, automatically determine what components need to be monitored


Apply a built-in knowledge base - Understand the components and apply appropriate performance and availability thresholds for alerting


 

Automate Discovery

Ideally, IT organizations should automate discovery so that, from a central location, they can readily monitor and manage the IT infrastructure across multiple environments, whether local or cloud based. In addition, the discovery and monitoring process should be “atomic”, meaning monitoring starts as soon as servers and applications are discovered.

The discovery process itself should be agentless and generate little or no system overhead. In an environment with multiple remote locations it is a best practice to use a proxy at each location for discovery. Local discovery both minimizes network/system overhead and makes the discovery process itself more resilient to network interruptions.


Figure 1. Configuring Server Discovery in Longitude Enterprise Edition

Above, we are configuring the Longitude Console to perform a Windows server discovery inside an AWS EC2 environment. In this configuration, a lightweight process called the “Remote Statistics Server” is deployed on a single EC2 instance and tasked with performing discovery inside that environment.


There may well be situations where IT wants greater control and granularity over IT infrastructure and application discovery. For example, a site may have unique criteria as to when monitoring needs to be initiated, suspended, or completely stopped - in which case scripting can play a critical role.

Using scripts to drive the discovery allows IT to readily embed lightweight monitoring into their existing automation processes and decreases the likelihood of an infrastructure component or application being omitted from monitoring.

The Longitude command line interface provides for readily customized server, network, and application discovery.

Monitor all the Windows Servers in a Windows Domain
# Requires the ActiveDirectory PowerShell module (part of the RSAT tools)
# searchbase is the domain specification
# dcbase is a domain controller for that domain
# If -SearchBase and -Server are left out of the query,
# the query will default to the current domain

$searchbase = "DC=test,DC=net"
$dcbase     = "DC01.test.net"

$serverlist = Get-ADComputer -Filter {operatingsystem -like '*Server*'} `
    -SearchBase $searchbase -Server $dcbase | Select-Object Name | Sort-Object Name

foreach ($server in $serverlist)
{
    c:\LongitudeCommand\longitude monitor /c $server.Name /a windows
}

 

This simple PowerShell script scans an Active Directory domain controller for computers with "Server" as part of their operating system name. All computers in this list are then configured to be monitored using Longitude's built-in knowledge base for Windows. We could easily extend the script to include or exclude servers from monitoring based on any number of criteria.
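As a hedged illustration of such an extension, the sketch below filters the same $serverlist before handing it to the Longitude command line; the "LAB*" naming pattern is a hypothetical convention used purely for the example.

# Skip servers whose names match an exclusion pattern before monitoring them.
# The "LAB*" pattern is a hypothetical naming convention for illustration.
$excludePattern = 'LAB*'

foreach ($server in $serverlist)
{
    if ($server.Name -notlike $excludePattern)
    {
        c:\LongitudeCommand\longitude monitor /c $server.Name /a windows
    }
}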

The discovery process above is based on Active Directory, but this isn't to say that other criteria couldn't be used (e.g. initiating monitoring when launching an EC2 instance by combining Longitude with the AWS CLI).
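As a rough sketch of that idea, assuming the AWS CLI is installed and configured and that instances carry a "Name" tag, running Windows EC2 instances could be enumerated and passed to the same Longitude monitor command; the region, filters, and tag conventions are assumptions for illustration.

# Enumerate running Windows EC2 instances by Name tag and hand each one to
# the Longitude CLI. Region, filters, and tagging are illustrative assumptions.
$instanceNames = aws ec2 describe-instances `
    --region us-east-1 `
    --filters "Name=platform,Values=windows" "Name=instance-state-name,Values=running" `
    --query "Reservations[].Instances[].Tags[?Key=='Name'].Value[]" `
    --output text

foreach ($name in ($instanceNames -split "\s+" | Where-Object { $_ }))
{
    c:\LongitudeCommand\longitude monitor /c $name /a windows
}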


The volume of data generated about server performance and availability is substantial; ultimately, the question is how to make the best use of that data.

Download the whitepaper: Best Practices for Server Monitoring

This whitepaper helps IT achieve successful long-term server monitoring by focusing on an approach that is lightweight, efficient, resilient, and automated.

 

 

Understand what needs to be monitored

Keeping monitoring quick and clean not only encompasses discovering which servers need to be monitored, but it also involves proactively determining what needs to be monitored on the servers themselves. A fundamental goal of any automation initiative should be to eliminate the need for IT to manually determine what needs to be monitored.

Let's explore the process of monitoring the Windows operating system on a set of servers; keep in mind that these concepts apply when monitoring the applications on a server as well.

Running a manual inventory and using that to determine what needs to be monitored is a flawed strategy:

  1. In environments where server counts are constantly changing, IT simply doesn’t have the time to track each server. Cloud computing makes this particularly challenging because of the autoscaling features built into technologies like Amazon Web Services, Azure, and Google Cloud.

  2. Servers' attributes vary depending on their role in the organization, rendering a one-size-fits-all monitoring template unworkable. For example:

    • A web server’s hardware configuration is very different from that of a database or application server. The number of NICs, disks, CPUs, etc., will vary and virtualized servers can be dynamically altered later to accommodate changes in workload. 

    • Proactively discovering critical server-specific components is also important, as components will vary across the life of a server. For example:

      • The number and names of Windows services will vary depending on the server's role, and services can be installed or removed as new software technologies are updated or newly deployed.

      • Filesystems on Unix servers can be mounted or dismounted as needed, and newly mounted filesystems should be monitored to ensure availability.

Minimizing the time required of IT means the technology must constantly look for any modifications to the servers' hardware or operating systems and adapt to those changes without the need for user intervention.
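As an illustration of the kind of automatic inventory check this implies, the hedged PowerShell sketch below compares a server's current automatic services against a previously saved baseline so that newly installed or removed services are flagged for monitoring; the server name and baseline path are hypothetical.

# Compare a server's automatic services against a saved baseline to spot
# services added or removed since the last inventory. Names and paths are
# hypothetical examples.
$target   = "APP01"
$baseline = "C:\Monitoring\baselines\${target}-services.txt"

$current = Get-CimInstance -ComputerName $target -ClassName Win32_Service |
    Where-Object { $_.StartMode -eq 'Auto' } |
    Select-Object -ExpandProperty Name | Sort-Object

if (Test-Path $baseline)
{
    $previous = Get-Content $baseline
    Compare-Object -ReferenceObject $previous -DifferenceObject $current |
        ForEach-Object {
            $change = if ($_.SideIndicator -eq '=>') { 'added' } else { 'removed' }
            Write-Output "$($_.InputObject) was $change since the last inventory"
        }
}

# Save the current inventory as the new baseline
$current | Set-Content $baseline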


Figure 2. Longitude Enterprise Edition - Automatic Server Monitoring

Above we see Longitude Enterprise Edition configured to monitor the Windows OS on a number of servers:

  • The Collection column on the left represents the categories of operating system components that Longitude automatically discovers and monitors.

  • The Schedule column to the right indicates how often Longitude will inventory the server for changes and collect the key performance and availability data.

Note: again, these concepts also apply to monitoring other IT infrastructure components and applications.

 

Apply a built-in knowledge base

Whether an environment has dozens of servers or thousands, leveraging a built-in knowledge base enables IT operations to be more effective and efficient. Manually determining what constitutes a problem on a server-by-server basis would take an enormous amount of time and effort and would, in all likelihood, miss some issues.

We want to minimize the time and effort required of IT to determine where and when problems have occurred.

Longitude enables IT to be proactive by automating IT infrastructure and application monitoring and alerting IT staff with an agentless, lightweight, and efficient architecture. 

 

Longitude Work Flow for IT Infrastructure and Application Monitoring

  • Agentless discovery determines what devices and applications need to be monitored

  • Once discovered, agentlessly collect key performance and availability data from the servers, network devices, and applications using the appropriate protocols and APIs (e.g. WMI, PowerShell, SSH, vSphere API, JDBC, SNMP, and more).

    An agentless approach works well for collecting data at remote locations (e.g. cloud deployments) via a remotely deployed collection proxy. The proxy collects all the data within a location and rolls up the key metrics for centralized alerting, dashboarding, and reporting.

  • The collected data is then written to a central repository

  • The data in the central repository is regularly evaluated using the built-in knowledge base to:

    • Generate alerts for viewing in the customizable dashboard

    • Send alerts via email, SMS, and 3rd party integrations

    • Execute scripts in response to alerts
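To illustrate the last step, a remediation script wired to an alert action might look like the hedged sketch below, which restarts a stopped Windows service on the server that raised the alert. The parameter names are illustrative and not Longitude's actual alert-variable syntax.

# A hedged example of a remediation script an alert action could invoke.
# The alerting tool would supply the server and service that triggered the
# alert; the parameter names here are illustrative.
param(
    [Parameter(Mandatory)] [string] $ComputerName,
    [Parameter(Mandatory)] [string] $ServiceName
)

# Look up the service over WMI/CIM - agentless, like the data collection itself
$service = Get-CimInstance -ComputerName $ComputerName -ClassName Win32_Service `
    -Filter "Name='$ServiceName'"

if ($null -eq $service)
{
    Write-Output "Service '$ServiceName' not found on $ComputerName"
}
elseif ($service.State -ne 'Running')
{
    Write-Output "Starting service '$ServiceName' on $ComputerName"
    Invoke-CimMethod -InputObject $service -MethodName StartService | Out-Null
}
else
{
    Write-Output "Service '$ServiceName' on $ComputerName is already running"
}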



Figure 3. Longitude Enterprise Edition – Dashboard of Events

Here we see one of many customizable Longitude Dashboard displays showing open problems specific to Windows servers. In this example, Longitude discovered the server and knew exactly what metrics to collect, when to collect them, and how to evaluate them. Longitude is also smart enough to only collect process information when the server is in a resource-depleted state, further minimizing overhead on an already saturated server.

 

Conclusions:

IT Infrastructure and Application Monitoring need not be complicated or time consuming.

  1. A key to keeping it simple is to minimize the intervention required by IT to manage the monitoring technology. This means the technology must be able to constantly discover and account for configuration changes. In addition, the technology must automatically know what constitutes a problem and communicate that to the appropriate personnel.

  2. An agentless IT monitoring approach delivers a number of advantages to IT Operations:

    They need not be concerned with deploying agents, upgrading agents, agents crashing, or agents fighting with applications. Nothing gets deployed on the monitored servers and network devices.

    By leveraging protocols that natively pull network, server, and application metrics, IT is able to keep monitoring lightweight and minimize the need for change control procedures during deployment.


  3. IT monitoring deployments can falter when core competency is limited to a few individuals (due to a steep learning curve, lack of interest, or lack of time) who later move on. Given the dynamic nature of information technology, it is important that core competency is always available to handle any required adjustments.

    Do not “black box” your IT monitoring and alerting! Make sure multiple IT staff know how to configure, maintain, and extend your IT monitoring and alerting technology. This is certainly a more straightforward process when the underlying monitoring technology is simple to understand and easy to use.



Want to learn more?

Download a FREE trial of Longitude - Stand up Longitude in just minutes and immediately start seeing how your environment is performing, receive proactive alerts, reveal root causes, automate corrective actions, and more...

Start Your Free 14 Day Trial of Longitude Today!
 

 
