
What is Proper IT Infrastructure Monitoring

October 15, 2019 | Ken Leoni

The IT infrastructure is almost always the first thing blamed, fairly or unfairly, when applications suffer performance or availability issues. This is why IT infrastructure monitoring is often a topic of conversation between C-level management and IT, regardless of whether the infrastructure is an on-premises, hybrid cloud, or public cloud deployment.

When the IT infrastructure is compromised, IT operations' feet are invariably held to the fire.

Questions arise:

  • How could this happen?

  • What could we have done better or differently to avoid the problem?

  • Can you quantify what happened in the past and use it to keep this problem from happening again?

 

What is the best way to mitigate IT infrastructure problems?

Ultimately, you provide the most benefit to the organization by implementing a monitoring strategy that understands the IT infrastructure and knows what to do with critical information once a problem is identified (i.e., who to notify, how to escalate, and what steps to take to mitigate it).


What to monitor?

Before delving into IT infrastructure monitoring, it is important to define what comprises the IT infrastructure. The knee-jerk reaction is to include only the physical elements of the datacenter; however, quite a few other variables enter into the equation.

IT Infrastructure Monitoring is the centralized monitoring of the performance and availability of key hardware and software resources required to provision applications and includes:

  • Servers

  • Private, Hybrid, and Public Cloud Resources

  • Network

  • Operational software (e.g. Active Directory, DHCP, hardware health)


How to Monitor

The whole idea behind effective IT infrastructure monitoring is to proactively detect problem symptoms before users (and by extension management) are affected. This is perhaps an overly simplistic description as there are quite a few moving pieces that have to operate in concert to pull this off. The devil really is in the details.

    • Collection and Evaluation - What type of IT Infrastructure data is collected/evaluated and how often?

    • Alerting and Automation - How are IT and their constituents notified when performance and availability are compromised?

    • Dashboarding - How is the health of the IT Infrastructure visualized?

    • Reporting - How are performance and availability metrics quantified over time?

Download a FREE trial of Longitude
Stand up Longitude in just minutes (no agents) and immediately start seeing how your environment is performing, receive proactive alerts, reveal root causes, automate corrective actions, and more.

 


Collection and Evaluation

At its most basic level effective IT Infrastructure monitoring is about having a proper built-in knowledge base in place that understands what key performance indicators (KPIs) to collect, how often to collect them, and most importantly how to evaluate them. This means automating the consistent collection and analysis of metrics (again both hardware and software components) at a granularity that cannot be achieved manually.
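
As a rough illustration of what such a knowledge base encodes, here is a minimal Python sketch of "what to collect, how often, and how to evaluate it." The metric names, collection intervals, and thresholds are assumptions invented for this example, not settings from Longitude or any other product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KpiRule:
    """Hypothetical rule: which metric to collect, how often, and when it is a problem."""
    metric: str
    interval_seconds: int  # how often to collect the metric
    warning: float         # a value at or above this is a warning
    critical: float        # a value at or above this is critical


def evaluate(rule: KpiRule, value: float) -> str:
    """Classify one collected sample against the rule's thresholds."""
    if value >= rule.critical:
        return "critical"
    if value >= rule.warning:
        return "warning"
    return "ok"


# Illustrative values only; real thresholds depend on the environment.
RULES = {
    "cpu_percent": KpiRule("cpu_percent", 60, warning=80.0, critical=95.0),
    "disk_used_percent": KpiRule("disk_used_percent", 300, warning=85.0, critical=95.0),
}


def evaluate_samples(samples: dict[str, float]) -> dict[str, str]:
    """Evaluate every sample that has a matching rule; unknown metrics are skipped."""
    return {name: evaluate(RULES[name], value)
            for name, value in samples.items() if name in RULES}


print(evaluate_samples({"cpu_percent": 91.5, "disk_used_percent": 40.0}))
# {'cpu_percent': 'warning', 'disk_used_percent': 'ok'}
```

Keeping the rules as data rather than code is one way to let the same evaluation logic run consistently across every server, at a frequency nobody could sustain by hand.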

  • The built-in knowledge base must adapt (automated discovery) to changes in the IT infrastructure.

  • A guiding principle is to minimize any required interaction by staff with the IT monitoring software in order to account for changes in IT infrastructure.

    For example, the monitoring software should automatically detect changes in hardware (e.g. CPU, memory, or storage resources) and software (e.g. changes in Active Directory or DHCP).

IT infrastructures are inherently volatile, so it is important to eliminate the possibility of critical resources being overlooked in monitoring.
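
One simple way to picture automated discovery is as a comparison between the last known inventory and a fresh scan. The sketch below assumes the scan results are already available as plain dictionaries; the host names and attributes are hypothetical, and a real product would gather them over the network.

```python
def diff_inventory(known: dict, discovered: dict) -> dict:
    """Compare the last known inventory with a fresh discovery scan."""
    known_names, found_names = set(known), set(discovered)
    return {
        "added": sorted(found_names - known_names),
        "removed": sorted(known_names - found_names),
        "changed": sorted(name for name in known_names & found_names
                          if known[name] != discovered[name]),
    }


# Hypothetical inventories keyed by host name.
known = {
    "web01": {"cpus": 4, "memory_gb": 16},
    "db01": {"cpus": 8, "memory_gb": 64},
    "old01": {"cpus": 2, "memory_gb": 4},
}
discovered = {
    "web01": {"cpus": 4, "memory_gb": 32},  # memory was upgraded
    "db01": {"cpus": 8, "memory_gb": 64},
    "web02": {"cpus": 4, "memory_gb": 16},  # newly provisioned
}

changes = diff_inventory(known, discovered)
print(changes)
# {'added': ['web02'], 'removed': ['old01'], 'changed': ['web01']}
```

Anything in "added" can be placed under monitoring immediately, which is how newly provisioned resources avoid being overlooked.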

Automated discovery works best when implemented as an agentless architecture. Leveraging protocols like WMI, SSH, and SNMP allows for the almost instantaneous monitoring of newly discovered IT infrastructure components. Operating agentlessly also minimizes the monitoring technology’s footprint, which is very helpful when dealing with change control during software updates.


Alerting

A fundamental tenet of effective IT infrastructure monitoring is communicating problem information to the right people at the right time.

Keeping the problem information relevant and timely goes a long way towards an IT infrastructure monitoring strategy that saves time rather than takes time.

Too many alerts can be worse than none at all. A notification policy that escalates based on problem severity and persistence avoids the “sky is falling” syndrome and ensures that critical information does not get lost.
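
To make "escalates based on problem severity and persistence" concrete, here is a minimal sketch of such a policy in Python. The tiers, delays, and recipient names are assumptions chosen for illustration; any real policy would be tuned to the organization.

```python
from datetime import timedelta

# Hypothetical policy: how long a problem must persist before each tier is notified.
ESCALATION = {
    "critical": [
        (timedelta(0), "on-call"),
        (timedelta(minutes=15), "team-lead"),
        (timedelta(hours=1), "management"),
    ],
    "warning": [
        (timedelta(minutes=10), "on-call"),
        (timedelta(hours=2), "team-lead"),
    ],
}


def recipients(severity: str, open_for: timedelta) -> list[str]:
    """Return everyone who should have been notified for a problem open this long."""
    return [who for delay, who in ESCALATION.get(severity, []) if open_for >= delay]


print(recipients("warning", timedelta(minutes=5)))    # []
print(recipients("critical", timedelta(minutes=20)))  # ['on-call', 'team-lead']
```

A short-lived warning notifies no one, while a persistent critical problem gradually reaches a wider audience, which keeps the volume of alerts proportional to the problem.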

Alerts must be tailorable based on the recipient:

    • Management’s focus isn’t on problem details, but they certainly want to know when a Service Level Agreement is out of compliance. 

    • IT’s focus is centered around being proactive with alerts that prevent downtime and performance degradation.

Proper alerting requires the ability to notify staff via a number of mechanisms such as email, SMS, and webhooks. Initiating corrective actions is also part and parcel of alerting. For example, if a corrective action can mitigate or even fix a problem before users are affected, then the alerting can be limited to IT staff - and broadened only when a quick resolution isn’t in the cards.
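
The "fix first, broaden later" idea can be sketched as follows. This is only an outline under stated assumptions: `corrective_action` and `notify` are hypothetical callables standing in for whatever restart script and email, SMS, or webhook integration is actually in use, and the audience names are invented.

```python
from typing import Callable


def handle_problem(problem: str,
                   corrective_action: Callable[[], bool],
                   notify: Callable[[str, str], None]) -> bool:
    """Try an automated fix first; widen the audience only if it fails."""
    notify("it-staff", f"{problem}: attempting automated corrective action")
    fixed = corrective_action()
    if fixed:
        notify("it-staff", f"{problem}: resolved automatically")
    else:
        notify("it-staff", f"{problem}: automated fix failed")
        notify("service-owners", f"{problem}: unresolved, users may be affected")
    return fixed


# Example wiring: record notifications in a list instead of sending them.
sent = []
handle_problem("Print spooler stopped",
               corrective_action=lambda: True,  # pretend the restart worked
               notify=lambda audience, message: sent.append((audience, message)))
for audience, message in sent:
    print(audience, "-", message)
```

Because the fix succeeded in this example, only IT staff hear about the problem; service owners are contacted only on the failure path.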


Dashboarding

Dashboards are an invaluable tool that deliver critical IT Infrastructure health information quickly and concisely to management, IT organizations, and their customers. The type of dashboards used will vary depending on the mission.

  • C-Level Officers tend to look at things operationally; dashboards that show availability in “red light or green light” form are well suited for them.

  • IT Operations care about the bits and bytes, leveraging dashboards that are clinical and deliver detailed problem diagnostics.

  • Users dependent on high visibility applications gravitate to dashboards showing service level compliance. Their main focus usually centers around the severity and persistence of any problems that affect their technology stack.
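
The different views can all be derived from the same underlying component states. The sketch below is a simplified assumption of how that might look: the component names are hypothetical, and treating any non-ok state as "red" for the management view is a design choice made here for brevity.

```python
def executive_light(component_states: dict[str, str]) -> str:
    """Management view: green only when every component is ok."""
    return "green" if all(state == "ok" for state in component_states.values()) else "red"


def operations_detail(component_states: dict[str, str]) -> list[str]:
    """IT operations view: list each component that needs attention."""
    return sorted(f"{name}: {state}"
                  for name, state in component_states.items() if state != "ok")


states = {"web01": "ok", "db01": "warning", "switch-core": "ok"}
print(executive_light(states))    # red
print(operations_detail(states))  # ['db01: warning']
```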


Sample Longitude SLA Dashboard drill-down looking at EC2 Sprawl


Multiple dashboard configuration options give each tier of user information specific to their needs. Real-time dashboards and displays are yet another tool to ensure that the right information is disseminated to the right people at the right time.

 


Reporting

IT infrastructure reporting is about measuring and quantifying past health and, when possible, predicting future health. Ultimately it is about being able to provide an objective measurement of performance and availability. Proper IT infrastructure reporting comes in a number of forms.

  • Alert history - Looking at when and how often alerts are generated, and recognizing patterns in them, helps IT prevent problems more effectively and provides a clearer picture of what happened and when.

  • Performance and availability - This may seem like an obvious one, but in the heat of battle IT often fails to look at reports until there is a problem. When IT infrastructure monitoring and reporting are properly implemented, metrics that are out of compliance are readily observable in the reports before they cause problems for users.

  • Service Level Agreement - An SLA report aggregates availability along with key performance metrics in order to show good, degraded, or unacceptable behavior. Targeting degraded components allows IT to head off problems before there are compliance issues.

  • Capacity Planning - “What if” analyses predict where and how potential changes in resources (e.g. CPU and memory allocation) will affect operations and licensing costs.


Longitude Capacity Planner showing virtual machine memory use
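
A "what if" analysis can be as simple as asking how memory allocations would change if every virtual machine were sized to its observed peak plus some headroom. The following sketch is a simplified assumption of that idea, not how Longitude's Capacity Planner works; the VM names, figures, and the 25% headroom are invented for the example.

```python
def what_if_memory(vms: dict[str, tuple[float, float]],
                   headroom: float = 0.25) -> dict[str, float]:
    """Propose a memory change per VM so allocation equals peak use plus headroom.

    vms maps a VM name to (allocated_gb, peak_used_gb).
    A negative result means memory could be reclaimed; positive means more is needed.
    """
    return {
        name: round(peak * (1 + headroom) - allocated, 2)
        for name, (allocated, peak) in vms.items()
    }


vms = {"app01": (32.0, 8.0), "db01": (16.0, 16.0)}
proposal = what_if_memory(vms)
print(proposal)                # {'app01': -22.0, 'db01': 4.0}
print(sum(proposal.values()))  # -18.0
```

Here app01 is heavily over-allocated while db01 is running without headroom, so the net effect of rebalancing would be 18 GB reclaimed.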

Reports on performance, capacity, and alerts not only help keep the entire organization informed but also help amplify the effectiveness of IT. In addition, SLA reports show the entire technology stack and can help pinpoint specific component anomalies that degrade availability and response time.
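
As a sketch of how an SLA report might roll availability and a performance metric into good, degraded, or unacceptable, consider the Python below. The targets and limits are assumptions picked for illustration; actual values come from the agreement itself.

```python
def availability_percent(checks: list[bool]) -> float:
    """Percentage of availability checks that succeeded (0.0 if there were none)."""
    return 100.0 * sum(checks) / len(checks) if checks else 0.0


def sla_state(availability: float, avg_response_ms: float, *,
              target_availability: float = 99.9, min_availability: float = 99.0,
              target_response_ms: float = 500.0, max_response_ms: float = 2000.0) -> str:
    """Classify a reporting period; thresholds here are illustrative assumptions."""
    if availability < min_availability or avg_response_ms > max_response_ms:
        return "unacceptable"
    if availability < target_availability or avg_response_ms > target_response_ms:
        return "degraded"
    return "good"


checks = [True] * 995 + [False] * 5  # 995 of 1000 checks succeeded
availability = availability_percent(checks)
print(availability)                    # 99.5
print(sla_state(availability, 320.0))  # degraded
```

The "degraded" band is what gives IT room to act: the component is still within the minimum, but it is trending toward a compliance issue.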

 


Conclusion

IT infrastructure monitoring need not be complicated or burdensome. It should be simple to configure, alert proactively, deliver real-time dashboards, and provide clear reporting to IT and their constituents.

Properly implemented IT infrastructure monitoring not only allows IT organizations to head off problems earlier, but it also frees organizations to add value in other areas such as researching strategies and technologies that will allow them to increase their competitive advantage.

 
