The IT infrastructure is almost always the first place blame lands (fairly or unfairly) when there are application performance and availability issues. This is why IT infrastructure monitoring is often a topic of conversation between C-level management and IT, regardless of whether the infrastructure is an on-premises, hybrid cloud, or even public cloud deployment.
What is the best way to mitigate IT infrastructure problems?
Ultimately you provide the most benefits to the organization by implementing a monitoring strategy that understands the IT infrastructure and knows what to do with critical information once a problem is identified (i.e. who to notify, how to escalate, and what steps to take to mitigate).
What to Monitor?
Before delving into IT infrastructure monitoring, it is important to define what comprises the IT infrastructure. The knee-jerk reaction is to include only the physical elements of the datacenter. However, quite a few other variables enter into the equation.
How to Monitor
The whole idea behind effective IT infrastructure monitoring is to proactively detect problem symptoms before users (and by extension management) are affected. This is perhaps an overly simplistic description as there are quite a few moving pieces that have to operate in concert to pull this off. The devil really is in the details.
Collection and Evaluation
At its most basic level, effective IT infrastructure monitoring is about having a built-in knowledge base that understands which key performance indicators (KPIs) to collect, how often to collect them, and, most importantly, how to evaluate them. This means automating the consistent collection and analysis of metrics (for both hardware and software components) at a granularity that cannot be achieved manually.
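As a rough illustration of what "what to collect, how often, and how to evaluate" might look like in practice, the sketch below models a single KPI rule in Python. The names (`KpiRule`, `evaluate`, `cpu_rule`), the threshold values, and the "sustained samples" idea are assumptions made for this example, not part of any particular monitoring product.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class KpiRule:
    """Hypothetical knowledge-base entry for one KPI."""
    name: str                   # what to collect
    interval_seconds: int       # how often to collect it
    warning: float              # how to evaluate it: warning threshold
    critical: float             # how to evaluate it: critical threshold
    sustained_samples: int = 3  # consecutive breaches required before alerting


def evaluate(rule: KpiRule, samples: Sequence[float]) -> str:
    """Classify the most recent samples as ok, warning, critical, or unknown.

    A state is only reported when every one of the last
    `sustained_samples` values breaches the threshold, so a single
    spike does not raise an alert.
    """
    recent = samples[-rule.sustained_samples:]
    if len(recent) < rule.sustained_samples:
        return "unknown"  # not enough data collected yet
    if all(value >= rule.critical for value in recent):
        return "critical"
    if all(value >= rule.warning for value in recent):
        return "warning"
    return "ok"


# Assumed example values: sample CPU every 60 seconds, warn at 80%, critical at 95%.
cpu_rule = KpiRule("cpu_percent", interval_seconds=60, warning=80.0, critical=95.0)

print(evaluate(cpu_rule, [50, 85, 90, 82]))  # warning
print(evaluate(cpu_rule, [99, 10, 99]))      # ok (the spike was not sustained)
```

Requiring several consecutive breaches is one simple way to keep evaluation from reacting to momentary spikes; a real knowledge base would likely carry many such rules per component type.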
IT infrastructures are inherently volatile, so it is important to eliminate the possibility of critical resources being overlooked in monitoring.
Automated discovery works best when implemented as an agentless architecture. Leveraging protocols like WMI, SSH, and SNMP allows newly discovered IT infrastructure components to be monitored almost immediately. Operating agentlessly also minimizes the monitoring technology's footprint, which is very helpful when dealing with change control during software updates.
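To make the discovery idea concrete, here is a minimal Python sketch of the bookkeeping side: comparing what a discovery scan found against what is already monitored, and picking an agentless protocol for each new component. The function names, host names, and the port-based selection are assumptions for illustration; the ports are the common defaults (SSH on TCP 22, SNMP on UDP 161, and the RPC endpoint mapper on TCP 135 used by WMI), and real environments may differ. The network scan itself is deliberately left out.

```python
from typing import Dict, Iterable, Optional, Set, Tuple

# Common default ports, listed in order of preference (assumed for this sketch).
PROTOCOL_BY_PORT = {
    22: "ssh",    # TCP, typical for Linux/Unix hosts
    135: "wmi",   # TCP, RPC endpoint mapper used by WMI on Windows
    161: "snmp",  # UDP, typical for network devices
}


def choose_protocol(open_ports: Iterable[int]) -> Optional[str]:
    """Pick an agentless protocol based on which ports responded."""
    ports = set(open_ports)
    for port, protocol in PROTOCOL_BY_PORT.items():
        if port in ports:
            return protocol
    return None  # nothing we know how to monitor agentlessly


def reconcile(
    monitored: Set[str], discovered: Dict[str, Set[int]]
) -> Tuple[Dict[str, str], Set[str]]:
    """Return (components to start monitoring, components no longer seen)."""
    to_add: Dict[str, str] = {}
    for host, ports in discovered.items():
        protocol = choose_protocol(ports)
        if host not in monitored and protocol is not None:
            to_add[host] = protocol
    to_retire = monitored - set(discovered)
    return to_add, to_retire


# Hypothetical inventory and scan results.
monitored = {"web01", "db01"}
discovered = {
    "web01": {22},
    "win07": {135, 445},
    "sw-core": {161},
    "mystery": {8080},
}
to_add, to_retire = reconcile(monitored, discovered)
print(to_add)     # {'win07': 'wmi', 'sw-core': 'snmp'}
print(to_retire)  # {'db01'}
```

Running a reconciliation like this on a schedule is one way to keep a volatile environment from leaving critical resources unmonitored.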
A fundamental tenet of effective IT Infrastructure monitoring is communicating problem information to the right people and at the right time.
Keeping the problem information relevant and timely goes a long way towards an IT infrastructure monitoring strategy that saves time rather than takes time.
Proper alerting requires the ability to notify staff via a number of mechanisms such as email, SMS, and webhooks. Initiating corrective actions is also part and parcel of alerting. For example, if a corrective action can mitigate or even fix a problem before users are affected, then the alerting can be limited to IT staff and broadened only when a quick resolution isn't in the cards.
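The "fix first, broaden only if needed" flow described above could be sketched roughly as follows. Everything here is hypothetical: `handle_problem`, the recipient addresses, and the `notify` callback stand in for whatever email, SMS, or webhook mechanism is actually in use.

```python
from typing import Callable, List, Sequence


def handle_problem(
    description: str,
    corrective_action: Callable[[], bool],
    notify: Callable[[str, str], None],
    it_staff: Sequence[str],
    broader_audience: Sequence[str],
) -> List[str]:
    """Attempt a corrective action, then notify the appropriate audience.

    `corrective_action` returns True when the problem was fixed.
    `notify(recipient, message)` is a placeholder for email, SMS, or a webhook.
    Returns the list of recipients that were notified.
    """
    try:
        fixed = bool(corrective_action())
    except Exception:
        fixed = False  # a failed corrective action counts as unresolved

    if fixed:
        # Users were never affected, so only IT staff need to know.
        recipients = list(it_staff)
        message = f"RESOLVED automatically: {description}"
    else:
        # No quick resolution, so broaden the alert.
        recipients = list(it_staff) + list(broader_audience)
        message = f"UNRESOLVED, escalating: {description}"

    for recipient in recipients:
        notify(recipient, message)
    return recipients


# Usage with a stand-in notifier that just records what would be sent.
sent = []
handle_problem(
    "Print spooler service stopped",
    corrective_action=lambda: True,  # pretend the service restart worked
    notify=lambda recipient, message: sent.append((recipient, message)),
    it_staff=["ops@example.com"],
    broader_audience=["helpdesk@example.com"],
)
print(sent)
```

Passing the notifier and the corrective action in as callables keeps the escalation logic independent of any specific delivery mechanism.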
Dashboards are an invaluable tool that deliver critical IT Infrastructure health information quickly and concisely to management, IT organizations, and their customers. The type of dashboards used will vary depending on the mission.
Sample Longitude SLA Dashboard drill-down looking at EC2 Sprawl
Multiple dashboard configuration options give each tier of user information specific to their needs. Real-time dashboards and displays are yet another tool to ensure that the right information is disseminated to the right people at the right time.
IT infrastructure reporting is about measuring and quantifying past health and, when possible, predicting future health. Ultimately it is about being able to provide an objective measurement of performance and availability. Proper IT infrastructure reporting comes in a number of forms.
Longitude Capacity Planner showing virtual machine memory use
Reports on performance, capacity, and alerts not only help keep the entire organization informed but also help amplify the effectiveness of IT. In addition, SLA reports show the entire technology stack and can help pinpoint specific component anomalies that degrade availability and response time.
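Two of the simplest reporting calculations, measuring past availability and projecting when a resource will run out, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and sample figures are invented, and the forecast uses a plain least-squares straight line, which is not necessarily how any given capacity planning tool works.

```python
from typing import Optional, Sequence


def availability_percent(total_minutes: float, downtime_minutes: float) -> float:
    """Past health: share of the reporting period the service was available."""
    return 100.0 * (total_minutes - downtime_minutes) / total_minutes


def days_until_capacity(daily_usage: Sequence[float], capacity: float) -> Optional[float]:
    """Future health: fit a straight line to daily usage and extrapolate.

    Returns the estimated number of days after the last sample until usage
    reaches `capacity`, or None if usage is flat or falling, or if there
    are too few samples to fit a trend.
    """
    n = len(daily_usage)
    if n < 2:
        return None
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(daily_usage) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, daily_usage))
    slope = sxy / sxx
    if slope <= 0:
        return None
    intercept = mean_y - slope * mean_x
    day_full = (capacity - intercept) / slope
    return max(0.0, day_full - (n - 1))


# Assumed figures: a 30-day month (43,200 minutes) with 43.2 minutes of downtime.
print(round(availability_percent(43_200, 43.2), 3))  # 99.9

# Assumed figures: VM memory use in GB over four days, with 32 GB available.
print(days_until_capacity([10, 12, 14, 16], capacity=32))  # 8.0
```

A straight-line trend is easy to explain in a report but ignores seasonality and sudden workload changes, so the projection is best read as a rough early warning rather than a firm date.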
IT infrastructure monitoring need not be complicated or burdensome. It should be simple to configure, alert proactively, deliver real-time dashboards, and provide clear reporting to IT and their constituents.
Properly implemented IT infrastructure monitoring not only allows IT organizations to head off problems earlier, but it also frees organizations to add value in other areas such as researching strategies and technologies that will allow them to increase their competitive advantage.