The first thing you may wonder is: what exactly is the Hybrid Cloud?
The Hybrid Cloud is a model in which an organization runs on-premises IT infrastructure in conjunction with one or more public cloud services (e.g. Amazon Web Services, Microsoft Azure, or Google Cloud Platform).
Hybrid Cloud is desirable because it allows IT to:
Maintain tight control over on-premises IT infrastructure and applications
Leverage additional stability, resiliency and capacity associated with the public cloud
Take advantage of the efficiency and scaling of elastic public cloud resources
Shift cost structure from CAPEX to OPEX
Gartner predicts a “decline of no-cloud policies and a corresponding growth of hybrid cloud”. There are certainly a number of reasons for adopting hybrid cloud over public cloud alone, among them a reluctance to relinquish control of the data center. On-premises data centers are often about physically controlling critical IT resources as well as securing data. Given the inherent efficiency, resiliency, and cost savings of the public cloud, however, organizations cannot ignore a hybrid of on-premises and public cloud resources.
Operating a hybrid infrastructure is not without its challenges, as IT is tasked with monitoring disparate environments. Many cloud providers deliver tools for monitoring their cloud services. These tools, in general, come with dashboarding, alerting, and reporting capabilities for only their own services. If a public cloud provider’s forte is, for example, delivering Infrastructure as a Service (IaaS), they may not know what is happening inside the virtual machines. The provider’s monitoring tool may understand the inner workings of cloud resources but be blind to key performance and availability metrics for the operating systems (e.g. Windows or Linux) and the applications (e.g. Oracle, Microsoft SQL Server, or IIS) hosted in the cloud.
If IT is to quickly understand where problems lie, whether in the network, servers, or applications, it is an absolute necessity to consolidate the alerting and reporting of all performance and availability issues across multiple technology stacks. This means leveraging monitoring technology that can reach across disparate networks and unite all the data for analysis, alerting, and reporting. Cross-cloud monitoring technology enables IT to quickly diagnose problems, while simultaneously minimizing the IT resources required to configure the monitoring technology itself (e.g. thresholds, dashboards, alerts, corrective actions, and reports).
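The consolidation described above boils down to normalizing metrics from disparate sources into one record format so a single set of alerting rules can evaluate them all. Here is a minimal sketch of that idea; the record fields, source names, and thresholds are illustrative assumptions, not Longitude's actual data model.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

# Hypothetical sketch: metrics from different layers (cloud provider API,
# OS counters, application checks) are normalized into one record type
# so a single rule set can alert across the whole stack.

@dataclass
class Metric:
    source: str      # e.g. "aws-cloudwatch", "windows-os", "oracle-db"
    host: str
    name: str        # e.g. "cpu_percent", "query_latency_ms"
    value: float
    timestamp: datetime

def evaluate(metrics, thresholds):
    """Return alert strings for any metric exceeding its configured threshold."""
    alerts = []
    for m in metrics:
        limit = thresholds.get(m.name)
        if limit is not None and m.value > limit:
            alerts.append(f"{m.source}/{m.host}: {m.name}={m.value} exceeds {limit}")
    return alerts

now = datetime.now(timezone.utc)
metrics = [
    Metric("aws-cloudwatch", "vm-01", "cpu_percent", 95.0, now),
    Metric("windows-os", "web-01", "disk_free_percent", 40.0, now),
]
alerts = evaluate(metrics, {"cpu_percent": 90.0})
```

Because every source feeds the same record shape, adding a new cloud or application layer means adding a collector, not a new alerting pipeline.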
Single Pane of Glass
Longitude is invaluable for those pursuing a “single pane of glass” from which to view the health and performance of hybrid cloud environments. The Longitude console can be installed on-premises or in the cloud and ships with a built-in knowledge base that understands operating systems, networks, and applications. The technology is agentless, meaning there is no client to install, as Longitude relies on a variety of APIs and protocols to “pull” key performance and availability metrics from the IT infrastructure and the accompanying applications. Agentless data collection is ideal for the hybrid cloud because there is no technology to deploy or maintain on virtual machines, monitoring overhead is negligible, and deployment is quick and painless.
To unify the monitoring of hybrid environments, users install the Longitude “Remote Statistics Server” on a single server in each cloud environment. As the name implies, a “Remote” Statistics Server is able to agentlessly collect key metrics/statistics at the “remote” location. The collected data is then processed by the Longitude Console and used for analysis, alerting, reporting, and more.
The Remote Statistics Server is ideal for a hybrid cloud infrastructure. Once it is configured from the Longitude Console, it is off and running, as it knows what data to collect and when. The Remote Statistics Server is resilient in that it continues to function even if connectivity to the Longitude Console is compromised, and all data collected by the Remote Statistics Server is guaranteed to be delivered to the Longitude Console even in the event of network failure.
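Guaranteed delivery despite network failure is typically achieved with a store-and-forward design: samples are queued at the remote site and removed only once the central console acknowledges receipt. The sketch below illustrates that general pattern only; it is an assumption for illustration, not Longitude's actual implementation.

```python
import collections

# Illustrative store-and-forward collector: an outage delays delivery
# rather than losing data, because samples stay queued until acknowledged.

class RemoteCollector:
    def __init__(self, send_fn):
        self.queue = collections.deque()  # disk-backed in a real system
        self.send_fn = send_fn            # returns True when the console acks

    def collect(self, sample):
        self.queue.append(sample)

    def flush(self):
        """Send queued samples in order; stop (and retry later) on failure."""
        delivered = 0
        while self.queue:
            if not self.send_fn(self.queue[0]):  # console unreachable
                break
            self.queue.popleft()
            delivered += 1
        return delivered

# Simulate an outage followed by recovery.
link_up = {"ok": False}
collector = RemoteCollector(lambda sample: link_up["ok"])
collector.collect({"cpu": 42})
collector.collect({"cpu": 57})
during_outage = collector.flush()   # nothing sent, nothing lost
link_up["ok"] = True
after_recovery = collector.flush()  # queued samples delivered in order
```

The key property is that a send failure leaves the queue intact, so collection can continue locally while delivery simply resumes when connectivity returns.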
Figure 1. Longitude dashboard showing Hybrid Cloud data
Here we see one of the many ways that Longitude can present a “single pane of glass” showing the current health and performance of the hybrid IT infrastructure, including applications running both on-premises and in the cloud.
Each of the widgets in the dashboard is the result of Longitude evaluating performance or availability using its built-in knowledge base to detect problems. Dashboards can be easily extended to represent all or a subset of the hybrid environment. IT can visualize problems based on any hierarchy, including displays based on criteria such as application, function, geography, or any other user-definable grouping. In addition, users can drill down on the widgets for further detail. Dashboards can be made available to users with read-only access to the Longitude Web Portal, adding a level of transparency for IT and its constituents.
Service Level Agreements (SLAs)
Dashboards are invaluable in helping IT quickly identify current problems - but what about obtaining historical context? What happens if the problem is transient? What about correlating performance and availability metrics across the entire technology stack and identifying compliance issues?
This is where Service Level Agreements (SLAs) add the most value:
SLAs keep the cloud providers (and IT for that matter) honest. Providers can tout all manner of availability and performance metrics; however, we need to consider who is watching the watcher.
SLAs provide historical context as to what problems happened and when. When there are performance or availability issues, is there a pattern? Does non-compliance occur during specific times of the day or days of the week?
SLAs can correlate performance and availability metrics across multiple technology stacks and show the impact to end users.
Longitude allows users to define Service Level Agreements containing any combination of “service conditions”. A service condition characterizes one or more performance or availability metrics from anywhere (on-premises or cloud). For each service condition, compliance behavior is defined in the form of good, degraded, or unacceptable behavior. For example: CPU > 90% is unacceptable, CPU between 80% and 89% is degraded, and CPU < 80% is good.
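The three-band CPU example above can be expressed as a simple classification. This sketch uses the thresholds from the text; the function name and band labels are illustrative assumptions, not part of Longitude's actual configuration syntax.

```python
# Hypothetical three-band service condition, mirroring the CPU example:
# > 90% unacceptable, 80-90% degraded, < 80% good.

def classify_cpu(cpu_percent):
    """Map a CPU utilization sample to a compliance band."""
    if cpu_percent > 90:
        return "unacceptable"
    if cpu_percent >= 80:
        return "degraded"
    return "good"

samples = [45, 83, 95]
bands = [classify_cpu(c) for c in samples]
```

Evaluating every sample in a reporting period this way is what lets an SLA report show not just whether a condition was violated, but how much of the period was spent in each band.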
Figure 2. Longitude Summary SLA Report
This excerpt from a Longitude Summary SLA report correlates critical metrics across a time period in which there is degradation (in yellow) of performance for an ERP application. We can see correlations between web portal and database response times and a CPU spike on cloud-based virtual machines. All the other targeted resources are within their acceptable baselines.
The next step to resolve this problem is to either run a Longitude Detail SLA report, to drill down into SLA metrics in more granular detail, or to launch Longitude's interactive SLA dashboard. In addition to the SLA itself, Longitude’s alerts and reports will also yield more information regarding the nature of the problem.
A guiding principle in Longitude is to make sure alerts occur well in advance of compliance issues, so that IT can solve or mitigate problems before users are affected.
In general, cloud providers are great about providing tools that monitor their respective cloud environments. The real challenge in operating a hybrid infrastructure that tightly integrates on-premises applications with cloud based resources is finding a technology that can bridge across all the layers. Longitude can tie together metrics from multiple technology stacks and provide a single pane of glass for the hybrid cloud.