As cloud adoption continues to ramp up, IT needs to give careful consideration not only to the core capabilities that cloud providers offer, but also to the technology that natively monitors the health and performance of the cloud infrastructure.
It is important for IT to recognize both the coverage and the limitations of cloud monitoring technology. Requirements vary with the type of cloud deployment (hybrid, public, or multi-provider). Understanding the available monitoring toolsets helps organizations determine the resources they will need to commit to monitor the health and performance of their IT infrastructure and applications effectively.
Amazon CloudWatch is the platform that monitors Amazon Web Services (AWS). Fundamentally, CloudWatch provides:
Ready access to, and visibility into, performance and health metrics
Notifications and alarms
Log data monitoring
Integration with other AWS services and other monitoring products
Everything operating natively in the AWS infrastructure generates CloudWatch Metrics, for example:
- Infrastructure Metrics
  - Virtual instances in Amazon EC2 (Elastic Compute Cloud)
  - Data stored in Amazon S3 (Simple Storage Service)
  - Elastic Load Balancers
- AWS Application Metrics
  - Amazon RDS
Users select the metrics they need to monitor (e.g. EC2 instance CPU utilization) and create graph widgets to display them. These widgets are then added to a CloudWatch dashboard, which allows flexible data visualization, for example zooming in on a time range or rescaling the graph.
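To make the widget-and-dashboard model concrete, here is a minimal sketch of a dashboard definition with a single CPU utilization graph. It only builds the JSON document; the instance ID and region are placeholders, and the helper names `cpu_widget` and `dashboard_body` are illustrative rather than part of any AWS library.

```python
import json


def cpu_widget(instance_id: str, region: str, x: int = 0, y: int = 0) -> dict:
    """Build one graph widget showing average CPU utilization for an EC2 instance."""
    return {
        "type": "metric",
        "x": x,
        "y": y,
        "width": 12,
        "height": 6,
        "properties": {
            # Each entry is [namespace, metric name, dimension name, dimension value].
            "metrics": [["AWS/EC2", "CPUUtilization", "InstanceId", instance_id]],
            "period": 300,  # seconds per data point
            "stat": "Average",
            "region": region,
            "title": f"CPU utilization: {instance_id}",
        },
    }


def dashboard_body(widgets: list) -> str:
    """Serialize a list of widgets into a dashboard definition string."""
    return json.dumps({"widgets": widgets}, indent=2)


if __name__ == "__main__":
    # Placeholder instance ID and region, not real resources.
    print(dashboard_body([cpu_widget("i-0123456789abcdef0", "us-east-1")]))
```

With the AWS SDK for Python installed, a string like this could be passed as the `DashboardBody` argument of the CloudWatch client's `put_dashboard` call. That step is left out here so the sketch runs without credentials.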
Overall, CloudWatch provides an excellent framework from which to monitor your AWS environment, with easy access to key performance metrics and log data. However, be prepared to invest time building out the dashboards and alarms you need, and to do so without a default knowledge base. Another way to get more from CloudWatch data is to use AWS Lambda to send it to an AWS Partner, such as Splunk, for further analysis and processing.
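As a rough illustration of the Lambda route, the sketch below assumes the data arrives through a CloudWatch Logs subscription, which hands the function a base64-encoded, gzip-compressed JSON payload under `event["awslogs"]["data"]`. The `forward` function is a hypothetical stand-in for the HTTP call to a partner's ingestion endpoint, and the record field names are invented for the example.

```python
import base64
import gzip
import json


def decode_cloudwatch_logs_event(event: dict) -> dict:
    """Decode the base64-encoded, gzip-compressed payload of a CloudWatch Logs event."""
    compressed = base64.b64decode(event["awslogs"]["data"])
    return json.loads(gzip.decompress(compressed))


def to_partner_records(payload: dict) -> list:
    """Flatten the log events into simple records a downstream tool could ingest."""
    return [
        {
            "log_group": payload["logGroup"],
            "log_stream": payload["logStream"],
            "timestamp_ms": entry["timestamp"],
            "message": entry["message"],
        }
        for entry in payload["logEvents"]
    ]


def forward(records: list) -> int:
    """Hypothetical stand-in for the HTTP POST to the partner's endpoint."""
    return len(records)


def handler(event, context):
    """Lambda entry point: decode the batch, reshape it, and forward it."""
    records = to_partner_records(decode_cloudwatch_logs_event(event))
    return {"forwarded": forward(records)}
```

A real deployment would replace `forward` with an authenticated request and add retry handling. Partners such as Splunk typically publish their own Lambda blueprints, which are the better starting point in practice.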
Wall Street analysts acknowledge that hybrid cloud environments will remain a critical component of the IT landscape, as evidenced by Amazon continuing to forge a closer relationship with VMware. However, CloudWatch is designed for AWS services. If an organization is operating a hybrid of AWS and VMware/vSphere or Microsoft/Hyper-V, or a hybrid with other cloud providers, then it will need to look beyond CloudWatch for a single-pane-of-glass view of health and performance.
Azure Monitor is Microsoft’s built-in monitoring service for the performance and health of Azure resources. At its most basic level, the model is similar to CloudWatch: Azure Monitor consumes the telemetry data (performance and log data) that all Azure services generate and allows the user to visualize, query, route, archive, and take actions on the data.
Azure Monitor has the following features:
Alerts can be raised on any metric or log record and delivered by email or webhook
Dashboards that display performance, availability, and compliance can be readily created from telemetry data
Logs can be evaluated or sent to Azure Storage or Event Hubs
Azure Monitor alerts can use webhooks as well as email. Webhooks integrate easily with third-party applications like PagerDuty or Slack and can also be used to execute scripts. Webhook support is built directly into Azure Monitor. CloudWatch, by contrast, requires the user to send the alert to an SNS topic, after which SNS delivers it to an HTTP endpoint.
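The extra hop on the CloudWatch side shows up in the receiving code. The sketch below is one way an HTTP endpoint might interpret what SNS posts to it: the subscription has to be confirmed first, and the alarm itself arrives as a JSON string nested inside the SNS envelope. The function name and the returned dictionary shape are assumptions for illustration, and signature verification is omitted.

```python
import json


def handle_sns_post(body: str) -> dict:
    """Interpret the JSON body that SNS posts to a subscribed HTTP endpoint."""
    envelope = json.loads(body)
    kind = envelope.get("Type")

    if kind == "SubscriptionConfirmation":
        # The endpoint must visit SubscribeURL once before SNS delivers alerts.
        return {"action": "confirm", "url": envelope["SubscribeURL"]}

    if kind == "Notification":
        # For CloudWatch alarms, Message is itself a JSON-encoded string.
        alarm = json.loads(envelope["Message"])
        text = (
            f"{alarm['AlarmName']} is {alarm['NewStateValue']}: "
            f"{alarm['NewStateReason']}"
        )
        # A {"text": ...} body is the simple shape chat webhooks such as Slack accept.
        return {"action": "notify", "payload": {"text": text}}

    return {"action": "ignore"}
```

A production endpoint should also verify the SNS message signature before trusting the payload.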
Azure Application Insights provides application performance analysis for Azure Monitor telemetry data by using built-in analytics to aggregate and correlate metrics and using machine learning to deliver availability, performance and usage pattern details for applications.
Azure Monitor's Log Analytics service can correlate across logs to provide a holistic view of your environment. Log Analytics is part of Operations Management Suite (OMS), a suite of cloud-based services that is a strategic part of monitoring the Azure technology stack.
As Microsoft’s technology stack continues to expand (e.g. Windows/Azure hybrid cloud, Windows Azure Pack, and Azure Stack), selecting a monitoring and management technology becomes more complex. System Center, Azure Monitor, and OMS share a subset of monitoring capabilities (for example, there is a System Center Management Pack for Azure), but they diverge as well. Azure Monitor is focused on the cloud, so if you need a preconfigured knowledge base for Windows and the full suite of Microsoft applications, then you will need to look beyond Azure Monitor.
Azure Monitor is Microsoft-centric by design. However, that may not be an issue for organizations that have made a strategic decision to go all in with Microsoft. One-stop shopping and support for any combination of private, hybrid, or public clouds is a compelling story.
Stackdriver, Google’s offering for delivering cloud monitoring capabilities, differs from both CloudWatch and Azure Monitor in a number of ways. First, Stackdriver embraces not only Google Cloud Platform (GCP) but also AWS, providing unified monitoring of the two cloud platforms. Google touts Stackdriver’s multi-cloud strategy and, given Amazon’s prominent standing, it certainly broadens Stackdriver's appeal.
Second, Stackdriver includes a development (DevOps) component in addition to IT monitoring. However, while the IT operations functionality spans both AWS and GCP, the DevOps functionality is Google-centric. Stackdriver is able to troubleshoot deployments on the Google platform with tracing and debugging functionality, and offers capabilities such as:
Monitoring - dashboards and alerts for performance and availability
Logging - visualize log data with searches and filtering
Trace - show latency data from applications deployed on App Engine
Error Reporting - smart error visualization and notification
For monitoring, Stackdriver gathers GCP, AWS, and custom metrics using the Stackdriver Monitoring API. If monitoring needs expand beyond native cloud services (e.g. to third-party applications and virtual machine metrics), the API allows you to extend your monitoring capabilities.
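As a sketch of what a custom metric looks like on the wire, the function below builds a request body for writing one data point. It assumes the version 3 time series format, in which user-defined metric types live under the `custom.googleapis.com/` prefix. The metric name, instance ID, and zone are placeholders, and `custom_time_series` is an illustrative helper, not a Google client library function.

```python
import json
from datetime import datetime, timezone


def custom_time_series(metric_name, value, instance_id, zone, when=None):
    """Build a request body holding one data point for a custom metric.

    `when` must be a timezone-aware datetime; it defaults to the current time.
    """
    when = (when or datetime.now(timezone.utc)).astimezone(timezone.utc)
    return {
        "timeSeries": [
            {
                # User-defined metrics are namespaced under custom.googleapis.com.
                "metric": {"type": f"custom.googleapis.com/{metric_name}"},
                # The monitored resource the point is attached to (a VM here).
                "resource": {
                    "type": "gce_instance",
                    "labels": {"instance_id": instance_id, "zone": zone},
                },
                "points": [
                    {
                        "interval": {"endTime": when.strftime("%Y-%m-%dT%H:%M:%SZ")},
                        "value": {"doubleValue": float(value)},
                    }
                ],
            }
        ]
    }


if __name__ == "__main__":
    # Placeholder values, not real resources.
    print(json.dumps(custom_time_series("queue_depth", 7, "1234567890", "us-central1-a"), indent=2))
```

Sending the body requires an authenticated call to the project's time series endpoint, which is left out so the sketch stays runnable offline.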
Stackdriver requires installation of an agent based on the open source collectd daemon to access non-cloud metrics. The agent performs a discovery, looking at active ports and instance names to determine which supported third-party applications (e.g. Apache or MySQL) to monitor. Once an application is monitored, key metrics become available on the console and users can access default dashboards or define their own. Stackdriver's charting and graphing functions are strong points, with easy creation, correlation, and drill-down. Alerts are easy to configure and can be sent to the console, Slack, HipChat, PagerDuty, SMS, email, webhook, and Campfire.
Stackdriver uses the fluentd agent to ingest system and application logs and log-based metrics in GCP and AWS, and provides a Logging API. Log data from Stackdriver can also be exported to other Google services such as BigQuery, where users can further analyze and correlate log stream data with more powerful SQL queries.
Stackdriver's DevOps capabilities differentiate it from other cloud monitors. After installing an agent for the supported language, it can work with applications whose source code is stored in Google Cloud Source Repositories, GitHub, or Bitbucket. Capabilities include:
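To give a feel for the kind of SQL analysis that exported log data enables, the sketch below runs an aggregate query over a handful of log rows. It uses Python's built-in SQLite purely as a local stand-in for BigQuery, and the `logs` table and its columns are hypothetical rather than the actual export schema.

```python
import sqlite3


def errors_per_service(rows):
    """Count ERROR-level log rows per service, most errors first.

    `rows` is an iterable of (service, severity, message) tuples.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE logs (service TEXT, severity TEXT, message TEXT)")
        conn.executemany("INSERT INTO logs VALUES (?, ?, ?)", rows)
        return conn.execute(
            "SELECT service, COUNT(*) AS errors "
            "FROM logs WHERE severity = 'ERROR' "
            "GROUP BY service ORDER BY errors DESC, service"
        ).fetchall()
    finally:
        conn.close()
```

The same `GROUP BY` pattern carries over to a real export, where the query would run against the exported dataset instead of an in-memory table.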
Debugging live applications running on App Engine and Compute Engine
Capturing and inspecting the call stack and local variables in an application
Stackdriver can track latency in App Engine applications. Developers can use the information displayed in near real time to analyze application latency and quickly isolate the causes of poor performance.
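The analysis a developer does with latency data can be sketched in a few lines. The example below is not how Stackdriver computes anything; it is a minimal illustration, using a nearest-rank percentile, of ranking endpoints by tail latency to find where poor performance originates. The endpoint names and the input shape are assumptions.

```python
import math
from collections import defaultdict


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with pct% of samples at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def slowest_endpoints(requests, pct=95):
    """Rank endpoints by their pct-th percentile latency, slowest first.

    `requests` is an iterable of (endpoint, latency_ms) pairs.
    """
    by_endpoint = defaultdict(list)
    for endpoint, latency_ms in requests:
        by_endpoint[endpoint].append(latency_ms)
    ranked = [(ep, percentile(vals, pct)) for ep, vals in by_endpoint.items()]
    return sorted(ranked, key=lambda item: item[1], reverse=True)
```

Tail percentiles are used here rather than averages because a small share of very slow requests can hide behind a healthy-looking mean.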
Stackdriver relies on the fluentd agent used in Logging to aggregate error and crash information for applications deployed on App Engine and Compute Engine, and this error data can be used to identify and track application problems. When an error is identified (e.g. in an automated email), the Stackdriver console can display how many times the error has occurred, its first and last occurrences, the stack trace, and more.
Ultimately, if you are monitoring the overall health and performance of a combined AWS and GCP environment, or if you’re a developer using Google services, then Stackdriver is an appealing option. Although Stackdriver is easy to navigate, you will still need to invest time in configuring your environment, just as you do with CloudWatch and Azure Monitor. Stackdriver's reliance on agents may be problematic for some sites, while deploying and managing an agent-based site may not be an issue for others.
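The aggregation described above (occurrence count, first and last occurrence, stack trace) can be sketched as a simple grouping step. This is an illustration of the idea, not Stackdriver's actual grouping logic: it assumes each event carries an exception name, a stack trace, and an ISO 8601 UTC timestamp, and it treats events with identical exception and trace as the same error.

```python
from dataclasses import dataclass


@dataclass
class ErrorGroup:
    count: int
    first_seen: str
    last_seen: str
    stack_trace: str


def group_errors(events):
    """Group error events and track count plus first and last occurrence.

    Each event is a dict with 'exception', 'stack_trace', and 'timestamp' keys.
    Timestamps are ISO 8601 strings in UTC with a uniform format, so plain
    string comparison orders them chronologically.
    """
    groups = {}
    for event in events:
        # Simplifying assumption: same exception and same trace means same error.
        key = (event["exception"], event["stack_trace"])
        group = groups.get(key)
        if group is None:
            groups[key] = ErrorGroup(
                1, event["timestamp"], event["timestamp"], event["stack_trace"]
            )
        else:
            group.count += 1
            group.first_seen = min(group.first_seen, event["timestamp"])
            group.last_seen = max(group.last_seen, event["timestamp"])
    return groups
```

Real error-reporting services usually normalize stack traces before grouping, so that line-number changes between releases do not split one error into many.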
The technology to monitor a cloud platform is one of the many factors to consider when you select a cloud vendor. All cloud vendors provide tools that monitor their own platforms well, but it is up to IT staff to build those tools into a site-specific monitoring package. “Building” is seldom a quick process, so it is important to have realistic expectations regarding the resources required to properly implement cloud monitoring.
Understanding what is happening within virtual machines, from both operating system and application perspectives, is an important part of managing a cloud infrastructure. Technologies like Longitude complement cloud monitoring with built-in application and operating system knowledge bases and fast, easy deployment, providing a key part of a complete cloud monitoring package.