Rightsizing virtual machines (VMs) for both private and public cloud configurations is an absolute must, especially if IT is tasked with delivering a cost-effective public cloud infrastructure that can service the needs of the organization.
Although capacity planning has always been an arrow in IT’s quiver, the speed and capacity of IT infrastructure relative to its cost frankly made it easier for IT to err on the side of additional hardware rather than engage in a serious capacity planning effort. However, the advent of cloud computing has necessitated a change in mindset.
Private Cloud Concerns:
The natural progression from physical to virtual infrastructure meant that organizations could readily over-provision the VMs on their hosts. In fact, vendors encouraged over-provisioning, as the primary principle behind virtualization is to make the most effective use of shared resources.
The path of least resistance for IT was to over-provision the VMs: this kept IT from being accused of causing performance problems, and made it less likely that IT would need to resize VMs later on.
Although the VM sizing requirements demanded by users/vendors might have been met with a healthy bit of skepticism by IT, it was frankly easier for IT to give them what they wanted, even though what they asked for may not have been what they needed.
A private cloud capacity shortage was solved by adding more hosts and other physical resources, which meant garnering support for additional CapEx. Public cloud has caused a paradigm shift as IT organizations are now under pressure to minimize CapEx in favor of OpEx.
It is critical to get your private cloud in order before migrating to public cloud, because the problems associated with private cloud over-provisioning do not disappear with a lift and shift to public cloud. Instead, they are exacerbated.
Public Cloud Concerns:
The evolution from private cloud to public cloud has caused a change in how IT plans for and regulates IT infrastructure costs. One of the reasons for the shift is that configuration mistakes can be much more expensive and more visible in public cloud than in private cloud deployments.
Because private cloud costs are allocated mostly as CapEx, the effect on the bottom line is fixed no matter the size or number of VMs. Public cloud takes it to a whole new level because the cost structure is far more tangible, especially with the visibility of cloud billing. Additionally, OpEx itself tends to be a bit more visible.
Public cloud costs vary according to the VM size and instance (e.g. On-Demand, Reserved Instance, and Spot/Preemptible VM Instance), and arriving at an appropriate configuration with an acceptable cost structure is a challenging process, to say the least.
Rightsizing the VMs and determining the most appropriate instance is critical to any successful public cloud deployment. A wrong decision can be costly: for example, when IT makes a 1- or 3-year commitment for a set of Reserved Instances (RIs), it is committing to pay for those RIs whether it uses them or not.
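To make the cost of a wrong commitment concrete, here is a minimal Python sketch of the break-even reasoning behind a reservation. The hourly rates are hypothetical placeholders, not real provider pricing, and the function names are illustrative only.

```python
def break_even_utilization(on_demand_hourly: float, reserved_hourly: float) -> float:
    """Fraction of the term a VM must run before a reservation beats On-Demand."""
    if on_demand_hourly <= 0:
        raise ValueError("on_demand_hourly must be positive")
    return reserved_hourly / on_demand_hourly


def cheaper_option(hours_used: float, hours_in_term: float,
                   on_demand_hourly: float, reserved_hourly: float) -> str:
    """Compare what is paid for the hours used against the full reserved term."""
    on_demand_cost = hours_used * on_demand_hourly
    # A reservation is billed for the whole term, whether the VM runs or not.
    reserved_cost = hours_in_term * reserved_hourly
    return "reserved" if reserved_cost < on_demand_cost else "on-demand"


# Hypothetical rates in dollars per hour (assumptions for illustration).
ON_DEMAND = 0.10
RESERVED = 0.06  # effective hourly rate over a 1-year term

print(f"break-even utilization: {break_even_utilization(ON_DEMAND, RESERVED):.0%}")
# A VM that only runs 4 hours a night for a year:
print(cheaper_option(4 * 365, 24 * 365, ON_DEMAND, RESERVED))
```

With these assumed rates, a VM has to run well over half of the term before the reservation pays for itself, so a lightly used VM is cheaper On-Demand.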
The equation takes on additional complexity when working with elastic demands. IT needs to take special care to regulate when and how virtual machines are created in order to meet real-time demand - and at the same time keep costs in check.
Proper capacity planning and rightsizing of VMs are critical not only to keeping cloud costs in check, but also to adding cost predictability to the equation.
Rightsizing VMs and Private Cloud
Proper capacity planning means rightsizing the private cloud VMs so that you can optimize the VM density per host, maximize the CPU usage, and provide optimal application performance. Ideally, the goal is to keep licensing costs in check and make the best use of private cloud resources.
To accomplish this, you have to understand resource utilization: both the aggregate usage on the hosts and the footprint of each VM across hosts. Ultimately, it is about understanding how much capacity you have and how it is being used.
Longitude Capacity Planner showing VM usage and capacity on a Host
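As a rough illustration of comparing what has been allocated to VMs against what a host physically has, the Python sketch below computes simple overcommit ratios. The host size and VM inventory are hypothetical, and allocation ratios are only a starting point: they say nothing about actual usage.

```python
from dataclasses import dataclass


@dataclass
class VM:
    name: str
    vcpus: int
    memory_gb: int


def overcommit_ratios(vms, host_cores, host_memory_gb):
    """Return (vCPUs per physical core, allocated memory per physical GB)."""
    total_vcpus = sum(vm.vcpus for vm in vms)
    total_memory = sum(vm.memory_gb for vm in vms)
    return total_vcpus / host_cores, total_memory / host_memory_gb


# Hypothetical inventory for a single host (assumed values).
vms = [VM("app01", 8, 32), VM("db01", 16, 64), VM("web01", 4, 16), VM("web02", 4, 16)]
cpu_ratio, mem_ratio = overcommit_ratios(vms, host_cores=16, host_memory_gb=96)

print(f"vCPUs per physical core: {cpu_ratio:.2f}")
print(f"allocated vs physical memory: {mem_ratio:.2f}")
```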
Rightsizing encompasses not only sizing the VMs but also distributing them optimally between hosts. This means understanding how over-provisioning affects the virtual infrastructure. For example:
Symptoms of VMware over-provisioning:
We’ve given a lot of attention to understanding the workload at the hypervisor level, but we can’t ignore what is happening within the individual VMs themselves. For example, when a VM is showing high CPU values, it is important to understand when and why. Is there a pattern? What is the application doing? If a given workload is for nightly processing, how quickly does the application need to complete before the next step?
Again, we’re getting back to what resources the application needs versus what resources users/vendors want. The guiding principle should be to reallocate over-committed resources to the VMs that can better utilize them.
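One way to separate what an application needs from what was requested is to look at a high percentile of observed usage rather than the allocation. The following Python sketch flags VMs whose 95th-percentile CPU stays low. The sample data, the 95th percentile, and the 30% threshold are all assumptions chosen for illustration.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def downsize_candidates(cpu_by_vm, pct=95, threshold=30.0):
    """Names of VMs whose pct-th percentile CPU stays below threshold percent."""
    return sorted(
        name for name, samples in cpu_by_vm.items()
        if percentile(samples, pct) < threshold
    )


# Hypothetical CPU utilization samples, in percent.
cpu_by_vm = {
    "app01": [5, 8, 12, 9, 7, 10, 6, 11],
    "db01": [40, 85, 90, 70, 65, 88, 92, 75],
    "batch01": [2, 3, 95, 97, 96, 94, 3, 2],
}

print(downsize_candidates(cpu_by_vm))
```

Note that batch01 is not flagged even though it is idle for half of the samples: a percentile keeps the busy periods visible, which is why the questions of when and why still have to be asked.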
Rightsizing VMs for Public Cloud
Public cloud is fundamentally different from private cloud. Proper capacity planning means not only rightsizing the VM, but also selecting the appropriate instance. Cloud providers deliver a myriad of options, so making a selection can be a somewhat intimidating process.
The size and instance of the VMs govern public cloud cost.
- Size - the bigger the VM, the more you pay
- Instance - the purchasing option (e.g. On-Demand, Reserved, or Spot/Preemptible Instance)
Mistakes can be quite visible. For example, if you leverage Reserved Instances, you’ll pay whether you use all or none of the resources.
Longitude showing underutilized VMs
Bigger VMs do not necessarily translate to better
Example 1: CPU Usage Windows
Let’s take the example of a VM showing a pattern of high CPU utilization that consistently lasts a few hours, as in the report below.
If it is a batch job and the 4 hours is acceptable, especially as it runs in the middle of the night, it would be wasteful to opt for a larger VM. In addition, an On-Demand instance might be a more appropriate choice than a Reserved Instance. And if the batch job can be preempted, we may well want to consider Spot/Preemptible Instances.
Longitude showing average hourly CPU of an EC2 instance over 1 month
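Spotting this kind of pattern can be sketched in a few lines of Python. The function below finds the longest run of consecutive hours at or above a CPU threshold in a day of hourly averages. The 80% threshold and the sample values are assumptions, and runs that wrap past midnight are not joined in this simple version.

```python
def longest_busy_window(hourly_cpu, threshold=80.0):
    """Return (start_hour, length) of the longest run of hours at or above threshold.

    hourly_cpu holds average CPU percentages, with index 0 meaning midnight.
    Returns (None, 0) when no hour reaches the threshold.
    """
    best_start, best_len = None, 0
    run_start, run_len = None, 0
    for hour, cpu in enumerate(hourly_cpu):
        if cpu >= threshold:
            if run_len == 0:
                run_start = hour
            run_len += 1
            if run_len > best_len:
                best_start, best_len = run_start, run_len
        else:
            run_len = 0
    return best_start, best_len


# Hypothetical hourly averages: a batch job that runs from 01:00 to 05:00.
hourly_cpu = [10, 92, 95, 97, 90, 12, 8, 9] + [15] * 16

start, length = longest_busy_window(hourly_cpu)
print(f"busy from {start:02d}:00 for {length} hours")
```

A short, predictable window like this one is the kind of evidence that supports keeping the smaller VM and paying only for the hours the job runs.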
Example 2: Memory Resources Linux
Rightsizing Linux VMs can be a bit challenging, especially when it comes to memory considerations. The Linux kernel uses otherwise free memory for caching and buffering to speed up operations, so looking at the amount of free memory alone is not especially helpful.
Data from the free command is invaluable here. Below is the output from a small test Linux server, with memory metrics displayed in megabytes.
|Mem:buff/cache||memory for kernel buffers and page cache||1508|
|Mem:used||total memory in use, calculated as Mem:total - Mem:free - Mem:buff/cache|
|Mem:available||memory available to launch new processes without initiating swapping||1086|
|Swap:used||amount of swap memory in use||2047|
Although the unused memory (Mem:free) is showing as 110 MB, what we’re really interested in is how much memory the processes are using on this test VM.
Formula to calculate process memory usage on a Linux VM:
Mem:total - Mem:available
1839 - 1086 = 753
The total memory used by our processes on this test Linux instance is only 753 MB.
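The free command takes its figures from /proc/meminfo, so the same calculation can be sketched in Python against that file's format. This is a minimal sketch: the sample text below contains only the two fields the formula needs, using the test server's totals converted to kB, and the function names are illustrative.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of {field name: integer value}."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts:
            values[name.strip()] = int(parts[0])
    return values


def process_memory_mb(meminfo):
    """Mem:total - Mem:available, converted from kB to MB (1024 kB per MB)."""
    return (meminfo["MemTotal"] - meminfo["MemAvailable"]) // 1024


# Sample built from the article's figures: 1839 MB total, 1086 MB available.
sample = """MemTotal:        1883136 kB
MemAvailable:    1112064 kB
"""

print(process_memory_mb(parse_meminfo(sample)))  # 753
```

On a live Linux host, the same functions could be fed the contents of /proc/meminfo instead of the sample string, ideally sampled over time rather than read once.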
So let’s extend this to a Linux VM configured with 64 GB of memory.
If Mem:available is hovering around 40 GB then, using the formula above:
64 GB - 40 GB = 24 GB
24 GB is the total memory used by processes. We could then reasonably deploy a public cloud instance of this Linux VM with 32 GB rather than 64 GB, provided Mem:available stays near that level during peak periods as well.
Conversely, if Mem:available is low and Swap:used is high, then your Linux VM is memory constrained and you’ll want to configure it with more memory.
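Both cases can be combined into a small sizing helper, sketched below in Python. The list of available memory sizes, the 25% headroom, and the rule used to decide that a VM is memory constrained are assumptions for illustration, not provider data or a recommended policy.

```python
def recommend_memory_gb(total_gb, available_gb, swap_used_gb,
                        sizes_gb=(8, 16, 32, 64, 128), headroom=0.25):
    """Suggest a memory size from Mem:total, Mem:available and Swap:used (in GB)."""
    # Assumed rule: little memory available while swap is in use means constrained.
    constrained = available_gb < 0.10 * total_gb and swap_used_gb > 0
    if constrained:
        larger = [size for size in sizes_gb if size > total_gb]
        return larger[0] if larger else total_gb

    # Otherwise size for the process footprint plus some headroom.
    target = (total_gb - available_gb) * (1 + headroom)
    fitting = [size for size in sizes_gb if size >= target]
    return fitting[0] if fitting else total_gb


# The 64 GB VM with about 40 GB available and no swap in use:
print(recommend_memory_gb(64, 40, 0))  # 32
# A hypothetical constrained VM: 2 GB available, 4 GB of swap in use.
print(recommend_memory_gb(64, 2, 4))   # 128
```

The headroom keeps the suggestion from landing exactly on the measured footprint, which matters when the measurements come from a quiet period.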
Because private cloud cost structures are fixed, they can mask the cost of VMs that are given more resources than they need, making the rightsizing of private cloud environments an absolute must before pursuing public cloud deployments.
Equally important to sizing public cloud VMs is determining the ideal VM instance, which means understanding what application resources are required and when.
As the momentum behind cloud computing increases and pressure mounts from management to move forward with its adoption, anything that IT can do to calm fears of runaway costs and ease the transition to cloud will enhance IT’s stature and value.