
Rightsizing VMs for Cloud

April 10, 2018 | Ken Leoni

Rightsizing virtual machines (VMs) for both private and public cloud configurations is an absolute must, especially if IT is tasked with delivering a cost-effective public cloud infrastructure that can service the needs of the organization. 

Although capacity planning has always been an arrow in IT's quiver, the speed and capacity of IT infrastructure relative to its cost made it easier for IT to simply err on the side of additional hardware rather than engage in a serious capacity planning effort. However, the advent of cloud computing has necessitated a change in mindset.

 

Private Cloud Concerns:

The natural progression from physical to virtual infrastructure meant that organizations could readily over-provision the VMs on their hosts. In fact, vendors encouraged over-provisioning, as the primary principle behind virtualization is to make the most effective use of shared resources.

The path of least resistance for IT was to over-provision the VMs: doing so kept IT from being accused of causing performance problems and made it less likely that VMs would need to be resized later on.

Although the VM sizing requirements demanded by users/vendors might have been met with a healthy bit of skepticism by IT, it was frankly easier for IT to give them what they wanted, even though what they asked for may not have been what they needed.

A private cloud capacity shortage was solved by adding more hosts and other physical resources, which meant garnering support for additional CapEx.  Public cloud has caused a paradigm shift as IT organizations are now under pressure to minimize CapEx in favor of OpEx. 

It is critical to get your private cloud in order before migrating to public cloud, because the problems associated with private cloud over-provisioning do not disappear with a lift and shift to public cloud; instead, they are exacerbated.

 

Public Cloud Concerns:

The evolution from private cloud to public cloud has caused a change in how IT plans for and regulates IT infrastructure costs. One of the reasons for the shift is that configuration mistakes can be much more expensive and more visible in public cloud than in private cloud deployments.

Because private cloud costs are allocated mostly as CapEx, the effect on the bottom line is fixed no matter the size or number of VMs. Public cloud takes it to a whole new level because the cost structure is far more tangible, especially with the visibility of cloud billing. Additionally, OpEx itself tends to be a bit more visible.

Public cloud costs vary according to the VM size and instance (e.g. On-Demand, Reserved Instance, or Spot/Preemptible VM Instance), so arriving at an appropriate configuration with an acceptable cost structure is a challenging process, to say the least.

Rightsizing the VMs and determining the most appropriate instance is critical to any successful public cloud deployment. A wrong decision can be costly: for example, when IT makes a 1- or 3-year commitment for a set of Reserved Instances (RIs), they are committing to paying for those RIs whether they use them or not.
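To make the Reserved Instance trade-off concrete, here is a minimal Python sketch of the break-even arithmetic. The function names and the hourly rates used in the usage note are hypothetical, and the sketch assumes the RI cost has already been expressed as an effective hourly rate that is paid for every hour of the month.

```python
HOURS_IN_MONTH = 730  # assumption: average number of hours in a month


def break_even_utilization(on_demand_hourly, reserved_effective_hourly):
    """Fraction of the month a VM must run before the RI is no more expensive."""
    if on_demand_hourly <= 0:
        raise ValueError("on_demand_hourly must be positive")
    return reserved_effective_hourly / on_demand_hourly


def cheaper_option(hours_used, on_demand_hourly, reserved_effective_hourly,
                   hours_in_month=HOURS_IN_MONTH):
    """Compare monthly costs; the RI is billed for every hour, used or not."""
    on_demand_cost = hours_used * on_demand_hourly
    reserved_cost = hours_in_month * reserved_effective_hourly
    choice = "reserved" if reserved_cost < on_demand_cost else "on-demand"
    return choice, on_demand_cost, reserved_cost
```

With made-up rates of 0.10 per hour On-Demand and 0.06 per hour effective for the RI, the break-even point is roughly 60% utilization: a VM that runs only 120 hours a month is cheaper On-Demand, while one that runs all month is cheaper as an RI.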

The equation takes on additional complexity when working with elastic demands.  IT needs to take special care to regulate when and how virtual machines are created in order to meet real-time demand - and at the same time keep costs in check.

Proper capacity planning and rightsizing of the VMs are critical not only to keeping cloud costs in check, but also to adding cost predictability to the equation.

 

Rightsizing VMs and Private Cloud

Proper capacity planning means rightsizing the private cloud VMs so that you can optimize the VM density per host, maximize CPU usage, and provide optimal application performance. Ideally the goal is to keep licensing costs in check and make the best use of private cloud resources.

In order to accomplish this, you have to understand the resource utilization: both the aggregate usage on the hosts and the footprint of each of the VMs across hosts. Ultimately it is about understanding how much capacity you have and how it is being used.

Longitude Capacity Planner showing VM usage and capacity on a Host

 

Rightsizing entails not only sizing the VMs but also distributing them optimally between hosts. This means understanding how over-provisioning affects the virtual infrastructure. For example:

 

Symptoms of VMware over-provisioning:

  • A host configured with VMs that have too many vCPUs will have high CPU ready and co-stop values and low CPU utilization. It is important to avoid over-sizing VMs with more vCPUs than needed.

  • VMs in which the number of vCPUs is greater than the number of physical cores in a NUMA node will see increased latency and higher co-stop values.

  • While over-committing memory can make the best use of your VMware resources, keep an eye on the host's consumed memory and the performance effects of memory reclamation on your VMs.
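As a rough illustration of the first symptom, the Python sketch below flags VMs that combine high CPU ready time with low CPU utilization. It assumes the commonly documented conversion of the CPU ready summation counter (milliseconds accumulated over the sampling interval) to a percentage, with a 20-second real-time interval. The `VmSample` type, the function names, and both thresholds are hypothetical choices for illustration, not vendor guidance.

```python
from dataclasses import dataclass


@dataclass
class VmSample:
    name: str
    vcpus: int
    ready_ms: float       # CPU ready summation over the interval, all vCPUs
    cpu_usage_pct: float  # average CPU utilization over the same interval


def ready_pct_per_vcpu(sample, interval_s=20):
    """Convert the ready summation (ms) to a per-vCPU percentage."""
    return sample.ready_ms / (interval_s * 1000 * sample.vcpus) * 100


def oversized_candidates(samples, ready_threshold_pct=5.0,
                         usage_threshold_pct=30.0, interval_s=20):
    """Names of VMs with high CPU ready but low utilization (assumed thresholds)."""
    return [
        s.name
        for s in samples
        if ready_pct_per_vcpu(s, interval_s) > ready_threshold_pct
        and s.cpu_usage_pct < usage_threshold_pct
    ]
```

A VM that appears in this list is waiting on the scheduler far more than its own workload justifies, which makes it a reasonable first candidate for a vCPU reduction.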
 

 

We've given a lot of attention to understanding the workload at the hypervisor level, but we can't ignore what is happening within the individual VMs themselves. For example, when a VM is showing high CPU values, it is important to understand when and why. Is there a pattern? What is the application doing? If a given workload is for nightly processing, how quickly does the application need to complete before the next step?

Again, we're getting back to the resources the application needs versus the resources users/vendors want. The guiding principle should be to reallocate over-committed resources towards the VMs that can better utilize them.

 

Rightsizing VMs for Public Cloud

Public cloud is fundamentally different from private cloud. Proper capacity planning means not only rightsizing the VM, but also selecting the appropriate instance. Cloud providers deliver a myriad of options, so making a selection can be a somewhat intimidating process.

The size and instance of the VMs govern public cloud cost.

  • Size - the bigger the VM, the more you pay

  • Instance - On-Demand, Reserved, or Spot/Preemptible


Mistakes can be quite visible. For example, if you are leveraging Reserved Instances, you'll pay whether you use all or none of the resources.

You're looking at a 1- or 3-year commitment for Reserved Instances, and depending on the cloud provider you may not be able to readily unload unneeded RIs. Either way, you'll pay a financial penalty.

 Longitude showing underutilized VMs

 

Bigger VMs do not necessarily translate to better

Example 1: CPU Usage Windows

Let's take the example of a VM showing a pattern of high CPU utilization that consistently lasts a few hours, as in the report below.

If it is a batch job and a 4-hour run time is acceptable, especially since it runs in the middle of the night, it would be wasteful to opt for a larger VM. In addition, an On-Demand instance might be a more appropriate choice than a Reserved Instance. If the batch job can be preempted, Spot/Preemptible Instances are worth considering as well.

 Longitude showing average hourly CPU of an EC2 instance over 1 month
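
A pattern like this can also be pulled out of raw monitoring data. The Python sketch below averages CPU samples by hour of day and lists the hours that stay above a threshold. The input format of (hour of day, CPU percent) pairs, the function names, and the 80% threshold are assumptions made for illustration.

```python
from collections import defaultdict


def average_by_hour(samples):
    """Average CPU percent per hour of day from (hour_of_day, cpu_pct) pairs."""
    totals = defaultdict(lambda: [0.0, 0])  # hour -> [sum, count]
    for hour, cpu_pct in samples:
        totals[hour][0] += cpu_pct
        totals[hour][1] += 1
    return {hour: total / count for hour, (total, count) in totals.items()}


def busy_hours(samples, threshold_pct=80.0):
    """Hours of the day whose average CPU meets or exceeds the threshold."""
    averages = average_by_hour(samples)
    return sorted(hour for hour, avg in averages.items() if avg >= threshold_pct)
```

If the result is a short, contiguous overnight window, that supports treating the workload as a batch job rather than evidence that the VM needs to be larger.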

 

Example 2: Memory Resources Linux

Rightsizing Linux VMs can be a bit challenging, especially when it comes to memory considerations. Unlike the Windows operating system, the Linux kernel uses free memory for caching and buffering to speed up operations, so looking at the amount of free memory alone is not especially helpful.

Data from the free command is invaluable here. The table below shows its output from a small test Linux server, with memory metrics in megabytes.

Linux free command


Metric           Definition                                                                   Value (MB)
Mem:total        total memory                                                                 1839
Mem:free         unused memory                                                                110
Mem:buff/cache   memory for kernel buffers and page cache                                     1508
Mem:used         total memory in use, calculated as Mem:total - Mem:free - Mem:buff/cache     220
Mem:available    memory available to launch new processes without initiating swapping         1086
Swap:used        amount of swap memory in use                                                 2047

Although the unused memory is showing as 110 MB, what we're really interested in is how much memory the processes are using on this test VM.

Formula to calculate process memory usage on a Linux VM:

  Mem:total - Mem:available
  1839 - 1086 = 753

The total memory used by our processes on this test Linux instance is only 753 MB.
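
The same calculation can be scripted. The Python sketch below parses the text printed by free -m and applies the formula above. It assumes a version of free that prints an available column, and the function names are hypothetical.

```python
def parse_free(output):
    """Parse `free -m` text into a dict keyed like 'Mem:total' or 'Swap:used'."""
    rows = [line.split() for line in output.strip().splitlines() if line.strip()]
    header = rows[0]  # e.g. total, used, free, shared, buff/cache, available
    metrics = {}
    for fields in rows[1:]:
        label = fields[0].rstrip(":")  # 'Mem' or 'Swap'
        # The Swap row has fewer columns; zip stops at the shorter sequence.
        for column, value in zip(header, fields[1:]):
            metrics[f"{label}:{column}"] = int(value)
    return metrics


def process_memory(metrics):
    """Memory used by processes: Mem:total - Mem:available."""
    return metrics["Mem:total"] - metrics["Mem:available"]
```

To use it, pass in the captured output of free -m, for example from subprocess.run(["free", "-m"], capture_output=True, text=True).stdout. For the test server above, process_memory returns 753.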

So let's extend this to a Linux VM configured with 64 GB of memory.

If Mem:available is hovering around 40 GB, then using the formula above:

64 GB - 40 GB = 24 GB

24 GB is the total memory used by processes. We could then safely deploy a public cloud instance of this Linux VM with 32 GB rather than 64 GB.

Conversely, if Mem:available is low and Swap:used is high, then your Linux VM is running memory constrained and you'll want to configure it with more memory.
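
As a sketch of how that sizing decision could be automated, the Python below picks the smallest memory size that covers process memory plus some headroom, and flags a VM that looks memory constrained. The list of available sizes, the 25% headroom, and both thresholds are assumptions for illustration and would need to be tuned to your provider and workload.

```python
def recommend_memory_gb(total_gb, available_gb,
                        sizes_gb=(8, 16, 32, 64, 128), headroom=0.25):
    """Smallest offered size covering process memory plus headroom (assumed)."""
    used_gb = total_gb - available_gb  # Mem:total - Mem:available
    target_gb = used_gb * (1 + headroom)
    for size in sorted(sizes_gb):
        if size >= target_gb:
            return size
    return max(sizes_gb)  # nothing is large enough; return the biggest size


def is_memory_constrained(total_gb, available_gb, swap_used_gb,
                          min_available_fraction=0.10, max_swap_gb=1.0):
    """True when available memory is low and swap is in heavy use (assumed limits)."""
    low_available = available_gb / total_gb < min_available_fraction
    return low_available and swap_used_gb > max_swap_gb
```

For the 64 GB VM above with 40 GB available, recommend_memory_gb(64, 40) returns 32, matching the manual calculation.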

Conclusion

Because private cloud cost structures are fixed, they can mask the cost of VMs that are given more resources than they need, making the rightsizing of private cloud environments an absolute must before pursuing public cloud deployments.

Equally important to sizing public cloud VMs is determining the ideal VM instance, which means understanding what application resources are required and when.

As the momentum behind cloud computing increases and pressure mounts from management to move forward with its adoption, anything that IT can do to calm fears of runaway costs and ease the transition to cloud will enhance IT’s stature and value.

Want to learn more?

Download our AWS vs. Azure vs. Google price matrix. The matrix compares On-Demand and Reserved Instance pricing for Standard, Compute, and High Memory machine instance classes.

In addition, the matrix examines pricing for multiple vCPU configurations within an instance class.

 
