Blog

Securing and Optimizing the Virtual Network Infrastructure

June 27, 2017 | Ken Leoni

An efficient and fully functioning VMware infrastructure depends on shared virtual resources spanning server, network, and storage - but the glue that holds the virtual infrastructure together is the virtual network.  IT departments strive for a virtual infrastructure that is resilient to failure, secure, and performs optimally.  None of these goals can be achieved without a properly designed and maintained virtual network infrastructure.

Ensure Resilience: Configure a proper network topology 

Building resilience into a virtual network means setting up a configuration that can maintain an acceptable level of service even if there are hardware and software failures.

Ultimately, resiliency depends on two ground rules:

  1. Building a physical network infrastructure that has no single point of failure.
  2. Configuring VMware to effectively leverage that physical infrastructure.

NIC teaming can be used to address both of these guidelines.  NIC teaming is the grouping of multiple physical network interface cards (NICs) on a host into one logical NIC.  This has the advantage of providing resiliency in the event of a NIC failure, and, if the NICs are connected to different physical switches, the configuration will be resistant to a switch failure as well.  With NIC teaming you can specify a network failure detection policy (link status or beacon probing) that will configure a VMware ESXi host to fail over to another NIC.

Best practice for critical traffic flows is to create a virtual switch (vSwitch) consisting of two or more physical NICs attached to two or more separate physical switches.  The virtual machines (VMs) and VMware hosts interact with the vSwitch, which manages network traffic over the physical NICs and switches.  If a physical device fails, the vSwitch automatically adjusts traffic to work around the failure.
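
As an illustration, here is a minimal pyVmomi sketch of creating such a vSwitch with two uplinks and beacon-probing failure detection.  It assumes you already hold a connected vim.HostSystem object named host; the vSwitch and vmnic names are placeholders, and the class and policy names should be verified against your pyVmomi/vSphere version.

```python
# Sketch: create a standard vSwitch backed by two physical NICs (assumed
# names vmnic2/vmnic3) with beacon-probing failure detection and an
# active/active NIC team.  "host" is assumed to be a connected
# vim.HostSystem obtained via pyVmomi.
from pyVmomi import vim

net_sys = host.configManager.networkSystem

spec = vim.host.VirtualSwitch.Specification()
spec.numPorts = 128
# Bond the two uplinks and enable beacon probing between them
spec.bridge = vim.host.VirtualSwitch.BondBridge(
    nicDevice=["vmnic2", "vmnic3"],
    beacon=vim.host.VirtualSwitch.BeaconConfig(interval=1),
)

teaming = vim.host.NetworkPolicy.NicTeamingPolicy()
teaming.policy = "loadbalance_srcid"   # default: originating virtual port ID
teaming.failureCriteria = vim.host.NetworkPolicy.NicFailureCriteria(checkBeacon=True)
teaming.nicOrder = vim.host.NetworkPolicy.NicOrderPolicy(
    activeNic=["vmnic2", "vmnic3"]     # both uplinks active
)
spec.policy = vim.host.NetworkPolicy(nicTeaming=teaming)

net_sys.AddVirtualSwitch(vswitchName="vSwitch1", spec=spec)
```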

Secure: Isolate the Virtual Machine and VMkernel traffic

The VMware infrastructure produces different forms of network data based on function.

VMware network data is broken down into two categories:

  1. Virtual machine traffic – this is the traffic between the VMs and the physical network. This traffic should always be isolated from VMkernel traffic, ideally on its own dedicated vSwitch or dedicated VLAN within the vSwitch.  The vSwitch can partition traffic on VLANs and control whether or not VMs can communicate with each other.
  2. VMkernel traffic - this is the standard infrastructure or “system” traffic between the VMkernel services and the physical network.  This traffic can be isolated by configuring additional VMkernel adapters on the vSwitch that map to VMkernel services (a configuration sketch follows this list).   These services include:
    • Management traffic, aka service console traffic, is communication between the VMware hypervisor and management systems such as vCenter, as well as communication between ESXi hosts for high availability (HA).  Management traffic should be isolated in a network that only network and security administrators are able to access.
    • vMotion traffic is used for the live migration of a running VM from one host to another.   Since vMotion traffic generates high network utilization, and was not encrypted until vSphere 6.5, it should be isolated in its own VLAN.  
    • Fault tolerance traffic – Fault Tolerance creates a secondary VM on a separate host that is a mirror of the primary VM, so that the VM will be available without the need for a reboot if one of the hosts fails.  Fault tolerance traffic is the (unencrypted) traffic that synchronizes the primary and secondary VMs.  
    • vSphere replication traffic is the result of replication of VMs to recovery sites. Isolating replication traffic is a good practice as it ensures sensitive information is routed only to the appropriate destination.
    • vSphere replication NFC traffic is the traffic for incoming replication data at the target site. Isolating NFC traffic helps security and manageability.
    • Virtual SAN traffic – this is the Virtual SAN (vSAN) storage traffic on the host. This traffic should be isolated for both performance and security reasons, especially if replication is taking place.
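
To make the isolation concrete, the following is a minimal pyVmomi sketch of adding a dedicated VMkernel adapter for vMotion on its own VLAN.  It assumes a connected vim.HostSystem object named host and an existing vSwitch; the port group name, VLAN ID, and IP addressing are placeholders, and the calls should be verified against your pyVmomi/vSphere version.

```python
# Sketch: put vMotion on its own port group/VLAN with a dedicated VMkernel
# adapter.  "host" is assumed to be a connected vim.HostSystem; names,
# VLAN ID, and addressing below are placeholders.
from pyVmomi import vim

net_sys = host.configManager.networkSystem

# 1. Port group on an existing vSwitch, tagged with a dedicated VLAN
pg_spec = vim.host.PortGroup.Specification(
    name="vMotion-PG",
    vlanId=42,
    vswitchName="vSwitch1",
    policy=vim.host.NetworkPolicy(),
)
net_sys.AddPortGroup(portgrp=pg_spec)

# 2. VMkernel adapter with a static IP on the vMotion subnet
vnic_spec = vim.host.VirtualNic.Specification(
    ip=vim.host.IpConfig(dhcp=False,
                         ipAddress="10.0.42.11",
                         subnetMask="255.255.255.0")
)
vmk_device = net_sys.AddVirtualNic(portgroup="vMotion-PG", nic=vnic_spec)

# 3. Tag the new adapter so that it carries vMotion traffic
host.configManager.virtualNicManager.SelectVnicForNicType("vmotion", vmk_device)
```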

Performance: Optimize based on workload type

There are myriad options available to optimize network performance in a VMware infrastructure.  Let’s see if we can break things down a bit…

We’ll use an automobile traffic analogy:

  • Drivers determine traffic lane choice
    • Commuters control which highway lanes to use based on current lane volume
    • ESXi selects the NICs through which the VMs pass their network traffic, based on predefined load balancing algorithms.
  • Toll booths determine both whether drivers get on the highway and their lane.
    • Criteria include the type of vehicle (emergency vehicle, car, truck, etc.) and the number and types of vehicles on the highway.
    • IT administrators determine how network traffic flows based on their application's attributes and characteristics and on the supporting virtual infrastructure.

The characteristics of network traffic differ for every VMware infrastructure, so there is no one-size-fits-all way to implement network performance optimization.  

In terms of our car analogy, if we let drivers determine their lane, they will pick the lane that will get them to their destination the fastest. At 5 o’clock on a Friday, with everyone heading out for the weekend, all lanes will be in use.  If a fire breaks out across town, a fire engine needs to get across the congested lanes in the least amount of time possible. The options for accommodating the high-priority fire engine are to block other vehicles from entering the highway or to open up a dedicated lane for emergency vehicles.  

When all is said and done, the ideal method of network optimization will depend on how the hosts and VMs are being used and what the priorities are for the organization.

VMware built-in policies determine network optimization:

We discussed NIC teaming earlier as a mechanism for establishing network redundancy, but NIC teaming can also be used to implement load balancing policies that maximize outbound traffic across the teamed NICs (a configuration sketch follows Figure 1).  Using our earlier highway analogy, VMware optimizes traffic by determining lane choice using a fixed number of toll booths.

Load balancing configuration does not have prerequisites, and applies to outbound traffic only.  The policy settings determine the criteria used to balance traffic, with the default set to use the originating virtual port ID.  Although the default setting is usually sufficient, the other available options are:

  • Originating Virtual Port ID  
    The virtual switch selects uplinks (links from the virtual to the physical network) based on the virtual machine port IDs on the vSphere Standard Switch or vSphere Distributed Switch. Note:  A Distributed Switch provides additional capabilities and can operate across multiple hosts within the same cluster.
  • Source MAC Address
    The virtual switch selects an uplink for a virtual machine based on the virtual machine MAC address.
  • IP Hash 
    The virtual switch selects uplinks for virtual machines based on the source and destination IP address of each packet.
  • Physical NIC Load 
    Based on Route Based on Originating Virtual Port, but the virtual switch also checks the load on the uplinks and takes steps to reduce the load on overloaded uplinks. Available only on the vSphere Distributed Switch.
  • Explicit Failover Order 
    No load balancing is available with this policy.

Fig 1.  Configuring Load Balancing as part of NIC Teaming (ESXi load balancing policies apply to outbound traffic only)
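
As an illustration, here is a minimal pyVmomi sketch of changing the load balancing policy on an existing standard vSwitch, for example from the default originating virtual port ID to IP hash.  It assumes a connected vim.HostSystem object named host; the vSwitch name and policy string are placeholders, and IP hash also requires matching physical switch configuration, so verify against your environment.

```python
# Sketch: switch an existing standard vSwitch from the default
# "originating virtual port ID" policy to IP hash.  "host" is assumed
# to be a connected vim.HostSystem; "vSwitch1" is a placeholder.
from pyVmomi import vim

net_sys = host.configManager.networkSystem

# Find the existing vSwitch and reuse its current specification
# (assumes a teaming policy is already populated, as it is by default)
vswitch = next(v for v in net_sys.networkInfo.vswitch if v.name == "vSwitch1")
spec = vswitch.spec

# Valid policy strings include: loadbalance_srcid, loadbalance_srcmac,
# loadbalance_ip, failover_explicit
spec.policy.nicTeaming.policy = "loadbalance_ip"

net_sys.UpdateVirtualSwitch(vswitchName="vSwitch1", spec=spec)
```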

IT determines network optimization:

Going back to our earlier highway analogy, IT administrators can determine lane choice for the traffic based on criteria such as vehicle type, color, make, and the number of vehicles.  IT can also set the number of toll booths, and whether vehicles can bypass the toll booths altogether.

IT can use the following options to prioritize network traffic based on the unique characteristics and workload of their virtual infrastructure:

  1. Traffic Shaping – allows the limiting of outbound traffic on a vSwitch and within port groups based on a combination of:
    • Average bandwidth (Kbps)
    • Peak bandwidth (Kbps)
    • Burst Size(KB)
    Network traffic will flow at “Average” by default, increase to “Peak” if traffic volume mandates, and can “Burst” until the burst size value is exceeded. Note that on the vSphere Standard Switch, control is outbound only, while the vSphere Distributed Switch controls both outbound and inbound traffic.

    Traffic shaping is quite granular: it not only controls traffic on the vSwitch but can also use port groups to segment traffic for a group of VMs (a configuration sketch follows Figure 2).  However, Network I/O Control is a better solution for segmenting traffic for one specific VM.


    Fig 2. Configuring a vSwitch to control outbound traffic via traffic shaping
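
Here is a minimal pyVmomi sketch of enabling outbound traffic shaping on a standard vSwitch, assuming a connected vim.HostSystem object named host.  The vSwitch name and rate values are placeholders; the vSphere API takes average and peak bandwidth in bits per second and burst size in bytes, whereas the client UI shows Kbps and KB, so confirm the units for your release.

```python
# Sketch: enable outbound traffic shaping on an existing standard
# vSwitch.  "host" is assumed to be a connected vim.HostSystem;
# "vSwitch1" and the rates are placeholders.
from pyVmomi import vim

net_sys = host.configManager.networkSystem
vswitch = next(v for v in net_sys.networkInfo.vswitch if v.name == "vSwitch1")
spec = vswitch.spec

spec.policy.shapingPolicy = vim.host.NetworkPolicy.TrafficShapingPolicy(
    enabled=True,
    averageBandwidth=100_000_000,   # ~100 Mbps sustained
    peakBandwidth=500_000_000,      # ~500 Mbps bursts
    burstSize=104_857_600,          # ~100 MB of burst credit
)

net_sys.UpdateVirtualSwitch(vswitchName="vSwitch1", spec=spec)
```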

  2. Network I/O Control – provides the flexibility to isolate traffic flows from each other by setting reservations, shares, and limits (just like memory or CPU resource pools), helping to prevent one traffic flow from dominating the others (a worked example of the share math follows this item).  Isolation is based on the following types of traffic:
    • Fault Tolerance Traffic
    • Management Traffic
    • NFS Traffic
    • Virtual SAN Traffic
    • iSCSI Traffic
    • vMotion Traffic
    • vSphere Data Protection backup traffic
    • vSphere Replication (VR) traffic
    • Virtual Machine Traffic
    • User-defined resources
    Each traffic flow is guaranteed its share of the bandwidth, while any unused bandwidth is distributed among the other traffic flows.

    Network I/O Control provides more granularity than traffic shaping in that it allows for enforced traffic isolation and can be configured right down to a specific VM if necessary.
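
To illustrate how share-based allocation behaves when a link is saturated, here is a small Python sketch (not a VMware API, just the arithmetic) that divides an uplink's bandwidth among active traffic types in proportion to their shares and then caps each at its limit.  The share values and the 10 GbE link are assumptions chosen for the example.

```python
# Sketch of NIOC-style share arithmetic on a congested uplink: each
# active traffic type gets bandwidth in proportion to its shares, capped
# by its limit.  Values below are illustrative, not VMware defaults.
# Note: real NIOC also honors reservations and redistributes bandwidth
# freed up by limits; this sketch omits those refinements.
LINK_MBPS = 10_000  # assume a 10 GbE uplink

# (shares, limit in Mbps or None) for traffic types currently sending
active = {
    "virtualMachine": (100, None),
    "vmotion":        (50, 2_000),
    "vsan":           (100, None),
    "management":     (20, None),
}

total_shares = sum(shares for shares, _ in active.values())
for name, (shares, limit) in active.items():
    allotment = LINK_MBPS * shares / total_shares
    if limit is not None:
        allotment = min(allotment, limit)
    print(f"{name:15s} -> {allotment:7.0f} Mbps")
```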

  3. DirectPath I/O – a VM is given direct access to physical PCI and PCIe network devices connected to a host, bypassing the VMkernel. Because the VM accesses the network device directly, the CPU overhead on the host is reduced, as the host is no longer tasked with handling the VM's network traffic. DirectPath I/O is useful for VMs that have high-priority, high-volume network traffic, or when a host needs to squeeze out more CPU cycles.

    Depending on the hardware platform, features such as vMotion, Fault Tolerance, and Snapshots may be unavailable with DirectPath I/O, which may negate the benefit of direct access to a NIC.

  4. Single Root I/O Virtualization (SR-IOV) – similar to DirectPath I/O, this provides the same advantages and, in general, has the same limitations. SR-IOV’s main advantage over DirectPath I/O is that multiple VMs can share direct access to a single compliant network device, giving more flexibility than DirectPath I/O.

  5. Receive Side Scaling - allows multiple CPUs to process the network packets for a single NIC in parallel. Ultimately, RSS distributes the handling of VM network traffic across multiple processors. There are basic prerequisites for hardware version and the VM's guest OS, and the VMXNET3 network adapter is required. RSS can dramatically increase network throughput for VMs and is one of the more compelling reasons to default to the VMXNET3 network adapter (a sketch of adding a VMXNET3 adapter follows Figure 3).

    Fig 3. Adding a VMXNET3 virtual network adapter to a VM
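
Here is a minimal pyVmomi sketch of adding a VMXNET3 adapter to an existing VM, assuming a vim.VirtualMachine object named vm obtained via pyVmomi and a port group named “VM Network” (both placeholders).

```python
# Sketch: add a VMXNET3 network adapter to an existing VM.  "vm" is
# assumed to be a vim.VirtualMachine obtained via pyVmomi; the port
# group name is a placeholder.
from pyVmomi import vim

nic = vim.vm.device.VirtualVmxnet3()
nic.backing = vim.vm.device.VirtualEthernetCard.NetworkBackingInfo(
    deviceName="VM Network"
)
nic.connectable = vim.vm.device.VirtualDevice.ConnectInfo(
    startConnected=True, allowGuestControl=True
)

dev_change = vim.vm.device.VirtualDeviceSpec(
    operation=vim.vm.device.VirtualDeviceSpec.Operation.add,
    device=nic,
)
task = vm.ReconfigVM_Task(spec=vim.vm.ConfigSpec(deviceChange=[dev_change]))
# Wait on "task" (e.g. with pyVim.task.WaitForTask) before relying on the new NIC.
```
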
  6. SplitRX Mode – introduced in vSphere 5.0, this provides network optimization for multicast traffic. It is enabled by default starting in vSphere 5.1, and only on the VMXNET3 network adapter. Multicasting is an efficient way of sending the same packet data to multiple receivers, in this case VMs.  When multiple VMs on a single host are receiving multicast data from a given source, the hypervisor takes care of the packet replication. Because all the VMs are on the same host, the physical network need not handle the replicated data, and the hypervisor processes a single network queue. Note that, by default, the multicast traffic still has to be sent over a physical NIC. A sketch of toggling SplitRX per adapter follows.
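
As a final illustration, here is a hedged pyVmomi sketch of explicitly enabling SplitRX for a specific VMXNET3 adapter by setting the advanced configuration option described in VMware's performance documentation (ethernetX.emuRxMode, where X is the adapter index; 1 enables SplitRX and 0 disables it).  The setting name and values should be verified against your vSphere release; vm is assumed to be a vim.VirtualMachine obtained via pyVmomi.

```python
# Sketch: explicitly enable SplitRX for the first VMXNET3 adapter of a VM
# by setting the advanced option ethernet0.emuRxMode (per VMware's
# performance guidance; verify the key and values for your release).
# "vm" is assumed to be a vim.VirtualMachine obtained via pyVmomi.
from pyVmomi import vim

split_rx_on = vim.option.OptionValue(key="ethernet0.emuRxMode", value="1")
task = vm.ReconfigVM_Task(spec=vim.vm.ConfigSpec(extraConfig=[split_rx_on]))
# Depending on the release, the change may not take effect until the
# adapter is reset or the VM is power-cycled.
```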

Conclusion:

Building out a VMware environment with virtual network resources that are resilient, secure, and optimized means not only having a thorough understanding of how the VMs are being used, the traffic they generate, and the resources available to process network traffic, but also asking the right questions:

  1. How much network redundancy and performance is needed?

    Network isolation, whether via dedicated vSwitches or within a VLAN, is as much about performance as it is about redundancy.  Proper planning requires closely evaluating the physical network topology and ensuring it works hand in hand with the virtual infrastructure.  

    At a minimum, the physical network should support redundancy for critical virtual machine traffic and for the VMkernel traffic that supports failover.  In addition, VMkernel traffic flows that can be substantial (e.g., vMotion) may mandate dedicated network resources.

  2. How well isolated are the traffic flows?

    It is essential to keep VM traffic on its own vSwitch or VLAN, separated from VMkernel traffic.  Traffic volume and security concerns may also require that individual VMkernel traffic flows be separated from one another.

  3. How well is application network usage understood?

    Optimizing the virtual network resources means matching the needs and requirements of the VMs with the resources available on the Host(s) and network:
    • Understand the network throughput generated by individual VMs
    • Identify workloads that are sensitive to network latency
    • Recognize and balance the impact applications have on hosts and the physical network
    • Select network optimization options based on application requirements and infrastructure.

Want to learn more?

Download our Virtualization or Cloud IaaS Whitepaper - both technologies can provide redundancies that will maximize your uptime and that will allow you to squeeze out the most performance. Which is better and how do you decide?

Download the whitepaper:  Virtualization or Cloud IaaS?