Blog

Microsoft Exchange Server Tuning & Monitoring:  3 Keys to Better Results

April 18, 2017 | Ken Leoni

If your company has standardized on Microsoft® Exchange Server, you know how much everyone in your organization depends on its collaborative messaging services. Exchange Server configurations also have a habit of becoming large and complex as a company’s needs scale up. For the IT professionals that administer Exchange, this leads to a major responsibility that requires both skill and resources.

Here we will look at 3 key areas to focus on when optimizing the performance and availability of Exchange. The rules of thumb are provided as a starting point. 

Getting Started: MIC Metrics

Starting off with the basics of the operating system is a good place to initiate an Exchange performance-tuning exercise. Think “MIC”: memory first, followed by I/O, then CPU. Let’s start with a few of the particulars to look for with Exchange Server:

 

Memory

It probably goes without saying that memory is always going to be a consideration, especially as it applies to the monitoring and tuning of Exchange and its supporting IT infrastructure.

 

  1. Exchange does not like running on a server that is under any memory pressure. In fact, there are components within Exchange 2016 that will detect overcommitment and self-tune to prevent the server from being totally overwhelmed.

    1. Keep \Memory\% Committed Bytes In Use < 80%
    2. Keep \Memory\Available MBytes > 5% of RAM
    3. Use a fixed-size page file of RAM + 10 MB, capped at 32,778 MB

 

  2. Exchange relies on the .NET Framework to handle memory allocation. The garbage collector (GC) is .NET's automatic memory manager, responsible for keeping Exchange processes in check by freeing unneeded virtual memory. A general rule of thumb is to run the latest supported version of .NET, although it pays to proceed with caution.

    1. Keep .NET CLR Memory\% Time in GC < 10%
    2. Keep .NET CLR Memory\Allocated Bytes/sec < 50 MB

 

  3. Make sure your Domain Controllers are configured with enough memory to cache the entire Active Directory database, as AD performance and Exchange performance march in lockstep.

 

  4. When virtualizing Exchange, DON'T overcommit memory. Exchange requires a known, static amount of memory that it can manage on its own. Any host-level memory reclamation against an Exchange VM (e.g., ballooning) will result in poor performance.
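The fixed page-file rule from the memory guidance above is easy to encode. A minimal sketch in Python (the 32,778 MB cap is the figure stated above; the function name is my own):

```python
def recommended_pagefile_mb(ram_mb: int) -> int:
    """Fixed page-file size per the rule above: RAM + 10 MB, capped at 32,778 MB."""
    CAP_MB = 32778  # cap from the guidance above
    return min(ram_mb + 10, CAP_MB)

# A 16 GB server gets 16,394 MB; a 64 GB server hits the cap.
print(recommended_pagefile_mb(16384))   # 16394
print(recommended_pagefile_mb(65536))   # 32778
```

Because the size is fixed (initial size equals maximum size), the page file never grows or shrinks at runtime, which keeps its disk footprint predictable.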

 

I/O

A common theme echoed in the Exchange Server community is "it is all about balance." Exchange is tuned to scale out rather than up: you will get better performance and more efficient use of your resources by deploying more commodity-based hardware rather than fewer, bigger pieces of hardware.

The main consideration is how the I/O load is distributed across disks.

  1. Optimize Exchange performance by keeping the number of databases balanced across the servers.

A best practice is to review MSExchange Active Manager(_total)\Database Mounted – the number of databases mounted on a server. Spread the load based on the number of databases, their activity, and their size.

  2. Re-evaluate regularly and redistribute mailboxes as needed.

It is all about keeping the access time to the Exchange data (the "latency") to a minimum. We can continue to tune the Exchange Servers, but if we can't access the data in a timely manner, then modifying server settings is of little consequence. When standing up a new Exchange instance, Jetstress is a must-use tool to ensure your disk subsystem is up to the task, as it can help you anticipate latency issues. Tuning your environment with the latency counters below in mind will go a long way toward maintaining a healthy Exchange implementation.

Latency counters

\MSExchange Database ==> Instances(*)\I/O Database Reads (Attached) Average Latency

Average length of time, in milliseconds (ms), per database read operation.

< 20 ms

\MSExchange Database ==> Instances(*)\I/O Database Writes (Attached) Average Latency

Average length of time, in ms, per database write operation.

< 50 ms

\MSExchange Database ==> Instances(*)\I/O Log Writes Average Latency

Average length of time, in ms, per log write operation.

< 10 ms

\MSExchange Database ==> Instances(*)\I/O Database Reads (Recovery) Average Latency

Average length of time, in ms, per passive database read operation.

< 200 ms
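The latency targets above lend themselves to an automated check: map each counter to its threshold and flag any sample that exceeds it. A minimal sketch in Python (the sample values are illustrative, not measurements):

```python
# Target latencies (ms) for the Exchange database counters above.
LATENCY_TARGETS_MS = {
    "I/O Database Reads (Attached) Average Latency": 20,
    "I/O Database Writes (Attached) Average Latency": 50,
    "I/O Log Writes Average Latency": 10,
    "I/O Database Reads (Recovery) Average Latency": 200,
}

def over_threshold(samples: dict) -> list:
    """Return the counters whose sampled latency exceeds its target."""
    return [name for name, value in samples.items()
            if value > LATENCY_TARGETS_MS.get(name, float("inf"))]

# Hypothetical sample: log writes are lagging, reads are fine.
samples = {
    "I/O Database Reads (Attached) Average Latency": 12.0,
    "I/O Log Writes Average Latency": 18.5,
}
print(over_threshold(samples))  # ['I/O Log Writes Average Latency']
```

Wiring a check like this into whatever collects your perfmon data turns the rules of thumb into alerts instead of numbers you have to eyeball.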

 

 

CPU

Keep in mind that balance applies across Exchange Servers as well. Exchange is architected to function most efficiently across multiple servers; the advent of Office 365 mandated this kind of approach.

The first task is to baseline CPU performance in order to obtain a good understanding of what constitutes normal resource utilization of your Exchange Server(s).  The key here is to be able to recognize changes in load characteristics.

Fortunately, CPU related issues reveal themselves fairly quickly, although the diagnosis can be a headache.

When assessing CPU, the first step is to go after the low-hanging fruit. You'll want to look at the overall CPU usage, Processor(_Total)\% Processor Time; a value below 75% is generally considered healthy. This counter is a percentage that is normalized across all cores.

If you are experiencing a CPU issue, the next step is to identify the culprit process(es) (via perfmon, Task Manager, or a third-party tool) and evaluate Process\% Processor Time. Keep in mind that Process\% Processor Time is calculated based on the number of cores, so if graphing with perfmon, the Y axis should be resized to reflect the number of cores. For example, a process maxing out a quad-core server will return a value of 400.

When CPU issues are encountered, take a look at your baseline. Has the load on the server changed? Is Exchange doing more "work"? If the answer is yes, it may be time to rebalance.
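The different scaling of the two counters trips people up, so it helps to normalize the per-process number before comparing it with overall CPU usage. A small sketch in Python, assuming you already have the raw counter value and the core count (the function name is my own):

```python
def normalize_process_cpu(raw_percent: float, cores: int) -> float:
    r"""Convert Process\% Processor Time, which scales to 100 * cores,
    into a 0-100 value comparable with Processor(_Total)\% Processor Time."""
    return raw_percent / cores

# A process maxing out a quad-core server reports 400; normalized, that is 100%.
print(normalize_process_cpu(400.0, 4))  # 100.0
# A process reporting 150 on an 8-core box is really using ~19% of total capacity.
print(normalize_process_cpu(150.0, 8))  # 18.75
```

The same division is what perfmon's Y-axis resize is doing for you visually.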

Processor(_Total)\% Processor Time

% of time that the processor is executing application or operating system processes.

< 75% on average

Processor(_Total)\% User Time

% of time spent in user mode. User time is the amount of time the processor spends executing application code.

< 75% on average

Processor(_Total)\% Privileged Time

% of time spent in privileged mode, a processing mode designed for operating system components and hardware-manipulating drivers: for example, executing system calls, servicing IRPs (I/O Request Packets), context switching, etc.

< 75% on average

System\Processor Queue Length (all instances)

The number of threads that are delayed in the Processor Ready Queue, waiting to be scheduled for execution.

< 6 per processor

\Process(*)\% Processor Time

% of time that all of a process's threads used the processor to execute instructions.

Evaluate based on the number of cores

 

Monitoring Exchange Server Processes for CPU with perfmon

 Perfmon Graph: Application Pool CPU Utilization

Once you’ve gotten past the low hanging fruit there are some not so obvious components to review:

Make sure to set server power management to "high performance" (for physical servers), so that Exchange has a predictable number of CPU cycles based on your initial sizing.

Processor Information(_Total)\% of Maximum Frequency

% of the maximum possible frequency at which the CPU is currently running.

Always 100

 

Exchange’s workload management task, which handles some of Exchange’s internals such as mailbox moves and store maintenance, runs them in the background when load is light and backs off when load increases.

Many CPU problems can be averted from the get-go with proper sizing and load distribution. Proper planning up front cannot be overemphasized; it really is a critical component of a successful deployment. The Exchange Server Role Requirements Calculator by Ross Smith IV is a very helpful tool and worth a close look.

 

Performance Reports: Nowhere to Hide

Managed Availability was introduced starting with Exchange 2013; its purpose is to make Exchange resilient to internal disruptions through automated problem detection and recovery. Keep in mind that Managed Availability doesn't determine the root cause of a problem. Its main objective is to ensure the end user isn't affected by any underlying issues, using probes to measure Exchange's health from an end user's perspective, recognize that a problem exists, and then initiate recovery actions. Recovery can range from restarting a service to initiating a database or server failover.

Recovery is a critical component, but ideally, you’ll want to avoid and/or anticipate any problems.

In order for Managed Availability to determine there is a problem, it needs to collect and evaluate key server and Exchange data. The collection is accomplished via the Exchange Diagnostics Service (EDS), which gathers hundreds of relevant Exchange and server metrics and ultimately writes them into perfmon logs in the \Microsoft\Exchange Server\V15\Logging\Diagnostics\DailyPerformanceLogs folder, in the form ExchangeDiagnosticsDailyPerformanceLog_MMDDHHMM.blg

We can actually use the perfmon data produced by Managed Availability to baseline workload, get historical context, and more... 

 Tuning Exchange with Availability Manager and Perfmon

Perfmon Report: Exchange Database Performance

Using perfmon data we can see exactly where the bottlenecks or throughput constraints are. We can analyze the activity for peak periods by running perfmon reports and graphs for:

System

  • CPU utilization and queue length. If high, then run a graph of CPU % by mode: user, system, etc.
  • Physical memory percentage used.
  • Page file percentage used.
  • Paging rates: page faults and page ins
  • Disk utilization.
  • Disk storage.
  • Network utilization.

 

Exchange

  • Exchange Domain Controller Connectivity Counters (read/search time)
  • .Net Framework Counters (garbage collection)
  • Netlogon
  • Database Counters (Read/Write Latency)
  • NET
  • RPC Client Access
  • HTTP Proxy
  • Information Store
  • Client Access Server
  • Workload Management

By evaluating system performance in conjunction with the Exchange performance data, we get a clearer picture of how the two behave together. This analysis will point to areas where throughput can be increased by either adding resources or reducing/balancing the load.

If you want to build out your own reporting capability, then creating a data warehouse is a must. With a bit of scripting and Relog, you can export the daily perfmon data into SQL Server. You may need to do a bit of tuning/indexing within SQL Server afterwards, but you'll then have all the data in one place for baselining and reporting.
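As a sketch of that pipeline: once `relog some.blg -f csv -o some.csv` has converted a daily log, a few lines of Python can load the CSV into a database for baselining. SQLite stands in for SQL Server here to keep the example self-contained, and the counter column and sample values are illustrative:

```python
import csv
import io
import sqlite3

# Illustrative relog-style CSV: the first column is the timestamp,
# the remaining column headers are counter paths.
relog_csv = io.StringIO(
    '"(PDH-CSV 4.0)","\\\\EX01\\Processor(_Total)\\% Processor Time"\n'
    '"04/18/2017 09:00:00.000","42.5"\n'
    '"04/18/2017 09:00:15.000","61.0"\n'
)

conn = sqlite3.connect(":memory:")  # use a file (or SQL Server) in practice
conn.execute("CREATE TABLE perf (ts TEXT, counter TEXT, value REAL)")

reader = csv.reader(relog_csv)
header = next(reader)               # counter paths live in the header row
for row in reader:
    ts = row[0]
    for counter, value in zip(header[1:], row[1:]):
        conn.execute("INSERT INTO perf VALUES (?, ?, ?)", (ts, counter, float(value)))

# With the samples in one table, baselining is a query away.
avg = conn.execute("SELECT AVG(value) FROM perf").fetchone()[0]
print(round(avg, 2))  # 51.75
```

Storing one row per (timestamp, counter, value) keeps the schema stable no matter which counters a given day's log contains; an index on (counter, ts) is the obvious first tuning step once the table grows.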

 

Conclusion:

Monitoring and tuning Exchange performance is extremely important, and it is complicated by the fact that it has to be done within an IT infrastructure that has so many interdependencies. Pay due attention to the key areas, and you will be able to deal with the majority of problems that turn up.

 

Want to learn more?

Download our Best Practices for Server Monitoring Whitepaper and learn how to achieve a successful long-term server monitoring strategy by focusing on an approach that is lightweight, efficient, resilient, and automated.

 

Download the whitepaper: Best Practices for Server Monitoring