
Windows CPU Metric Guide

October 18, 2017

Discussions of Windows CPU usage often focus on a headline “CPU % used” figure as an overall measure of CPU performance, but other CPU metrics can provide a more detailed picture of the overall system state. This post takes a look at those additional metrics.

First, some basic terminology:

  • Threads

    Applications are made up of one or more processes, and processes are themselves made up of one or more threads. Threads are the basic components that can be queued up and allocated processor time.

  • Priority

    Processes and threads are both assigned priorities that determine whether they can access the processor before other threads. If you’re familiar with Linux, this is similar in concept to the nice setting for a process.

    By default, priority is set to Normal, but users can adjust priorities higher or lower. Note that it is generally not advised to set processes to the “Realtime” priority level, because a Realtime process can starve other threads, including system threads, of processor time.

    Figure: Task Manager view of setting priority for a process

  • Logical Processors

    A physical processor on a system can have one or more cores, with each core functioning as an individual processor. Without hyperthreading, each core executes one thread at a time, so if that thread stalls (e.g. while waiting on a memory access) the core’s execution resources sit idle. This idle time can be reduced by letting the core hold a second thread that executes while the first is stalled. This is called hyperthreading, and cores that use hyperthreading are viewed as having 2 logical processors.

    Figure: Task Manager view of hyperthreaded quad core processor

    In the above screenshot, there is one physical processor on the system’s motherboard. The one processor has 4 CPU cores, and the cores are hyperthreaded. The system displays this as 8 logical processors, but keep in mind that hyperthreading doesn’t double processor performance. It only takes advantage of processor cycles that would otherwise be lost to idle time. In practice, the performance gain is closer to 30%, and it varies with the workload.

    We’ll be looking at overall processor metrics (the _Total instance), which range from 0 to 100 regardless of the number of logical processors on the system.
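
The counting in the screenshot example above can be sketched in a few lines of Python. This is a minimal illustration, not a Windows API call: the function names `logical_processor_count` and `total_instance` and the per-processor sample values are hypothetical, and the sketch assumes the _Total instance behaves as the average of the per-logical-processor values.

```python
from statistics import mean
from typing import Sequence


def logical_processor_count(sockets: int, cores_per_socket: int,
                            threads_per_core: int) -> int:
    """Logical processors = physical processors x cores x hardware threads per core."""
    return sockets * cores_per_socket * threads_per_core


def total_instance(per_processor_percent: Sequence[float]) -> float:
    """Assumed model of _Total: the average of the per-logical-processor values.

    Averaging keeps the result in the 0-100 range no matter how many
    logical processors contribute.
    """
    return mean(per_processor_percent)


# One physical processor, 4 cores, hyperthreaded (2 hardware threads per core).
count = logical_processor_count(sockets=1, cores_per_socket=4, threads_per_core=2)

# Hypothetical utilisation for each of the 8 logical processors.
per_cpu = [100.0, 100.0, 0.0, 0.0, 50.0, 50.0, 0.0, 0.0]
total = total_instance(per_cpu)

print(count, total)
```

Two fully busy logical processors out of eight contribute only a quarter of the overall figure, which is why a single saturated thread can hide behind a low headline percentage.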

 


CPU Metrics

  • Processor Queue Length

    The System Processor Queue Length is the total number of threads waiting for access to a processor over all the logical processors on the system. A long processor queue can indicate that CPU requests exceed the system’s capacity or that threads with higher priorities are keeping lower priority threads from accessing the processor. To monitor this metric, create a baseline and alert on deviations from the baseline value.

  • % Privileged Time

    Threads run in either Privileged mode or User mode. Privileged mode, also known as kernel mode, is used when a thread needs access to protected system resources such as hardware or memory, so % Privileged Time measures the share of CPU spent on system work. If a thread does not need access to protected system resources, it executes in User mode. Threads switch between Privileged and User modes as their requirements vary.

    In general, % Privileged Time should be < 30% of total CPU, but this value should be adjusted based on a baseline of your workload. If % Privileged Time is consistently high, you can investigate further by looking at the following counters:

    • % Interrupt Time

      Interrupts are high-priority requests from hardware that the OS services ahead of normal threads, and % Interrupt Time is the share of CPU time spent servicing them. If % Interrupt Time > 20%, this can indicate a hardware or driver problem.

    • % DPC Time

      A DPC is a “deferred procedure call”: work that an interrupt handler defers so that it runs later, at a lower priority than the interrupt itself. As with % Interrupt Time, if % DPC Time is > 20%, the issue is likely a hardware or driver problem.

    • Context Switches/Sec

      A context switch occurs when the CPU switches execution from one thread to another, either because the running thread has finished its time slice or has to wait, or because a higher priority thread preempts a lower priority thread. There will always be context switches on a processor, but if too many high priority threads push themselves to the head of the queue then CPU performance will degrade. Baseline the Context Switches/Sec value for your system and look for spikes that could indicate interrupts from failing hardware or badly written drivers or software.

  • % Processor Time

    % Processor Time is the metric used for % CPU time in most cases. Technically, % Processor Time is calculated as “100 – % Idle Time”, where % Idle Time is the percentage of time the processors are not processing threads. In almost all cases this works out to be the same as % Privileged Time + % User Time. Monitor % Processor Time and alert if it is consistently > 80%.
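
The % Processor Time arithmetic and the “consistently > 80%” rule can be sketched as follows. This is a hedged illustration only: the function names, the sample values, and the choice to treat “consistently” as “at least 90% of the samples in a window” are assumptions made for the example, not definitions from Windows or from any monitoring product.

```python
from typing import Sequence


def processor_time_from_idle(idle_percent: float) -> float:
    """% Processor Time computed as 100 - % Idle Time."""
    return 100.0 - idle_percent


def is_consistently_above(samples: Sequence[float], threshold: float = 80.0,
                          min_fraction: float = 0.9) -> bool:
    """True when at least min_fraction of the samples exceed the threshold.

    min_fraction is an assumed interpretation of "consistently"; tune it
    (and the window length) to your own workload.
    """
    if not samples:
        return False
    over = sum(1 for value in samples if value > threshold)
    return over / len(samples) >= min_fraction


# Hypothetical single sample: idle, privileged (kernel) and user percentages.
idle, privileged, user = 12.5, 27.5, 60.0
busy = processor_time_from_idle(idle)
# Busy time splits into the privileged-mode and user-mode portions.
assert abs(busy - (privileged + user)) < 1e-9

# Hypothetical window of ten % Processor Time samples.
window = [85.0, 91.0, 88.0, 79.0, 93.0, 86.0, 90.0, 84.0, 95.0, 89.0]
alert = is_consistently_above(window)

print(busy, alert)
```

Requiring most of a window to exceed the threshold, rather than a single sample, avoids alerting on short spikes that do not reflect sustained load.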

Figure: Longitude Summary CPU Report

Summary

While monitoring CPU use on your computers can ensure that your users have the resources they need, a deeper look at the metrics can help track down both hardware and software problems that drain CPU capacity. The following table summarizes key CPU metrics, their thresholds, and the problem that may be indicated when they exceed those thresholds.

Metric                 | Threshold        | Problem Indicated
-----------------------|------------------|---------------------------------------------------------------
Processor Queue Length | Exceeds baseline | CPU requests exceed capacity or too many high priority threads
% Privileged Time      | 30%              | System is using too much CPU
% Interrupt Time       | 20%              | Hardware or driver problem
% DPC Time             | 20%              | Hardware or driver problem
Context Switches/Sec   | Exceeds baseline | Hardware, driver or software problem
% Processor Time       | 80%              | CPU requests exceed capacity
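
The table above can be turned into a simple check, sketched here in Python. This is an illustration under stated assumptions rather than a reference implementation: the helper names `baseline_limit` and `evaluate`, the sample and history values, and the rule “baseline = mean plus three standard deviations of past samples” are all hypothetical choices. The dictionary keys follow the usual Windows performance counter path format, and the code only evaluates numbers passed to it; it does not collect counters itself.

```python
from statistics import mean, pstdev
from typing import Dict, List, Sequence

PRIVILEGED = r"\Processor(_Total)\% Privileged Time"
INTERRUPT = r"\Processor(_Total)\% Interrupt Time"
DPC = r"\Processor(_Total)\% DPC Time"
PROCESSOR = r"\Processor(_Total)\% Processor Time"
QUEUE = r"\System\Processor Queue Length"
SWITCHES = r"\System\Context Switches/sec"

# Fixed thresholds taken from the summary table.
STATIC_THRESHOLDS = {PRIVILEGED: 30.0, INTERRUPT: 20.0, DPC: 20.0, PROCESSOR: 80.0}
# Metrics the table compares against a baseline instead of a fixed value.
BASELINED = (QUEUE, SWITCHES)


def baseline_limit(history: Sequence[float], k: float = 3.0) -> float:
    """Assumed baseline rule: mean of past samples plus k standard deviations."""
    return mean(history) + k * pstdev(history)


def evaluate(sample: Dict[str, float], history: Dict[str, Sequence[float]],
             k: float = 3.0) -> List[str]:
    """Return the counters in `sample` that exceed their threshold or baseline."""
    breaches = []
    for counter, limit in STATIC_THRESHOLDS.items():
        if sample.get(counter, 0.0) > limit:
            breaches.append(counter)
    for counter in BASELINED:
        past = history.get(counter)
        if past and counter in sample and sample[counter] > baseline_limit(past, k):
            breaches.append(counter)
    return breaches


# Hypothetical history and current sample.
history = {
    QUEUE: [2, 3, 2, 4, 3, 2, 3, 3],
    SWITCHES: [9000, 11000, 10000, 10500, 9500],
}
sample = {PRIVILEGED: 22.0, INTERRUPT: 3.0, DPC: 25.0, PROCESSOR: 70.0,
          QUEUE: 12, SWITCHES: 10200}

breaches = evaluate(sample, history)
for counter in breaches:
    print("over threshold:", counter)
```

In this made-up sample, % DPC Time exceeds its fixed 20% threshold and the processor queue is well above its baseline, while the headline % Processor Time stays under 80%, which is the kind of case the additional metrics are meant to surface.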

 
