Linux Memory Use

May 04, 2017 | Susan Bilder

Monitoring memory on Linux can be counter-intuitive if you’re accustomed to monitoring Windows.  On Windows, 95% used memory means that 95% of your memory is assigned to processes, and allocating memory for a new process will slow down performance as memory is swapped out to disk to free up enough memory for the new process.

For Linux, memory that is listed as “used” is made up of a combination of memory assigned to processes and memory used by the operating system for caching data and buffering operations. Buffer and cache data can be dropped without needing to be swapped out to disk, so the actual memory available for processes includes the buffer/cache memory pages.

In this post we will outline Linux memory management for processes and look at commands to monitor memory usage. 

Linux process memory management:

When a process is started on Linux, it is assigned memory pages to hold executable code and data read in from disk.  Pages are added and removed dynamically to accommodate memory needed for process execution and data input. Memory pages are usually 4 KB, but you can use getconf PAGESIZE to find the default page size on your system.
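
As a minimal sketch, the same page size can be read from Python's standard library, which is handy when a monitoring script needs to convert page counts to bytes. The helper name pages_needed is made up for this illustration.

```python
import os

# Ask the OS for its base page size in bytes; this is the same value
# that `getconf PAGESIZE` prints on the command line.
page_size = os.sysconf("SC_PAGE_SIZE")


def pages_needed(num_bytes, size=page_size):
    """Number of whole pages required to hold num_bytes (rounded up)."""
    return -(-num_bytes // size)  # ceiling division without floats


print(f"page size: {page_size} bytes")
print(f"pages needed for 1 MiB: {pages_needed(1024 * 1024)}")
```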

The Linux memory manager uses the following mechanisms to handle memory for processes:

  • MAJOR PAGE FAULT
    A major page fault occurs when a process needs to read in data from disk to memory pages. Major page faults are expected when a process starts or needs to read in additional data, and in these cases do not indicate a problem condition. However, a major page fault can also be the result of reading memory pages that have been written out to the swap file, which could indicate a memory shortage.

  • MINOR PAGE FAULT
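    A minor page fault occurs when a process needs a page that is already in memory, for example because it is in the page cache or in use by another process, but is not yet mapped into the requesting process. A minor page fault is resolved by mapping the existing page, which lets multiple processes share memory pages without reading any additional data from disk.

  • COPY-ON-WRITE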
    A copy-on-write occurs for a shared memory page when one of the processes sharing it needs to modify that page. The shared page is copied and assigned to the modifying process.

  • SWAPPING
    Technically, swapping refers to paging all the memory for a process out to disk, but the term is often used synonymously with paging inactive memory out to the swapfile. If a process is swapped in and out of memory frequently, this can lead to thrashing, where disk I/O can severely degrade performance.

  • RESIDENT SET and WORKING SET
    The set of memory pages faulted in for a process is called its RESIDENT SET, and the WORKING SET for a process is the subset of those pages actively in use. As a process continues to execute, the memory manager dynamically updates the resident set, page faulting in new data as required and identifying resident pages that have not been used recently and so are no longer part of the working set. The memory manager uses an algorithm to determine when idle resident set pages can be added to a free list to be reclaimed for other processes.

  • PAGE OUT
    The memory manager monitors the resident set of a process for idle pages and removes them from memory as needed.
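
To make the fault counters concrete, the sketch below reads the current process's minor and major fault totals with Python's resource module, touches every page of a freshly mapped region, and reads the totals again. It assumes a Linux system. The 32 MiB size is arbitrary, and the example relies on the fact that the first touch of an anonymous page is counted as a minor fault because no disk read is needed, which is a slightly different case from the page sharing described above. Some sandboxed environments may not report these counters.

```python
import mmap
import resource


def fault_counts():
    """Return (minor, major) page-fault totals for the current process."""
    usage = resource.getrusage(resource.RUSAGE_SELF)
    return usage.ru_minflt, usage.ru_majflt


SIZE = 32 * 1024 * 1024  # 32 MiB, an arbitrary size chosen for illustration

minor_before, major_before = fault_counts()

# Anonymous private mapping: address space is reserved, but physical
# pages are only assigned when each page is first touched.
region = mmap.mmap(-1, SIZE, flags=mmap.MAP_PRIVATE)
for offset in range(0, SIZE, mmap.PAGESIZE):
    region[offset] = 1  # the first write to a page triggers a page fault

minor_after, major_after = fault_counts()
region.close()

minor_delta = minor_after - minor_before
major_delta = major_after - major_before
print(f"minor faults while touching pages: {minor_delta}")
print(f"major faults while touching pages: {major_delta}")
```

On a typical system the minor fault count rises while the major fault count stays flat, since nothing had to be read from disk.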

Monitoring Memory

The following files and commands provide information on memory on Linux:

  • free -k
    The free command provides a snapshot of Linux system memory - the -k provides output in kilobytes.  On a RHEL/CentOS 7 system, the output looks like:

    free -k output on CentOS 7

    In this output on the Mem line:
    total = used + free + buff/cache

    Technically, the memory available to start processes is free + buff/cache, but some of that cache memory is in use by active processes, and using it would affect performance.  The available value takes into account the in-use cache memory and provides an estimate for memory that can be used immediately without affecting actively used cache memory.

    For earlier RHEL 5 and 6, free -k output has the format:

    free -k output for earlier Linux releases

    In this output, the second line provides the used and free memory not counting the buffers and cache. The total memory is the sum of the used + free columns on either line 1 or line 2, but the values in the -/+ buffers/cache line are the used and free memory adjusted for the cache and buffers:

    used (line 2) = used (line 1) - buffers (line 1) - cached (line 1)
    free (line 2) = free (line 1) + buffers (line 1) + cached (line 1)
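
The adjustment on the second line can be sketched as a small worked example. The function name and the numbers below are made up for illustration and are not taken from a real system.

```python
def adjust_for_buffers_cache(used_kb, free_kb, buffers_kb, cached_kb):
    """Return (used, free) as shown on the '-/+ buffers/cache' line."""
    reclaimable = buffers_kb + cached_kb
    return used_kb - reclaimable, free_kb + reclaimable


# Hypothetical line-1 values in KB, invented for this example.
used, free, buffers, cached = 7_500_000, 500_000, 200_000, 4_300_000

adj_used, adj_free = adjust_for_buffers_cache(used, free, buffers, cached)
print(adj_used, adj_free)  # 3000000 5000000

# Total memory is the same whichever line you add up.
assert adj_used + adj_free == used + free
```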


  • /proc/meminfo
    The /proc directory contains directories and files that provide detailed configuration and performance metrics on the OS and processes.  The /proc/meminfo file contains metrics on operating system memory use:

    contents of system level memory metrics in /proc/meminfo


    The fields in /proc/meminfo match up to the free -k output, but provide a more detailed breakdown of how used, free and cache memory are being used. /proc/meminfo also includes total Active memory, equivalent to the sum of the processes’ working sets, and Inactive memory, which can be scanned by the memory manager for pages that are eligible to be reclaimed.
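
A monitoring script might read these fields directly. The following is a minimal sketch, assuming the usual "Name: value kB" layout of /proc/meminfo. The function names and the sample values are invented for illustration, and the fallback for systems without a MemAvailable field is the rough free + buffers + cache estimate rather than the kernel's own calculation.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: int}; sizes are in kB."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep or not rest.split():
            continue  # skip blank or malformed lines
        fields[name.strip()] = int(rest.split()[0])
    return fields


def available_percent(fields):
    """Available memory as a percentage of MemTotal."""
    # MemAvailable is missing on older kernels, so fall back to the
    # rough free + buffers + cache estimate.
    fallback = (fields["MemFree"] + fields.get("Buffers", 0)
                + fields.get("Cached", 0))
    available = fields.get("MemAvailable", fallback)
    return 100.0 * available / fields["MemTotal"]


def read_meminfo(path="/proc/meminfo"):
    """Read and parse the live file (Linux only)."""
    with open(path) as f:
        return parse_meminfo(f.read())


# Made-up sample values, not output from a real system.
SAMPLE = """\
MemTotal:        8000000 kB
MemFree:          500000 kB
MemAvailable:    4000000 kB
Buffers:          200000 kB
Cached:          3500000 kB
HugePages_Total:       0
"""

print(available_percent(parse_meminfo(SAMPLE)))  # 50.0
```

On a live system, available_percent(read_meminfo()) gives the figure for the machine the script runs on.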


  • /proc/{pid}/status
    The /proc directory has a subdirectory for each process on the system, named with the PID for the process. The files in each directory provide detailed performance metrics for the corresponding process, with the /proc/{pid}/status file having the most user friendly format:

    process level performance metrics for pid = 1213 from /proc/1213/status

    For the purpose of monitoring process memory, the VmRSS field is the current resident set of the process, the VmHWM is the maximum resident set (HWM = “high water mark”), and the VmSwap is the amount of memory that has been paged out for the process.
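
The three fields can be pulled out of the status file with a few lines of Python. This is a sketch only: the function names are hypothetical and the sample text is invented. Kernel threads have no Vm* fields, in which case the result is simply an empty dictionary.

```python
def process_memory(status_text):
    """Extract VmRSS, VmHWM and VmSwap (in kB) from /proc/{pid}/status text."""
    wanted = ("VmRSS", "VmHWM", "VmSwap")
    result = {}
    for line in status_text.splitlines():
        name, _, rest = line.partition(":")
        if name in wanted:
            result[name] = int(rest.split()[0])
    return result


def read_process_memory(pid):
    """Read the fields for a live process (Linux only)."""
    with open(f"/proc/{pid}/status") as f:
        return process_memory(f.read())


# Made-up sample, not taken from a real process.
SAMPLE_STATUS = """\
Name:   exampled
VmHWM:      5120 kB
VmRSS:      4096 kB
VmSwap:        0 kB
Threads:    1
"""

print(process_memory(SAMPLE_STATUS))
```

A VmRSS that keeps climbing toward VmHWM, or a VmSwap that grows over time, is a reason to look more closely at that process.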


  • sar -B
    The sar command, provided by the sysstat package, reports on system activity, and the -B flag produces paging metrics. The command can take arguments for the number of seconds between samples and the number of samples, with the output providing rate metrics for each interval. Each line of output covers one sampling interval, and a final Average line summarizes all of the samples. For a “sar -B 2 2” command, which takes two samples two seconds apart, the output looks like:

    output from sar -B 2 2

    This output is a snapshot of how much data is being paged into or out of memory, and of how quickly the memory manager is scanning for idle pages eligible to be placed on a free list of pages that can be reclaimed as needed.

    The metrics returned by sar -B  are:  

    pgpgin/s: KB paged in to memory from disk per second.
    pgpgout/s: KB paged out to disk per second.
    fault/s: Total major and minor faults per second.
    majflt/s: Major faults per second. This is expected to increase when processes start, but should drop back down after the process has loaded into memory.
    pgfree/s: Number of pages placed on the free list per second. These pages are available to be allocated if memory is needed for another process.
    pgscank/s: Number of pages scanned by the kswapd daemon per second. kswapd scans for pages to add to the free list if the number of pages on the list becomes too low, and only runs when needed.
    pgscand/s: Number of memory pages scanned directly per second.
    pgsteal/s: Number of pages recovered from cache per second to meet memory needs.
    %vmeff: Page reclaim efficiency, calculated as pgsteal/(pgscand + pgscank). A low value (e.g. < 30%) may indicate virtual memory problems. This should either be near 100%, or 0 if no pages have been scanned during the interval.
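
The %vmeff calculation can be written out as a short function. This is a sketch of the formula as described above, not sar's own source code. The function name and the input values are made up for illustration.

```python
def vmeff(pgsteal, pgscank, pgscand):
    """Page reclaim efficiency as a percentage: pgsteal / (pgscank + pgscand)."""
    scanned = pgscank + pgscand
    if scanned == 0:
        return 0.0  # nothing was scanned during the interval
    return 100.0 * pgsteal / scanned


# Hypothetical per-second rates, invented for this example.
print(vmeff(pgsteal=900, pgscank=600, pgscand=400))  # 90.0
print(vmeff(pgsteal=150, pgscank=500, pgscand=0))    # 30.0
print(vmeff(pgsteal=0, pgscank=0, pgscand=0))        # 0.0
```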


  • vmstat
    vmstat, from the procps package, has two arguments that will return memory information. vmstat -s returns event counters and memory statistics:

    output from vmstat -s

    The values for total, used and free memory are the same values seen in /proc/meminfo.

    The default vmstat output includes a memory section. With the -a flag, the buff and cache columns in that section are replaced by inact and active, giving the following values:

    output from vmstat


    swpd: Total memory currently swapped to disk.
    free: Free physical memory.
    inact: Inactive used memory.
    active: Active used memory.

     
  • top
    top is a live view of activity on the system, similar to Task Manager on Windows. Above the list of processes is a header containing OS performance information, with memory metrics providing the same values as free -k:

    output from top

    To sort the list of processes by memory, you can run “top -o %MEM”, or type an uppercase M after top has started. To change the column used for sorting, you can type < or > to move the sort column left or right.

Conclusion:

Because Linux uses free memory for buffers and caches, monitoring for memory problems is more complex than simply looking at the free memory value on the system.  The following table lists a few basic metrics that can be used to detect memory problems and provides guidelines for monitoring them.  

The usual disclaimers apply:  the exact thresholds for a system will depend on the specific processes and OS version, so the best practice is to start with an initial threshold, monitor system performance, and then adjust that threshold as needed to detect significant deviations from your baseline.  And if you’re seeing performance problems with a specific process but don’t see problems with metrics at the system level, take a closer look at the process specific metrics found in top or /proc/{pid}/status.  

Metric | Found In | Initial Threshold | Guideline
Available Memory, or free + buffers + cache | free -k, top, or /proc/meminfo | Available Memory >= 5% of total memory | Create baseline for system and adjust threshold based on observed performance.
%vmeff | sar -B | > 30% | Ideally should be near 100% or read 0 if there is no paging.
pgpgout/s | sar -B | <= 500 | Create baseline for system and adjust threshold based on observed performance.
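
As a rough sketch, the starting thresholds above could be wired into a check like the one below. The function name and its parameters are hypothetical, the defaults are only the initial guidelines from the table, and a %vmeff of 0 is treated as healthy because it means no pages were scanned.

```python
def memory_warnings(available_kb, total_kb, vmeff_pct, pgpgout_per_s,
                    min_available_pct=5.0, min_vmeff_pct=30.0,
                    max_pgpgout=500.0):
    """Return a list of warning strings; an empty list means no threshold tripped."""
    warnings = []
    available_pct = 100.0 * available_kb / total_kb
    if available_pct < min_available_pct:
        warnings.append(f"available memory is {available_pct:.1f}% of total")
    # 0 means nothing was scanned in the interval, which is not a problem.
    if 0 < vmeff_pct < min_vmeff_pct:
        warnings.append(f"%vmeff is {vmeff_pct:.1f}")
    if pgpgout_per_s > max_pgpgout:
        warnings.append(f"pgpgout/s is {pgpgout_per_s:.1f}")
    return warnings


# Invented readings for illustration only.
print(memory_warnings(4_000_000, 8_000_000, 95.0, 20.0))  # []
print(memory_warnings(300_000, 8_000_000, 10.0, 900.0))
```

Passing the thresholds as parameters keeps the check easy to retune once you have a baseline for your own system.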

 
