In our previous post we looked at disk requirements for Microsoft SQL Server 2016 and outlined available disk technologies, including hardware, RAID configurations, and storage devices. In this post we’ll look at:
- Manufacturer’s disk performance specifications
- RAID performance specifications
- Disk Connection Protocols: SATA, SAS and NVMe
- Estimating Disk Performance
- Disk key performance indicators (KPIs)
Manufacturer’s Disk Performance Specifications
Manufacturers provide disk drive specifications for the speed at which they estimate the drive will be able to process data. When evaluating specifications, keep in mind the following:
Latency is the amount of time needed to read or write data on disk. Latency for hard disk drives (HDDs) includes the time needed to move the read/write head to the location on disk where the data is located. This is a combination of “seek time” (the average time to get to the correct concentric track on the disk), and “rotational latency” (how long it takes for the disk to spin the disk to the correct location on the track).
Latency for HDDs typically ranges in 10s of milliseconds. Since SSDs have no moving parts they do not incur seek time or rotational latencies, and their latency is usually < 1 ms.
- Input/Output Operations per second (IOPS)
IOPS is the number of read or write operations a disk is estimated to be able to support per second. For HDDs, this will be influenced by the disk’s rotational speed for example, a 15,000 RPM HDD can range up to several hundred IOPS. SSD, however, support IOPS in the hundreds of thousands, with newer technology pushing the rate even higher.
Throughput is the estimated rate at which data can be read or written, usually given in units of MB/s. Throughput is related to IOPS, but includes factors such as whether data is sequentially located on randomly located, and whether data is being read or written.
- Sequential vs Random
Data that is located on sequential sections of the disk can be processed more quickly than data that is in random locations, and specifications should be provided for both sequential and random data throughput. The effect is more pronounced for HDDs, due to the added seek time and rotational latency for random reads and writes, but there is also additional overhead for locating random access data that affects SSDs as well.
Sequential data read and write speed is usually provided in terms of MBps (MegaBytes per second), while random read and write speed is provided in terms of IOPS (IO per second). IO operations process 4 KB data blocks, so 256 IOPS = 1 MBPS.
- Read vs Write
Since writes need to alter the media, write performance will be slower than read performance, and you should obtain separate IOPS or MBps metrics for both read and write. Note that RAID levels that add parity (e.g. RAID 5 or 6) increase the amount of data that needs to be written to disk and will provide slower write performance, while increases in the number of disks used to stripe data will speed up both read and write operations.
As we mentioned in the previous post, manufacturers provide estimated mean time before failure (MTBF) ratings and/or annual failure rates (AFR).
RAID Performance Specifications
RAID arrays can improve disk performance and reliability. The more disks data is striped across in an array, the faster it will be to read or write from that array. However, adding in mirroring or parity for data redundancy decreases data write speed. The following table provides formulas to determine relative write performance and the amount of available capacity at standard RAID levels:
N = # drives
X = IOPS
|RAID level||Write Performance||Available Capacity|
|RAID 5||NX/4||(N-1)/N * 100|
|RAID 6||NX/6||(N-2)/N * 100|
Due to reliability concerns, RAID 5 has been deprecated for use in new systems in favor of RAID 6, but in general RAID 10 is the recommended configuration for new servers.
Note: If RAID 10 is set up with twice as many disks as RAID 0, such that the capacities are equal, then the write performance for RAID 10 will still equal that of RAID 0 however, because the same data is available on multiple drives, read performance increases.
Disk Connection Protocols: SATA, SAS and NVMe
As disk maximum throughput has increased, especially for SSDs, some are reaching their upper limits due to the protocol that connects them to the server. The current connection protocols are:
Serial Advanced Technology Attachment (SATA) 1.0 was introduced in 2003, and is currently at version 3.3. SATA is used extensively in consumer level computers, as SATA HDDs tend to be slower, have shorter lifespans, and be less expensive than other drives, but SSDs can also be used with SATA. The SATA 3 standard supports throughput up to 6 Gbit/s, and there are currently no plans to improve upon that speed.
Serial Attached SCSI (SAS) was introduced in 2005, and the current SAS-3 version supports up to 12 Gbit/s, with SAS-4 expected to be released in 2017, supporting 22.5 Gbit/s. SAS can be used with both HDDs and SSDs, and is used for servers and RAID arrays.
Non-Volatile Memory Express (NVMe) is designed specifically for SSD drives. Both SATA and SAS were designed to work with disks that had read/write heads, and read/write heads process data serially, stacking commands into a queue if additional commands are received before the current one is completed.
However, SSD doesn’t have read/write heads, and isn’t limited to processing commands serially. NVMe takes advantage of SSD architecture by supporting up to 65,536 IO queues, each of which can support up to 64K queued commands.
NVMe SSD drives connect to the motherboard using the PCI Express (PCIe) bus. The PCIe bus is connected to the system via lanes, with each lane having receive and send signalling pairs, providing a full duplex data transfer (e.g. send and receive have the same bandwidth). Devices connected to PCIe can increase their throughput by using multiple lanes, with standard configurations of 1, 4, 8 or 16 lanes. The more lanes used to connect a drive to the PCIe bus, and the newer the version of PCIe, the faster the data throughput:
Number of lanes PCIe Version 1 lane (x1) 4 lanes (x4) 8 lanes (x8) 16 lanes (x16) 1.0 (Gen1) 250 MB/s 1 GB/s 2 GB/s 4 GB/s 2.0 (Gen2) 500 MB/s 2 GB/s 4 GB/s 8 GB/s 3.0 (Gen3) 984.6 MB/s 3.94 GB/s 7.9 GB/s 15.8 GB/s 4.0 (Gen4) 1969 MB/s 7.9 GB/s 15.8 GB/s 31.5 GB/s 5.0 (Gen5)
~3077 MB/s 12.3 GB/s 24.6 GB/s 49.2 GB/s
Currently, most PCIe NVMe SSDs are PCIe Gen3 x4 (i.e. PCIe version 3.0 using 4 lanes).
The performance gains from using SSDs with NVMe can be increased further by configuring them to use RAID, but as these drives can be significantly more expensive than HDDs or SATA SSDs, you should check your disk performance to verify if the additional speed is required.
Estimating Disk Performance
The speed of the storage disks used by SQL is influenced by the factors we’ve previously described - manufacturer’s estimated speed, RAID configuration and device connection protocol (SATA, SAS or NVMe). To evaluate the final disk configuration, Microsoft recommends the diskspd utility.
Diskspd provides the ability to run tests modeling SQL Server workloads and to evaluate storage disk performance. SQL specific recommended flags for the commands include using 8k blocks and running tests for at least 60 seconds. Adjusting and rerunning tests can provide a baseline for predicted storage performance under load. Run tests on both existing and new disks to evaluate the predicted improvement in performance.
The following KPIs can be used to create a baseline for SQL server performance - these counters are found in both Logical Disk and Physical Disk and apply to both HDDs and SDDs:
- Average Disk sec/Read
Best performance at < 8 ms, and alert if > 20ms.
- Disk Read Bytes/sec
Baseline this value, and check value against baseline, especially if Average Disk sec/Read is too high.
- Average Disk sec/Write
If Writes are not cached, then the thresholds are the same as for Read: best performance at < 8 ms, and alert if > 20ms. If writes are cached, then best performance is < 1 ms, and alert if > 4 ms.
- Disk Write Bytes/sec
Baseline this value, and check value against baseline, especially if Average Disk sec/Write is too high.
- Average Disk sec/Transfer
Transfers are both Reads and Writes. If buffering is not used, then the thresholds of good performance < 4 ms, and alert if > 20 ms apply. If buffering is used, baseline this value and alert when the value deviates from the baseline.
- Current Disk Queue Length, Average Disk Queue Length
The disk queue should be less than 2 per each drive in the disk array for both Current and Average, so a RAID 10 array with 4 drives should have a disk queue length less than or equal to 8. However, data cached for writing can artificially increase the queue length, so if caching is in effect, baseline these values and alert if it deviates from the baseline.
- % Disk Time
This value indicates how busy the disks are servicing read and write requests. It can be > 100% if the Disk Queue length is high or if you are using an array that has multiple drives. Alert if this value is > 90% per disk.
- % Free Space
Monitor how quickly your database and logs are consuming space, and set a threshold that will provide enough time to add more disk space before the drives run out. In general, 15% free space is a good threshold, but this can be too high for very large drives, in which case you may want to use a threshold with a set number of MB or GB free space.
Developments in disk technology have increased IO speed, capacity, and reliability, but the fastest technologies can be much more expensive and require hardware upgrades. Before investing in hardware upgrades and faster disks, evaluate the performance of SQL your current storage media, and then use diskspd to baseline the performance of both the existing storage and any storage upgrades.