Run Live Demo

What Kind of Disks Should You Use with Microsoft SQL Server 2016? (Part 1)

August 8, 2017 |

In addition to CPU and memory, Microsoft SQL Server 2016 also requires disks that are fast enough to keep up with the IO requests made by the database. Running SQL on disks that cannot keep up with the requisite level of IO results in latency that affects all subsequent operations.

In this post, we’ll look at:

  • SQL Server disk requirements
  • Disk Hardware
  • RAID configurations
  • Disk Management: DAS, NAS and SAN

SQL Server Disk Requirements

Microsoft SQL Server is designed so that it can recover its databases at any time in the event of a system failure. This is done by using a transaction log – whenever a database is updated, a transaction log entry recording the details of the update is written to disk before the database update is performed. If a system failure occurs, the transaction log entries can be used to roll back incomplete database updates and restore the database to a consistent state.

Using a transaction log means not just additional disk writes for database updates, but also that the writes to the log must be done before the database updates are committed to disk – and that introduces a latency into database updates.

 

Download Achieving Peak SQL Performance  –  Learn how an improved monitoring strategy can help optimize database performance and reduce the amount of dedicated resources needed for database monitoring and management

 

To address the need to write both data and log transactions, and do so quickly and reliably, SQL disks should have the following properties:

  • Multiple drives

    Putting logs and databases on separate disks allows the logs to be written independently from database updates, and scheduling can be optimized to minimize latency.
  • Fast write rates

    Disks should be fast enough to respond to write requests to keep the disk queue length low. The disk queue length is considered high when the average disk queue length value is greater than 2 times the number of disks in the array.  A high disk queue length will cause performance degradation in SQL, and may cause performance issues in the operating system.  

    If you’re using a Cloud based SQL Server instance, some storage options may have better write performance than others. For example, in Azure, the Premium Storage option will have lower latency and better I/O performance, but it is a more costly option.

  • Capacity

    Databases and log files need to be able to expand to meet demand. Logs and database files should be set to autogrow, but autogrowth can cause performance degradation as autogrow is implemented only when space is needed, and a latency will be introduced if SQL processes need to wait until autogrow completes.. To avoid performance degradation, monitor free space in files and manually grow them before they run out of space.
  • Reliability

    Hardware failures occur, but the failure rate for drives can vary. Historically, reliability was quantified by a mean time between failures (MTBF), which was the average number of power on hours expected of a drive before failure. A newer metric, the annualized failure rate (AFR) (the percent of failures/year for a drive) is now frequently used by manufacturers to quantify reliability.

    Another aspect of reliability is whether the drive is Enterprise class or Consumer class. Enterprise class drives are designed for 24/7 use, and are able to withstand higher temperatures. Consumer class drives are not designed for constant use, and have lower temperature thresholds.

  • Redundancy

    Using drives configured in redundant arrays can compensate for a disk failure, but redundancy itself can incur a performance penalty. Microsoft recommends RAID 10 (mirrored striped sets) for disks with high read and write rates, while RAID 5 (striped sets with parity) is suitable for databases with read-only data. See RAID configurations below for more details on RAID levels.

Disk Hardware

Hard disk drives (HDD) have been around since the 1950’s, and have made regular improvements in speed, reliability and capacity over the last 6 decades. Solid-state drives (SSD) were first manufactured in 1976, but have only been available at capacities and prices suitable for server use in the last decade.

SSDs are faster, sturdier, and use less energy than HDDs, but the technology has a not undeserved reputation for being error prone. A long term study published in 2016 on Flash Reliability in Production found that the replacement rate for SSDs was less than that for HDDs, but that SSDs were significantly more prone to uncorrectable errors. However SSD warranties are stretching out to 3-5 years as manufacturers improve reliability, making them a more viable option for enterprise level systems.

To improve the reliability of SSDs, use Enterprise Class drives, which are designed for continuous use at harsher conditions than Consumer Class drives. While there are some reports that Consumer class drives are sufficient for datacenter use, the consensus is that Enterprise class provides significantly better IO performance.

 

Try Longitude Live Online Demo!

Access our online demo environment, see how to set up SQL Server monitoring, view dashboards, problem events, reports and alerts.  Please log in using the credentials below:

  • Username:   demo
  • Password:    longitude

 

RAID configurations

Another option for improving not only performance, but also reliability is to use a “redundant array of independent disks” (RAID). RAID can provide redundancy by writing the same data over multiple disks (aka mirroring), and can improve performance by breaking write data into blocks and “striping” the writes across multiple physical disks, increasing the amount of data that can be written at one time.

For SQL, the most typical RAID configurations are:

RAID 0: Striped Sets
Disk0
Block(0)
Block(2)
Disk1
Block(1)
Block(3)

Data is striped across 2 or more disks, which improves performance but has no redundancy. If one disk fails, the data in the array is lost. In the example above, we have a data set written to the RAID array, broken into 4 blocks. Since both Disk0 and Disk1 can write simultaneously, the data is written to disk faster, but if one disk fails, the array fails.

 

RAID 1: Mirrored Drives
Disk0
Block(0)
Block(1)
Block(2)
Block(3)
Disk1
Block(0)
Block(1)
Block(2)
Block(3)

Data is mirrored across 2 or more disks. If one disk fails, a complete record of the data is on the other disks in the mirror set. However, you have 50% less disk capacity because the mirrored copies are not available for other data.

RAID 5: Striped Sets with Parity
Disk0
Block(0)
Block(2)
Block(4/5).p
Disk1
Block(1)
Block(2/3).p
Block(4)
Disk2
Block(0/1).p
Block(3)
Block(5)

RAID 5 adds redundancy to striping, requires a minimum of 3 disks, and can survive the failure of 1 disk. The striping in RAID 5 allows multiple drives to be used to write data, but an additional block of parity information must be written to recover data if a disk fails (.p in the above diagram). For example, if Disk1 were to fail, Block(1) and Block(4) would be reconstructed using the parity information on Disk0 and Disk2.

Although RAID 5 uses striping, note that both calculating the parity and writing the parity block to disk slows down write performance in RAID 5, so it will be slower than RAID 0.

RAID 6: Striped Sets with Dual Parity
Disk0
Block(0)
Block(2)
Block(4/5).p
Disk1
Block(1)
Block(2/3).p
Block(4/5).q
Disk2
Block(0/1).p
Block(2/3).q
Block(4)
Disk3
Block(0/1).q
Block(3)
Block(5)

RAID 6 is RAID 5 with a second parity block which allows for the failure of up to 2 drives in the array. The second parity block is denoted by a “.q” in the above diagram. The resources needed to calculate and write the additional parity block slow write performance even more than in RAID 5, but data is protected from up to 2 disks failing.

RAID 10: Mirrored Striped Sets
Disk0
Block(0)
Block(2)
Disk1
Block(0)
Block(2)
Disk2
Block(1)
Block(3)
Disk3
Block(1)
Block(3)

RAID 10 is a mirrored set of RAID 0 drives. RAID 10 requires a minimum of 4 drives, and can survive the failure of 1 drive. Writes have the performance improvements from RAID 0, along with redundancy from being in a mirror set. Once again, note that the redundancy will mean that you can use only 50% of total disk space.

Nested RAID arrays

Other nested RAID configurations exist – for example, RAID 100, which is striped arrays of RAID 10. Adding mirroring increases fault tolerance, but drops the overall percentage of usable space, while adding striping increases the array’s speed.


SQL RAID recommendations

  • Use RAID 10 for high IO databases when possible, but keep in mind that 50% of the total disk space is used for mirroring.
  • RAID 5 / RAID 6 have slower write performance, but fast read performance, and are suitable for read-only databases.
  • Logs should be written to separate RAID arrays from data files.
  • For any RAID configuration, verify that the array’s IO performance and capacity meet your requirements.

Disk Management: DAS, NAS and SAN

  • Direct Attached Storage (DAS)

    If SQL Server is running on a server with a RAID controller, DAS is a fast, easy to use option for SQL storage. However, DAS can be limited in terms of capacity and available RAID configuration.
  • Network Attached Storage (NAS)

    NAS devices usually contain one or more RAID arrays which are accessed remotely over TCP/IP. Rather than accessing data at a block level, data on a NAS is accessed at the file level – and SQL Server is designed to access blocks rather than files.

    Before SQL Server 2008 R2 a startup flag needed to be set for SQL Server to use a NAS, but since then SQL has been able to access NAS files out of the box via UNC paths. However, there is debate as to the reliability of SQL Server database files on NAS devices, and the general consensus is that NAS should be avoided for SQL Server files if possible.

  • Storage Area Network (SAN)

    SAN devices can be configured to contain one or more RAID arrays and are usually attached through a Fibre Channel, Fibre Channel over Ethernet, or iSCSI connection. SAN appears to the SQL Server as if it were DAS, allowing block level access, but it does not have the capacity and RAID configuration limitations of DAS.

Conclusion

While the ability to support high IO rates is one feature required for SQL storage, the ability to recover from a disk failure, and how SQL accesses storage are important features as well. The optimal storage configuration for your SQL server will involve selecting hardware, RAID configuration, and storage devices to meet your databases IO and recovery requirements.

In our next post, we’ll look at tools you can use to evaluate the performance of the storage devices you’ve selected, and the PerfMon disk counters you need to monitor to optimize SQL performance.

Want learn more?

Learn how an improved monitoring strategy can help optimize database performance and reduce the amount of dedicated resources needed for database monitoring and management.

 

Improve SQL Server Performance

Learn how an improved SQL Server monitoring strategy can help optimize database performance.

Reduce the amount of dedicated resources needed for database monitoring and management.