Administering even a single instance of MS SQL can be challenging, let alone a series of linked servers. While MS SQL is a powerful tool for data management, that potential power also comes with a great deal of complexity. Every IT admin worth their salt will certainly need to closely monitor the MS SQL instances they are watching over, but the sheer volume of potential issues and corresponding error messages that are generated can be exceptionally overwhelming.
In this article we'll examine the most critical error messages presented by an MS SQL instance, from how to identify the possibilities that will occur to how those messages will propagate into the local Windows Event Log.
Evaluating Your Environment's Potential Errors
With MS SQL being a dominant database backend for over 25 years now, the number of versions of SQL Server that exist on the market are numerous, so it may be important to learn how to properly identify the critical error messages that are possible for the particular installed version you are working with. Thankfully, MS SQL itself makes this task quite easy with some simple queries.
Begin by connecting to a relevant database you wish to check then run the following query:
WHERE msglangid = 1033
ORDER BY severity, description;
This will display the entire list of system messages available filtering the list to only display messages that are localized in US English (msglangid = 1033). If you are using a different localization you can find the appropriate msglangid code by executing the following query and replacing the above msglangid to match your own language.
SELECT name, alias, msglangid
Now with the full list of potential MS SQL messages at our fingertips, we need to narrow this down from all possible messages to only errors. From there we can decide which errors are the most critical and require monitoring or alerts.
MS SQL Message Severity Levels
Due to the sheer volume of possible error types that MS SQL can report, MS SQL assigns all errors with a numeric severity value indicating how critical, or severe, the error message is. The MS SQL documentation suggests that all "error messages with severity levels 17 and higher" should be dealt with by an administrator. You may find the full details on what each severity level 17+ means from the above URL, but below are the basic descriptions to give a brief overview:
- 17: Insufficient Resources
- 18: Nonfatal Internal Error Detected
- 19: SQL Server Error in Resource
- 20: SQL Server Fatal Error in Current Process
- 21: SQL Server Fatal Error in Database (dbid) Processes
- 22: SQL Server Fatal Error Table Integrity Suspect
- 23: SQL Server Fatal Error: Database Integrity Suspect
- 24: Hardware Error
It is worth noting that while some older versions of MS SQL will produce errors with a severity level of 25 most modern instances will max out at a severity level of 24.
Regardless, it's important to note that every message with a severity of 17 - 19 should be monitored at the very least, while severity levels of 20+ indicate a fatal system error in which the current process that was executing has halted. As seen above the particular reason for a fatal error can vary but in all cases the error is written to the error log and must be alerted to an administrator immediately for resolution.
Viewing Fatal Errors
With a slight modification to our previous query we can now take a look at the list of fatal errors that should be monitored and generate an administrative alert:
WHERE msglangid = 1033
AND severity >= 20
ORDER BY severity DESC, description;
This lists fatal errors in roughly the order of severity beginning with hardware failures and other messages that must be addressed immediately.
Best of all, the error column that indicates the specific error number can be matched up to messages found in the appropriate Windows Application Event Log under the Event ID column, allowing you to easily compare events from the Event Log to errors that MS SQL may be throwing.
While the number of potential errors that MS SQL can produce can be high, adding some basic filters to monitor and alert for errors above a particular severity level, you can be sure to stay on top of everything that is happening on your server.
Want to learn more?
Download a FREE trial of Longitude - Stand up Longitude in just minutes and immediately start seeing how your SQL environment is performing, receive proactive alerts, reveal root causes, automate corrective actions, and more...