Inaccurate SQL Server statistics - a SQL query performance killer – the basics

SQLShack


April 22, 2014 by Milena Petrovic

SQL Server statistics are used for cost-based SQL query optimization. Cost-based optimization estimates the cost of each candidate execution plan for a query and selects the plan with the lowest estimated cost, i.e. the one expected to use the fewest hardware resources.

What are SQL Server statistics

SQL Server statistics are a collection of distinct values in a specific table column or columns, collected by SQL Server by sampling table data. SQL Server statistics are created automatically by Query Optimizer for indexes on tables or views when the index is created. Usually, additional statistics are not needed, nor do the existing ones require modification to achieve the best performance.

Data sampling depends on the number of table rows and the type of information stored. For example, let's look at a table that stores uninstallation information. The column where the uninstallation reason is saved is filled by selecting one of the options offered in a drop-down list: 'doesn't work as expected', 'bugs found during evaluation', or 'trial expired'. In other words, there is a limited number of possible values. Sampling this column is quick and easy, and doesn't require a large percentage of table data to be sampled. If there are 1,000 uninstallations logged, sampling 10 to 20 records (1 to 2 percent) will most likely provide the correct statistics, i.e. the correct number of distinct uninstallation reasons.

If you also allow users to enter an optional uninstallation reason, besides these three predefined values, users may type their own reasons, such as: too expensive, too complicated to use, etc. Let's say that 900 out of 1,000 uninstallation reasons were selected from the drop-down list, and for the remaining 100 uninstallations, the users entered custom reasons where no two reasons are the same. That means that there are 103 distinct uninstallation reasons in the table. Sampling 1 percent of the uninstallation records will not provide the correct statistics. A 1 percent sample contains only 10 rows, so it can show at most 10 distinct values, while in fact there are 103. Therefore, the SQL Server statistics will be inaccurate if the same sampling percentage is used.
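The sampling effect described above can be tried out with a small script. This is a minimal sketch, not taken from the original article: the table dbo.UninstallLog, its columns, and the statistics name st_Reason are hypothetical names chosen for illustration. It loads 900 rows with the three drop-down reasons and 100 unique custom reasons, then builds statistics from a 1 percent sample and from a full scan. SQL Server samples whole data pages and may read every row of a table this small regardless of the requested percentage, so the two results may well be identical here; a visible difference is more likely on much larger tables.

```sql
-- Hypothetical demo table; all names are illustrative assumptions.
CREATE TABLE dbo.UninstallLog (
    UninstallID INT IDENTITY(1,1) PRIMARY KEY,
    Reason      NVARCHAR(200) NOT NULL
);

-- Generate 1,000 rows: rows 1-900 cycle through the three predefined
-- reasons, rows 901-1000 each get a unique custom reason.
WITH n AS (
    SELECT TOP (1000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS i
    FROM sys.all_objects AS a
    CROSS JOIN sys.all_objects AS b   -- cross join only to guarantee enough rows
)
INSERT INTO dbo.UninstallLog (Reason)
SELECT CASE
           WHEN i > 900   THEN N'custom reason #' + CAST(i AS NVARCHAR(10))
           WHEN i % 3 = 0 THEN N'doesn''t work as expected'
           WHEN i % 3 = 1 THEN N'bugs found during evaluation'
           ELSE                N'trial expired'
       END
FROM n;

-- Statistics built from a requested 1 percent sample.
CREATE STATISTICS st_Reason ON dbo.UninstallLog (Reason) WITH SAMPLE 1 PERCENT;
DBCC SHOW_STATISTICS ('dbo.UninstallLog', st_Reason) WITH STAT_HEADER, DENSITY_VECTOR;

-- The same statistics rebuilt by reading every row.
UPDATE STATISTICS dbo.UninstallLog st_Reason WITH FULLSCAN;
DBCC SHOW_STATISTICS ('dbo.UninstallLog', st_Reason) WITH STAT_HEADER, DENSITY_VECTOR;

-- Clean up the demo table.
DROP TABLE dbo.UninstallLog;
```

In the STAT_HEADER result, compare Rows with Rows Sampled to see how much data was actually read. After the full scan, the All density value for Reason should be 1/103, roughly 0.0097, because the table holds 103 distinct reasons.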

How SQL Statistics affect SQL Server performance

One of the parameters used in query optimization and in selecting an optimal query execution plan is how unique the data is. This information is provided by SQL Server statistics. If incorrect statistics are used, SQL Server will use wrong estimates when selecting an execution plan and may select a plan that takes a long time to execute. On the other hand, when the estimated number of distinct values is correct, query execution plans chosen on these estimates will perform well. That's why it is important to have up-to-date information on data distribution in table columns.

Statistics are created on index columns, but also on non-index columns that are used in query predicates (WHERE, JOIN, or HAVING clauses).

SQL Server statistics become inaccurate when databases are in use and many transactions occur. A typical symptom of inaccurate statistics is a query that runs well and then, without any obvious reason, becomes very slow. Troubleshooting starts with analyzing the slow query. As a rule of thumb, if the difference between the estimated and actual number of rows in a query execution plan is higher than 10%, the statistics are probably obsolete. How the performance will be affected depends on the query and the execution plan. The same obsolete statistics can have a different effect on two different queries.

No dynamic management view directly flags statistics as inaccurate. We'll show the methods for working with SQL Server statistics that can help you determine whether the statistics are obsolete or not.
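One way to compare estimated and actual row counts without opening the graphical execution plan is SET STATISTICS PROFILE. The following is a minimal sketch, assuming the AdventureWorks sample database and its Person.Address table, which the examples in this article use; the query itself is only an illustration.

```sql
-- Assumes the AdventureWorks sample database is installed.
USE AdventureWorks;
GO

-- Return the executed plan as an extra result set after the query result.
SET STATISTICS PROFILE ON;

SELECT COUNT(*)
FROM Person.Address
WHERE AddressID > 1096 AND AddressID < 11510;

SET STATISTICS PROFILE OFF;
```

In the extra result set, the Rows column holds the actual number of rows produced by each plan operator and the EstimateRows column holds the optimizer's estimate. A large gap between the two on the operators that read the table is the kind of difference described above.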

Working with SQL Server statistics

SQL Server statistics are shown in SQL Server Management Studio Object Explorer, in the Statistics node for the specific table or view. Note that each index shown in the Indexes node has a corresponding statistics object.

Double-clicking a statistics object opens the Statistics Properties dialog. Besides showing SQL Server statistics for the specific index column, this dialog enables statistics modification by adding and removing the statistics columns (which is recommended only for advanced users) and updating the statistics.

The Details tab shows more detailed information. All density is calculated as 1 / number of distinct values. In this example, where the statistics are created for the identity AddressID column, all column values are distinct, so the number of table rows is equal to the number of distinct values, and the density is:
1/19,614 = 0.00005098399 = 5.098399102681758e-5

Average length is shown in bytes and represents the space needed to store a list of the column values.

RANGE_HI_KEY shows the upper bound column value for a histogram step. RANGE_ROWS shows the estimated number of rows whose value falls within a histogram step, excluding the upper bound. In this example, there are zero rows with a key value lower than 1; 1,094 rows with values between 1 and 1,096; 127 rows with values between 1,096 and 11,510, etc.

For testing purposes, you can use a query such as:

    SELECT COUNT(*)
    FROM Person.Address
    WHERE AddressID < 11510 AND AddressID > 1096;

The information shown in the Statistics Properties dialog can also be obtained using the DBCC SHOW_STATISTICS command:

    DBCC SHOW_STATISTICS ("Person.Address", PK_Address_AddressID);

As already explained, Query Optimizer automatically creates index statistics when a table or view index is created. To create statistics for a non-index column used in a query predicate, make sure that the database AUTO_CREATE_STATISTICS option is set to ON. The default option value is True.

In SQL Server Management Studio:

1. In Object Explorer, right-click the database
2. In the context menu, select Properties
3. Open the Options tab
4. In the Automatic section, change the Auto Create Statistics option value

The same can be done using T-SQL:

    ALTER DATABASE AdventureWorks
    SET AUTO_CREATE_STATISTICS ON;

sp_autostats is a stored procedure that shows the automatic statistics update parameter value for a table, index, statistics object, or indexed view:

    EXEC sp_autostats 'Person.Address';

The same information is available in the Statistics Properties dialog described above.

The same stored procedure can also be used to change the automatic statistics update parameter value. To disable the AUTO_UPDATE_STATISTICS option for all statistics on the Person.Address table, execute:

    EXEC sp_autostats 'Person.Address', 'OFF';

Unlike the database-level ALTER DATABASE statement shown above, this changes the setting only for the statistics on the specified table.

The STATS_DATE function shows the date of the most recent statistics update. In the following example, we will use it on the records obtained from the sys.stats catalog view, which contains a row for every statistics object in the database. Querying just the sys.stats view doesn't show when the statistics were last updated:

    SELECT *
    FROM sys.stats;

Using the STATS_DATE function with the object ID of a specific table or view provides the date and time of the most recent statistics update:

    SELECT name, STATS_DATE(object_id, stats_id) AS LastUpdated
    FROM sys.stats
    WHERE object_id = OBJECT_ID('Person.Address');

Although this information looks more useful at first glance, it still doesn't tell much. It doesn't show whether the statistics are obsolete or not. The same last update date can indicate valid statistics for some tables and obsolete statistics for others. It depends on the data changes that occurred after the last statistics update.

In this article, we explained what SQL Server statistics are, why and how they affect SQL Server performance, and how to see, modify, or update them. In the next part of this article, we will give recommendations for preventing SQL Server performance problems caused by inaccurate SQL Server statistics.
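As noted above, the last update date alone does not show how much the data has changed since then. The following is a minimal sketch, not part of the original article, that adds a change count by using the sys.dm_db_stats_properties function. It assumes SQL Server 2008 R2 SP2, SQL Server 2012 SP1, or later, and the AdventureWorks Person.Address table used in the examples above. The function reports how many modifications were made, not whether the statistics are actually inaccurate, so the result still has to be interpreted per table.

```sql
-- Assumes the AdventureWorks sample database and a SQL Server version
-- that provides sys.dm_db_stats_properties.
SELECT s.name                  AS StatisticsName,
       sp.last_updated         AS LastUpdated,
       sp.rows                 AS RowsAtLastUpdate,
       sp.rows_sampled         AS RowsSampled,
       sp.modification_counter AS ModificationsSinceUpdate
FROM sys.stats AS s
CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
WHERE s.object_id = OBJECT_ID('Person.Address')
ORDER BY sp.modification_counter DESC;
```

A high ModificationsSinceUpdate value relative to RowsAtLastUpdate suggests that the statistics are worth a closer look, for example by comparing estimated and actual row counts for the affected queries.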
