Overview of the SQL Count Distinct Function

Overview of the SQL Count Distinct Function

Overview of the SQL Count Distinct Function

SQLShack

SQL Server training Español

Overview of the SQL Count Distinct Function

June 26, 2019 by Rajendra Gupta This article explores SQL Count Distinct operator for eliminating the duplicate rows in the result set. A developer needs to get data from a SQL table with multiple conditions. Sometimes, we want to get all rows in a table but eliminate the available NULL values. Suppose we want to get distinct customer records that have placed an order last year. Let’s go ahead and have a quick overview of SQL Count Function.

SQL Count Function

We use SQL Count aggregate function to get the number of rows in the output. Suppose we have a product table that holds records for all products sold by a company. We want to know the count of products sold during the last quarter. We can use SQL Count Function to return the number of rows in the specified condition. The syntax of the SQL COUNT function:
COUNT ([ALL DISTINCT] expression); By default, SQL Server Count Function uses All keyword. It means that SQL Server counts all records in a table. It also includes the rows having duplicate values as well. Let’s create a sample table and insert few records in it. 1234567 CREATE TABLE ##TestTable (Id int identity(1,1), Col1 char(1) NULL); INSERT INTO ##TestTable VALUES ('A'); INSERT INTO ##TestTable VALUES ('A'); INSERT INTO ##TestTable VALUES ('B'); INSERT INTO ##TestTable VALUES ('B');INSERT INTO ##TestTable VALUES (NULL); INSERT INTO ##TestTable VALUES (NULL); In this table, we have duplicate values and NULL values as well. In the following screenshot, we can note that: Count (*) includes duplicate values as well as NULL values Count (Col1) includes duplicate values but does not include NULL values Suppose we want to know the distinct values available in the table. We can use SQL COUNT DISTINCT to do so. 12 Select count(DISTINCT COL1)from ##TestTable In the following output, we get only 2 rows. SQL COUNT Distinct does not eliminate duplicate and NULL values from the result set. Let’s look at another example. In this example, we have a location table that consists of two columns City and State. 123456789 CREATE TABLE Location(City VARCHAR(30), State VARCHAR(20)); Insert into location values('Gurgaon','Haryana')Insert into location values('Gurgaon','Rajasthan')Insert into location values('Jaipur','Rajasthan')Insert into location values('Jaipur','Haryana') Now, execute the following query to find out a count of the distinct city from the table. 12 SELECT COUNT(DISTINCT(City))FROM Location; It returns the count of unique city count 2 (Gurgaon and Jaipur) from our result set. If we look at the data, we have similar city name present in a different state as well. The combination of city and state is unique, and we do not want that unique combination to be eliminated from the output. We can use SQL DISTINCT function on a combination of columns as well. It checks for the combination of values and removes if the combination is not unique. 12 SELECT DISTINCT City, StateFROM Location; It does not remove the duplicate city names from the output because of a unique combination of values. Let’s insert one more rows in the location table. 1 Insert into location values('Gurgaon','Haryana') We have 5 records in the location table. In the data, you can see we have one combination of city and state that is not unique. Rerun the SELECT DISTINCT function, and it should return only 4 rows this time. We cannot use SQL COUNT DISTINCT function directly with the multiple columns. You get the following error message. We can use a temporary table to get records from the SQL DISTINCT function and then use count(*) to check the row counts. 1234 SELECT DISTINCT City, Stateinto #TempFROM Location;Select count(*) from #Temp We get the row count 4 in the output. If we use a combination of columns to get distinct values and any of the columns contain NULL values, it also becomes a unique combination for the SQL Server. To verify this, let’s insert more records in the location table. We did not specify any state in this query. 12 Insert into location values('Gurgaon','')Insert into location(city)values('Gurgaon') Let’s look at the location table data. Re-run the query to get distinct rows from the location table. 12 SELECT distinct City, StateFROM Location; In the output, we can see it does not eliminate the combination of City and State with the blank or NULL values. Similarly, you can see row count 6 with SQL COUNT DISTINCT function.

Difference between SELECT COUNT COUNT * and SQL COUNT distinct

COUNT Count(*) Count(Distinct) It returns the total number of rows after satisfying conditions specified in the where clause. It returns the total number of rows after satisfying conditions specified in the where clause. It returns the distinct number of rows after satisfying conditions specified in the where clause. It gives the counts of rows. It does not eliminate duplicate values. It considers all rows regardless of any duplicate, NULL values. It gives a distinct number of rows after eliminating NULL and duplicate values. It eliminates the NULL values in the output. It does not eliminate the NULL values in the output. It eliminates the NULL values in the output.

Execution Plan of SQL Count distinct function

Let’s look at the Actual Execution Plan of the SQL COUNT DISTINCT function. You need to enable the Actual Execution Plan from the SSMS Menu bar as shown below. Execute the query to get an Actual execution plan. In this execution plan, you can see top resource consuming operators: Sort (Distinct Sort) – Cost 78% Table Scan – Cost 22% You can hover the mouse over the sort operator, and it opens a tool-tip with the operator details. In the properties windows, also we get more details around the sort operator including memory allocation, statistics, and the number of rows. In a table with million records, SQL Count Distinct might cause performance issues because a distinct count operator is a costly operator in the actual execution plan. SQL Server 2019 improves the performance of SQL COUNT DISTINCT operator using a new Approx_count_distinct function. This new function of SQL Server 2019 provides an approximate distinct count of the rows. There might be a slight difference in the SQL Count distinct and Approx_Count_distinct function output. You can replace SQL COUNT DISTINCT with the keyword Approx_Count_distinct to use this function from SQL Server 2019. 12 SELECT APPROX_COUNT_DISTINCT(City)FROM Location; You can explore more on this function in The new SQL Server 2019 function Approx_Count_Distinct.

Conclusion br

In this article, we explored the SQL COUNT Function with various examples. We also covered new SQL function Approx_Count_distinct available from SQL Server 2019. I would suggest reviewing them as per your environment. If you have any comments or questions, feel free to leave them in the comments below. Author Recent Posts Rajendra GuptaHi! I am Rajendra Gupta, Database Specialist and Architect, helping organizations implement Microsoft SQL Server, Azure, Couchbase, AWS solutions fast and efficiently, fix related issues, and Performance Tuning with over 14 years of experience.

I am the author of the book "DP-300 Administering Relational Database on Microsoft Azure". I published more than 650 technical articles on MSSQLTips, SQLShack, Quest, CodingSight, and SeveralNines.

I am the creator of one of the biggest free online collections of articles on a single topic, with his 50-part series on SQL Server Always On Availability Groups.

Based on my contribution to the SQL Server community, I have been recognized as the prestigious Best Author of the Year continuously in 2019, 2020, and 2021 (2nd Rank) at SQLShack and the MSSQLTIPS champions award in 2020.

Personal Blog: https://www.dbblogger.com
I am always interested in new challenges so if you need consulting help, reach me at [email protected]

View all posts by Rajendra Gupta Latest posts by Rajendra Gupta (see all) Copy data from AWS RDS SQL Server to Azure SQL Database - October 21, 2022 Rename on-premises SQL Server database and Azure SQL database - October 18, 2022 SQL Commands to check current Date and Time (Timestamp) in SQL Server - October 7, 2022

Related posts

What is causing database slowdowns? The Table Variable in SQL Server How to use Window functions in SQL Server Different ways to SQL delete duplicate rows from a SQL Table Term Extraction Transformation in SSIS 210,925 Views

Follow us

Popular

SQL Convert Date functions and formats SQL Variables: Basics and usage SQL PARTITION BY Clause overview Different ways to SQL delete duplicate rows from a SQL Table How to UPDATE from a SELECT statement in SQL Server SQL Server functions for converting a String to a Date SELECT INTO TEMP TABLE statement in SQL Server SQL WHILE loop with simple examples How to backup and restore MySQL databases using the mysqldump command CASE statement in SQL Overview of SQL RANK functions Understanding the SQL MERGE statement INSERT INTO SELECT statement overview and examples SQL multiple joins for beginners with examples Understanding the SQL Decimal data type DELETE CASCADE and UPDATE CASCADE in SQL Server foreign key SQL Not Equal Operator introduction and examples SQL CROSS JOIN with examples The Table Variable in SQL Server SQL Server table hints – WITH (NOLOCK) best practices

Trending

SQL Server Transaction Log Backup, Truncate and Shrink Operations Six different methods to copy tables between databases in SQL Server How to implement error handling in SQL Server Working with the SQL Server command line (sqlcmd) Methods to avoid the SQL divide by zero error Query optimization techniques in SQL Server: tips and tricks How to create and configure a linked server in SQL Server Management Studio SQL replace: How to replace ASCII special characters in SQL Server How to identify slow running queries in SQL Server SQL varchar data type deep dive How to implement array-like functionality in SQL Server All about locking in SQL Server SQL Server stored procedures for beginners Database table partitioning in SQL Server How to drop temp tables in SQL Server How to determine free space and file size for SQL Server databases Using PowerShell to split a string into an array KILL SPID command in SQL Server How to install SQL Server Express edition SQL Union overview, usage and examples

Solutions

Read a SQL Server transaction logSQL Server database auditing techniquesHow to recover SQL Server data from accidental UPDATE and DELETE operationsHow to quickly search for SQL database data and objectsSynchronize SQL Server databases in different remote sourcesRecover SQL data from a dropped table without backupsHow to restore specific table(s) from a SQL Server database backupRecover deleted SQL data from transaction logsHow to recover SQL Server data from accidental updates without backupsAutomatically compare and synchronize SQL Server dataOpen LDF file and view LDF file contentQuickly convert SQL code to language-specific client codeHow to recover a single table from a SQL Server database backupRecover data lost due to a TRUNCATE operation without backupsHow to recover SQL Server data from accidental DELETE, TRUNCATE and DROP operationsReverting your SQL Server database back to a specific point in timeHow to create SSIS package documentationMigrate a SQL Server database to a newer version of SQL ServerHow to restore a SQL Server database backup to an older version of SQL Server

Categories and tips

►Auditing and compliance (50) Auditing (40) Data classification (1) Data masking (9) Azure (295) Azure Data Studio (46) Backup and restore (108) ►Business Intelligence (482) Analysis Services (SSAS) (47) Biml (10) Data Mining (14) Data Quality Services (4) Data Tools (SSDT) (13) Data Warehouse (16) Excel (20) General (39) Integration Services (SSIS) (125) Master Data Services (6) OLAP cube (15) PowerBI (95) Reporting Services (SSRS) (67) Data science (21) ►Database design (233) Clustering (16) Common Table Expressions (CTE) (11) Concurrency (1) Constraints (8) Data types (11) FILESTREAM (22) General database design (104) Partitioning (13) Relationships and dependencies (12) Temporal tables (12) Views (16) ▼Database development (418) Comparison (4) Continuous delivery (CD) (5) Continuous integration (CI) (11) Development (146) Functions (106) Hyper-V (1) Search (10) Source Control (15) SQL unit testing (23) Stored procedures (34) String Concatenation (2) Synonyms (1) Team Explorer (2) Testing (35) Visual Studio (14) DBAtools (35) DevOps (23) DevSecOps (2) Documentation (22) ETL (76) ►Features (213) Adaptive query processing (11) Bulk insert (16) Database mail (10) DBCC (7) Experimentation Assistant (DEA) (3) High Availability (36) Query store (10) Replication (40) Transaction log (59) Transparent Data Encryption (TDE) (21) Importing, exporting (51) Installation, setup and configuration (121) Jobs (42) ▼Languages and coding (686) Cursors (9) DDL (9) DML (6) JSON (17) PowerShell (77) Python (37) R (16) SQL commands (196) SQLCMD (7) String functions (21) T-SQL (275) XML (15) Lists (12) Machine learning (37) Maintenance (99) Migration (50) Miscellaneous (1) ►Performance tuning (869) Alerting (8) Always On Availability Groups (82) Buffer Pool Extension (BPE) (9) Columnstore index (9) Deadlocks (16) Execution plans (125) In-Memory OLTP (22) Indexes (79) Latches (5) Locking (10) Monitoring (100) Performance (196) Performance counters (28) Performance Testing (9) Query analysis (121) Reports (20) SSAS monitoring (3) SSIS monitoring (10) SSRS monitoring (4) Wait types (11) ►Professional development (68) Professional development (27) Project management (9) SQL interview questions (32) Recovery (33) Security (84) Server management (24) SQL Azure (271) SQL Server Management Studio (SSMS) (90) SQL Server on Linux (21) ►SQL Server versions (177) SQL Server 2012 (6) SQL Server 2016 (63) SQL Server 2017 (49) SQL Server 2019 (57) SQL Server 2022 (2) ►Technologies (334) AWS (45) AWS RDS (56) Azure Cosmos DB (28) Containers (12) Docker (9) Graph database (13) Kerberos (2) Kubernetes (1) Linux (44) LocalDB (2) MySQL (49) Oracle (10) PolyBase (10) PostgreSQL (36) SharePoint (4) Ubuntu (13) Uncategorized (4) Utilities (21) Helpers and best practices BI performance counters SQL code smells rules SQL Server wait types © 2022 Quest Software Inc. ALL RIGHTS RESERVED. GDPR Terms of Use Privacy
Share:
0 comments

Comments (0)

Leave a Comment

Minimum 10 characters required

* All fields are required. Comments are moderated before appearing.

No comments yet. Be the first to comment!