How to find the duplicate data in a table in SQL?

  • How-Tos FAQs
  • December 16, 2018
Supercharge Your Snowflake SQL
with Datameer's
Data Transformation

Let us set up a ‘customers’ table with their email addresses:

CREATE TABLE customers (
    customers_id INT,
    customer_name VARCHAR(100),
    email VARCHAR(100)
);

INSERT INTO  customers
SELECT 1, 'Jack', 'jack@email.com' UNION ALL
SELECT 2, 'July', 'jack@email.com' UNION ALL
SELECT 3, 'John', 'john@email.com' UNION ALL
SELECT 4, 'Rose', 'rose@email.com';

We can find duplicate data using different approaches.

Using GROUP BY

We can group the table by email column and count the rows with the same email using the HAVING clause.

SELECT   
    email ,
    COUNT(1) email_count
FROM customers
GROUP BY email
HAVING COUNT(1) > 1;



#Output
# email       email_count
------------------------------------------------------
jack@email.com   2

Using Window function

We can use the ROW_NUMBER function and partition the data by email:

SELECT customer_name
, email
, IF( ROW_NUMBER () OVER (PARTITION BY email ORDER BY email) > 1,'Yes','No') is_duplicate
FROM customers

#Output



# customer_name   email     is_duplicate
--------------------------------------------
Jack          jack@email.com   No
July          jack@email.com   Yes
John          john@email.com  No
Rose          rose@email.com  No

Up Next:

Read How do I calculate ratios using the data in two columns to create a ratio in Tableau?