How to find the duplicate data in a table in SQL?
- How-Tos FAQs
- December 16, 2018
Let us set up a ‘customers’ table with their email addresses:
CREATE TABLE customers (
customers_id INT,
customer_name VARCHAR(100),
email VARCHAR(100)
);
INSERT INTO customers
SELECT 1, 'Jack', 'jack@email.com' UNION ALL
SELECT 2, 'July', 'jack@email.com' UNION ALL
SELECT 3, 'John', 'john@email.com' UNION ALL
SELECT 4, 'Rose', 'rose@email.com';
We can find duplicate data using different approaches.
Using GROUP BY
We can group the table by email column and count the rows with the same email using the HAVING clause.
SELECT
email ,
COUNT(1) email_count
FROM customers
GROUP BY email
HAVING COUNT(1) > 1;
#Output
# email email_count
------------------------------------------------------
jack@email.com 2
Using Window function
We can use the ROW_NUMBER function and partition the data by email:
SELECT customer_name
, email
, IF( ROW_NUMBER () OVER (PARTITION BY email ORDER BY email) > 1,'Yes','No') is_duplicate
FROM customers
#Output
# customer_name email is_duplicate
--------------------------------------------
Jack jack@email.com No
July jack@email.com Yes
John john@email.com No
Rose rose@email.com No
Up Next:
Read How do I calculate ratios using the data in two columns to create a ratio in Tableau?