Duplicates are the cockroaches of database bugs. They sneak in silently and multiply. A user clicks Submit twice. A batch job runs twice. A migration inserts the same records again. Before you know it, the same customer has two accounts, the same order is billed twice, and the same employee gets two salary credits.
The bread and butter of duplicate detection. GROUP BY the column that should be unique, then HAVING COUNT > 1 to find violations.
-- Duplicate emails in users table
SELECT email, COUNT(*) AS occurrences
FROM users
GROUP BY email
HAVING COUNT(*) > 1;
-- Duplicate orders (same user, same amount, same date)
SELECT user_id, total_amount, DATE(created_at) AS order_date,
COUNT(*) AS duplicate_count
FROM orders
GROUP BY user_id, total_amount, DATE(created_at)
HAVING COUNT(*) > 1;
-- See the actual duplicate rows (not just counts)
SELECT u.*
FROM users u
INNER JOIN (
SELECT email
FROM users
GROUP BY email
HAVING COUNT(*) > 1
) dup ON u.email = dup.email
ORDER BY u.email, u.created_at;Not all duplicates are exact. "Ravi Sharma" and "ravi sharma" are the same person. "Ravi Sharma" and "Ravi Sharma" (double space) are the same person. "9876543210" and "+919876543210" are the same phone number. Fuzzy duplicate detection catches these.
-- Case-insensitive duplicate emails
SELECT LOWER(email) AS normalized_email, COUNT(*) AS occurrences
FROM users
GROUP BY LOWER(email)
HAVING COUNT(*) > 1;
-- Duplicate names after trimming whitespace
SELECT TRIM(LOWER(full_name)) AS normalized_name, COUNT(*)
FROM users
GROUP BY TRIM(LOWER(full_name))
HAVING COUNT(*) > 1;
-- Duplicate phone numbers (strip country code and spaces)
SELECT
REGEXP_REPLACE(phone, '[^0-9]', '', 'g') AS digits_only,
COUNT(*)
FROM users
GROUP BY REGEXP_REPLACE(phone, '[^0-9]', '', 'g')
HAVING COUNT(*) > 1;In financial systems, duplicate transactions are P1 bugs. If a salary credit runs twice, employees get paid double. If a debit runs twice, customers lose money. Always check for duplicates after any batch job or scheduled task.
Q: How do you detect and prevent duplicate records in a database?
A: To detect duplicates, I use GROUP BY with HAVING COUNT(*) > 1 on columns that should be unique. For fuzzy duplicates (case differences, extra spaces), I normalize the data using LOWER() and TRIM() before grouping. To prevent duplicates, I verify that UNIQUE constraints or unique indexes exist on the appropriate columns. For multi-column uniqueness (like one review per user per product), I check for composite unique constraints. I also test double-submit scenarios by clicking Submit rapidly to see if the application handles idempotency.
Key Point: Detect exact duplicates with GROUP BY HAVING. Detect fuzzy duplicates by normalizing with LOWER(), TRIM(), and regex before grouping.