A clear SQL solution for reporting email values that appear more than once in the Person table.
Problem Restatement
We are given a table named Person.
| Column | Type |
|---|---|
| id | int |
| varchar |
The id column is the primary key.
Each row stores one email address. The email field is guaranteed to be non-null.
We need to report all duplicate emails. A duplicate email means an email value that appears more than once in the table. The result can be returned in any order. The official statement asks us to report all duplicate emails from Person.
Input and Output
| Item | Meaning |
|---|---|
| Input | Table Person(id, email) |
| Output | One column named Email |
| Rule | Return email values that appear more than once |
| Order | Any order is accepted |
Expected output column:
EmailExamples
Input:
Person
+----+---------+
| id | email |
+----+---------+
| 1 | [email protected] |
| 2 | [email protected] |
| 3 | [email protected] |
+----+---------+The email [email protected] appears two times.
The email [email protected] appears one time.
So only [email protected] is a duplicate.
Output:
+---------+
| Email |
+---------+
| [email protected] |
+---------+First Thought: Count Each Email
The problem asks for email values that appear more than once.
So we should group rows by email and count how many rows each email has.
For the example table:
[email protected] -> 2
[email protected] -> 1Then we keep only groups where the count is greater than 1.
Key Insight
WHERE filters individual rows before grouping.
HAVING filters groups after grouping.
Since we need to filter by COUNT(*), we must use HAVING.
The core SQL shape is:
GROUP BY email
HAVING COUNT(*) > 1This produces one row per duplicate email.
Algorithm
- Read rows from
Person. - Group rows by
email. - Count how many rows are in each email group.
- Keep only groups where the count is greater than
1. - Return the email value as
Email.
Correctness
After GROUP BY email, each group contains all rows with the same email address.
For any email e, COUNT(*) gives the number of rows in Person whose email is e.
The condition:
HAVING COUNT(*) > 1keeps exactly the email groups that appear at least twice.
Therefore, every returned email is duplicate, and every duplicate email is returned.
Complexity
| Metric | Value | Why |
|---|---|---|
| Time | O(n) average | The database scans rows and groups by email |
| Space | O(k) | The database stores one group per distinct email |
Here, n is the number of rows, and k is the number of distinct email values.
Implementation
SELECT
email AS Email
FROM Person
GROUP BY email
HAVING COUNT(*) > 1;Code Explanation
This reads the table:
FROM PersonThis groups all rows with the same email:
GROUP BY emailThis keeps only email groups with more than one row:
HAVING COUNT(*) > 1Finally, this returns the email under the expected output name:
SELECT
email AS EmailAlternative Implementation
We can also solve this with a self join.
SELECT DISTINCT
p1.email AS Email
FROM Person p1
JOIN Person p2
ON p1.email = p2.email
AND p1.id <> p2.id;This finds pairs of different rows with the same email.
DISTINCT is needed because one duplicated email can create many matching pairs.
For this problem, GROUP BY with HAVING is simpler and closer to the requirement.
Testing
Test case 1:
CREATE TABLE Person (
id INT PRIMARY KEY,
email VARCHAR(255)
);
INSERT INTO Person (id, email) VALUES
(1, '[email protected]'),
(2, '[email protected]'),
(3, '[email protected]');
SELECT
email AS Email
FROM Person
GROUP BY email
HAVING COUNT(*) > 1;Expected:
+---------+
| Email |
+---------+
| [email protected] |
+---------+Test case 2:
DELETE FROM Person;
INSERT INTO Person (id, email) VALUES
(1, '[email protected]'),
(2, '[email protected]');Expected:
+-------+
| Email |
+-------+No email appears more than once.
Test case 3:
DELETE FROM Person;
INSERT INTO Person (id, email) VALUES
(1, '[email protected]'),
(2, '[email protected]'),
(3, '[email protected]');Expected:
+---------+
| Email |
+---------+
| [email protected] |
+---------+Even if an email appears three times, it should appear once in the result.
Test case 4:
DELETE FROM Person;
INSERT INTO Person (id, email) VALUES
(1, '[email protected]'),
(2, '[email protected]'),
(3, '[email protected]'),
(4, '[email protected]'),
(5, '[email protected]');Expected:
+---------+
| Email |
+---------+
| [email protected] |
| [email protected] |
+---------+