A clear explanation of solving Unique Email Addresses using string normalization and a hash set.
Problem Restatement
We are given a list of email addresses.
Each email has two parts:
local-name@domain-name
The problem defines two special rules that apply only to the local name.
Rule 1: Ignore Dots
In the local name, "." characters are ignored.
For example:
"alice.z" == "alicez"
Rule 2: Ignore Everything After ‘+’
If a plus sign "+" appears in the local name, everything after the first one is ignored.
For example:
"m.y+name" -> "my"
These rules do not apply to the domain name.
We must count how many unique email addresses actually receive mail after normalization.
The official statement defines the same dot-removal and plus-ignore rules for the local name only. (leetcode.com)
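As a minimal sketch of the two rules applied to a local name on its own, the helper below (the name `normalize_local` is an assumption made for illustration, not part of the problem statement) cuts at the first plus sign and then drops the dots. It uses the two examples given above.

```python
def normalize_local(local: str) -> str:
    """Apply the two local-name rules: cut at the first '+', then drop dots."""
    # partition returns the text before the first '+',
    # or the whole string when there is no '+'.
    before_plus, _, _ = local.partition("+")
    return before_plus.replace(".", "")


print(normalize_local("alice.z"))   # alicez
print(normalize_local("m.y+name"))  # my
```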
Input and Output
| Item | Meaning |
|---|---|
| Input | A list of email strings |
| Output | Number of unique normalized emails |
| Dot rule | Dots in local name are ignored |
| Plus rule | Ignore everything after + in local name |
| Domain rule | Domain remains unchanged |
Function shape:
class Solution:
    def numUniqueEmails(self, emails: list[str]) -> int:
        ...
Examples
Example 1:
emails = [
"test.email+alex@leetcode.com",
"[email protected]",
"[email protected]",
]
Normalize the first email.
Split into:
local = "test.email+alex"
domain = "leetcode.com"
Remove everything after '+':
"test.email"
Remove dots:
"testemail"
Final normalized email:
"testemail@leetcode.com"
The second email becomes the same normalized address.
The third email has a different domain, so it normalizes to a different address.
So there are 2 unique addresses.
Example 2:
emails = [
"[email protected]",
"[email protected]",
"[email protected]",
]
All addresses are already different.
Answer:
3
First Thought
The rules only change the local name.
So for every email:
- Split it into local name and domain name.
- Process the local name.
- Rebuild the normalized email.
- Store it in a set.
At the end, the size of the set is the answer.
Key Insight
A hash set automatically removes duplicates.
If two original emails normalize into the same string, the set stores only one copy.
So the whole problem becomes a normalization problem.
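A small illustration of the deduplication behavior, assuming the addresses have already been normalized (the second address, `other@leetcode.com`, is made up for this sketch):

```python
seen = set()
for address in [
    "testemail@leetcode.com",
    "testemail@leetcode.com",  # duplicate of the first entry
    "other@leetcode.com",      # hypothetical second address
]:
    seen.add(address)  # adding an element that is already present changes nothing

print(len(seen))  # 2
```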
The normalization steps are:
- Split by '@'
- Remove everything after '+'
- Remove all dots '.'
- Combine with the original domain
Algorithm
Create an empty set:
seen = set()
For each email:
Split into:
local, domain = email.split("@")
Remove the plus section:
local = local.split("+")[0]
Remove dots:
local = local.replace(".", "")
Build the normalized email:
normalized = local + "@" + domain
Insert into the set:
seen.add(normalized)
Return:
len(seen)
Walkthrough
Use:
email = "test.email+alex@leetcode.com"
Split:
local = "test.email+alex"
domain = "leetcode.com"
Remove everything after '+':
"test.email"
Remove dots:
"testemail"
Rebuild:
"testemail@leetcode.com"
That is the normalized address.
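The walkthrough can be reproduced with a short tracing helper. This is only a sketch: `trace_normalization` is a hypothetical name, and it assumes the email contains exactly one '@'.

```python
def trace_normalization(email: str) -> list[str]:
    """Return the intermediate values produced while normalizing one email."""
    local, domain = email.split("@")              # assumes exactly one '@'
    without_plus = local.split("+")[0]            # keep the part before the first '+'
    without_dots = without_plus.replace(".", "")  # drop every dot
    return [local, domain, without_plus, without_dots, without_dots + "@" + domain]


for step in trace_normalization("test.email+alex@leetcode.com"):
    print(step)
```

The printed lines are the local name, the domain, the local name without the plus section, the local name without dots, and the rebuilt address, in that order.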
Correctness
For each email, the algorithm applies exactly the two rules defined in the problem.
First, the local name is truncated at the first '+', so every character after '+' is ignored.
Second, all dots are removed from the remaining local name.
The domain name is preserved unchanged.
Therefore, the produced normalized string is exactly the address that receives the email according to the problem rules.
If two original emails normalize to the same receiving address, the algorithm inserts the same normalized string into the set, and the set stores only one copy.
If two emails normalize to different receiving addresses, they become different strings and both remain in the set.
So after processing all emails, the set contains exactly the unique receiving addresses.
The algorithm returns the size of that set, which is the correct answer.
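One way to sanity-check this argument is to confirm that normalizing an already normalized address changes nothing and that the domain is never touched. The sketch below assumes a helper named `normalize` and a few sample addresses; only the first sample comes from the walkthrough, the others are invented for illustration.

```python
def normalize(email: str) -> str:
    """Normalize one email; assumes exactly one '@' in the input."""
    local, domain = email.split("@")
    local = local.split("+")[0].replace(".", "")
    return local + "@" + domain


samples = [
    "test.email+alex@leetcode.com",
    "a.b.c+x+y@example.com",  # hypothetical sample
    "plain@example.com",      # hypothetical sample
]

for email in samples:
    once = normalize(email)
    # A normalized address has no '+' or '.' left in its local name,
    # so a second pass must return it unchanged.
    assert normalize(once) == once
    # The domain part is preserved exactly.
    assert once.split("@")[1] == email.split("@")[1]
```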
Complexity
Suppose n = len(emails) and the average email length is m.
| Metric | Value | Why |
|---|---|---|
| Time | O(n * m) | Each email is scanned a constant number of times |
| Space | O(n * m) | The set stores normalized emails |
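To make the linear scan explicit, here is an alternative sketch that walks the local name character by character instead of calling `split` and `replace`. The function names are assumptions for illustration, and the input is assumed to contain an '@'. The asymptotic cost is the same O(n * m).

```python
def normalize_by_scan(email: str) -> str:
    """Normalize one email by scanning its local name once."""
    at = email.index("@")      # position of the '@' separator
    kept = []
    for ch in email[:at]:      # visit only the local name
        if ch == "+":
            break              # ignore everything from the first '+'
        if ch != ".":
            kept.append(ch)    # keep every character except dots
    return "".join(kept) + email[at:]  # domain (with '@') is copied unchanged


def count_unique_by_scan(emails: list[str]) -> int:
    return len({normalize_by_scan(email) for email in emails})
```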
Implementation
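class Solution:
    def numUniqueEmails(self, emails: list[str]) -> int:
        seen = set()  # normalized addresses seen so far
        for email in emails:
            # Assumes each email contains exactly one '@'.
            local, domain = email.split("@")
            # Keep only the part before the first '+'.
            local = local.split("+")[0]
            # Dots in the local name are ignored.
            local = local.replace(".", "")
            normalized = local + "@" + domain
            seen.add(normalized)
        return len(seen)
Code Explanation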
We use a set to store unique normalized addresses:
seen = set()
Split the email into local and domain parts:
local, domain = email.split("@")
Remove everything after '+':
local = local.split("+")[0]
Remove all dots:
local = local.replace(".", "")
Rebuild the normalized email:
normalized = local + "@" + domain
Insert into the set:
seen.add(normalized)
Finally return the number of unique entries:
return len(seen)
Testing
def run_tests():
    s = Solution()
    assert s.numUniqueEmails([
        "[email protected]",
        "[email protected]",
        "[email protected]",
    ]) == 2
    assert s.numUniqueEmails([
        "[email protected]",
        "[email protected]",
        "[email protected]",
    ]) == 3
    assert s.numUniqueEmails([
        "[email protected]",
        "[email protected]",
    ]) == 1
    assert s.numUniqueEmails([
        "[email protected]",
        "[email protected]",
    ]) == 1
    assert s.numUniqueEmails([
        "[email protected]",
        "[email protected]",
    ]) == 1
    assert s.numUniqueEmails([
        "[email protected]",
        "[email protected]",
    ]) == 2
    print("all tests passed")
run_tests()
| Test | Why |
|---|---|
| Official example | Basic normalization |
| Different addresses | No duplicates |
| Dot removal | Dots ignored |
| Plus handling | Ignore suffix after + |
| Multiple plus sections | Split at first + |
| Different domains | Domain is preserved |
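The categories in the table can also be exercised with a table-driven check. The addresses below are invented for illustration and are not the ones from the original tests; `count_unique` is a hypothetical standalone version of the same algorithm.

```python
def count_unique(emails: list[str]) -> int:
    """Count distinct addresses after normalization (one '@' per email assumed)."""
    seen = set()
    for email in emails:
        local, domain = email.split("@")
        seen.add(local.split("+")[0].replace(".", "") + "@" + domain)
    return len(seen)


# (input emails, expected count); all addresses are made-up samples
cases = [
    (["a.b@example.com", "ab@example.com"], 1),       # dot removal
    (["ab+x@example.com", "ab+y@example.com"], 1),    # plus handling
    (["ab+x+y@example.com", "ab+z@example.com"], 1),  # multiple plus sections
    (["ab@example.com", "ab@example.org"], 2),        # different domains
    (["ab@exam.ple.com", "ab@example.com"], 2),       # dots in the domain are kept
]

for emails, expected in cases:
    assert count_unique(emails) == expected, (emails, expected)
print("all sample cases passed")
```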