LeetCode 929: Unique Email Addresses

A clear explanation of solving Unique Email Addresses using string normalization and a hash set.

Problem Restatement

We are given a list of email addresses.

Each email has two parts:

local-name@domain-name

The problem defines two special rules that apply only to the local name.

Rule 1: Ignore Dots

In the local name, "." characters are ignored.

For example:

"alice.z" == "alicez"

Rule 2: Ignore Everything After ‘+’

If a plus sign "+" appears in the local name, the plus sign and everything after it are ignored.

For example:

"m.y+name" -> "my"

These rules do not apply to the domain name.

We must count how many unique email addresses actually receive mail after normalization.

The official statement defines the same dot-removal and plus-ignore rules for the local name only. (leetcode.com)
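
As a quick check of the two rules, the sketch below normalizes a single address. The helper name `normalize` is an assumption made for illustration and is not part of the LeetCode interface; it also assumes the address contains exactly one "@".

```python
def normalize(email: str) -> str:
    """Apply the dot and plus rules to the local name only."""
    local, _, domain = email.partition("@")  # split at the first '@'
    local = local.split("+", 1)[0]           # drop '+' and everything after it
    local = local.replace(".", "")           # dots in the local name are ignored
    return f"{local}@{domain}"


print(normalize("m.y+name@email.com"))    # my@email.com
print(normalize("alice.z@leetcode.com"))  # alicez@leetcode.com
print(normalize("a.b@lee.tcode.com"))     # ab@lee.tcode.com (domain dots are kept)
```

The last call shows that dots in the domain survive, since the rules touch only the local name.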

Input and Output

| Item | Meaning |
| --- | --- |
| Input | A list of email strings |
| Output | Number of unique normalized emails |
| Dot rule | Dots in local name are ignored |
| Plus rule | Ignore everything after + in local name |
| Domain rule | Domain remains unchanged |

Function shape:

class Solution:
    def numUniqueEmails(self, emails: list[str]) -> int:
        ...

Examples

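Example 1:

emails = ["test.email+alex@leetcode.com", "test.e.mail+bob.cathy@leetcode.com", "testemail+david@lee.tcode.com"]

Normalize the first email.

Split into:

local  = "test.email+alex"
domain = "leetcode.com"

Remove the + and everything after it:

"test.email"

Remove dots:

"testemail"

Final normalized email:

"testemail@leetcode.com"

The second email becomes the same normalized address:

"test.e.mail+bob.cathy@leetcode.com" -> "testemail@leetcode.com"

The third email has a different domain:

"testemail+david@lee.tcode.com" -> "testemail@lee.tcode.com"

So there are 2 unique addresses.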

Example 2:

emails = ["a@leetcode.com", "b@leetcode.com", "c@leetcode.com"]

All addresses are already different.

Answer:

3

First Thought

The rules only change the local name.

So for every email:

  1. Split it into local name and domain name.
  2. Process the local name.
  3. Rebuild the normalized email.
  4. Store it in a set.

At the end, the size of the set is the answer.

Key Insight

A hash set automatically removes duplicates.

If two original emails normalize into the same string, the set stores only one copy.

So the whole problem becomes a normalization problem.
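
A minimal sketch of the deduplication step on its own, using addresses that have already been normalized by hand (the list is illustrative):

```python
# Three normalized addresses; the first two are the same string.
normalized = [
    "testemail@leetcode.com",
    "testemail@leetcode.com",
    "testemail@lee.tcode.com",
]

unique = set(normalized)  # equal strings collapse into one entry
print(len(unique))        # 2
```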

The normalization steps are:

  1. Split by '@'
  2. Remove everything after '+'
  3. Remove all dots '.'
  4. Combine with the original domain

Algorithm

Create an empty set:

seen = set()

For each email:

Split into:

local, domain = email.split("@")

Remove the plus section:

local = local.split("+")[0]

Remove dots:

local = local.replace(".", "")

Build the normalized email:

normalized = local + "@" + domain

Insert into the set:

seen.add(normalized)

Return:

len(seen)

Walkthrough

Use:

"test.email+alex@leetcode.com"

Split:

local  = "test.email+alex"
domain = "leetcode.com"

Remove everything after '+':

"test.email"

Remove dots:

"testemail"

Rebuild:

"testemail@leetcode.com"

That is the normalized address.

Correctness

For each email, the algorithm applies exactly the two rules defined in the problem.

First, the local name is truncated at the first '+', so every character after '+' is ignored.

Second, all dots are removed from the remaining local name.

The domain name is preserved unchanged.

Therefore, the produced normalized string is exactly the address that receives the email according to the problem rules.

If two original emails normalize to the same receiving address, the algorithm inserts the same normalized string into the set, and the set stores only one copy.

If two emails normalize to different receiving addresses, they become different strings and both remain in the set.

So after processing all emails, the set contains exactly the unique receiving addresses.

The algorithm returns the size of that set, which is the correct answer.
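
One informal sanity check of this argument, not a proof: a normalized address should be a fixed point of the normalization, because its local name no longer contains "+" or ".". The helper `normalize` and the sample list are assumptions made for this sketch.

```python
def normalize(email: str) -> str:
    # Same steps as the algorithm: split, cut at '+', strip dots, rebuild.
    local, domain = email.split("@")
    local = local.split("+")[0].replace(".", "")
    return local + "@" + domain


samples = [
    "test.email+alex@leetcode.com",
    "m.y+name@email.com",
    "a+b+c@lee.tcode.com",
]

for email in samples:
    once = normalize(email)
    local = once.split("@")[0]
    # Normalizing again changes nothing, and no special characters remain.
    assert normalize(once) == once
    assert "+" not in local and "." not in local

print("normalization is stable on the samples")
```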

Complexity

Suppose:

n = len(emails)

and the average email length is:

m

| Metric | Value | Why |
| --- | --- | --- |
| Time | O(n * m) | Each email is scanned a constant number of times |
| Space | O(n * m) | The set stores normalized emails |
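
To make the per-email cost concrete, here is a sketch of a character-by-character variant that reads each address once from left to right. The names `normalize_scan` and `count_unique` are hypothetical, and the code assumes every address contains an "@".

```python
def normalize_scan(email: str) -> str:
    """Normalize one address in a single left-to-right scan."""
    out = []
    skipping = False                 # True once a '+' has been seen
    for i, ch in enumerate(email):
        if ch == "@":
            out.append(email[i:])    # '@' plus the unchanged domain
            break
        if ch == "+":
            skipping = True
        elif not skipping and ch != ".":
            out.append(ch)
    return "".join(out)


def count_unique(emails: list[str]) -> int:
    return len({normalize_scan(e) for e in emails})


print(normalize_scan("test.email+alex@leetcode.com"))  # testemail@leetcode.com
```

The asymptotic bounds are the same as for the `split` and `replace` version; this form only makes the linear scan explicit.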

Implementation

class Solution:
    def numUniqueEmails(self, emails: list[str]) -> int:
        seen = set()

        for email in emails:
            local, domain = email.split("@")

            local = local.split("+")[0]
            local = local.replace(".", "")

            normalized = local + "@" + domain

            seen.add(normalized)

        return len(seen)
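
Unpacking `email.split("@")` into two names raises `ValueError` unless the address contains exactly one "@". The problem guarantees that, so the solution above is fine as written. As a hedged alternative, the sketch below uses `str.rpartition`, which always returns three parts, together with a set comprehension. The function name `count_unique_emails` is hypothetical, and addresses without any "@" fall outside the problem's constraints and are not handled meaningfully here.

```python
def count_unique_emails(emails: list[str]) -> int:
    """Standalone variant of numUniqueEmails (illustrative name)."""

    def normalize(email: str) -> str:
        # rpartition splits at the last '@' and always yields three parts.
        local, _, domain = email.rpartition("@")
        return local.split("+", 1)[0].replace(".", "") + "@" + domain

    return len({normalize(e) for e in emails})


print(count_unique_emails([
    "test.email+alex@leetcode.com",
    "test.e.mail+bob.cathy@leetcode.com",
    "testemail+david@lee.tcode.com",
]))  # 2
```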

Code Explanation

We use a set to store unique normalized addresses:

seen = set()

Split the email into local and domain parts:

local, domain = email.split("@")

Remove everything after '+':

local = local.split("+")[0]

Remove all dots:

local = local.replace(".", "")

Rebuild the normalized email:

normalized = local + "@" + domain

Insert into the set:

seen.add(normalized)

Finally return the number of unique entries:

return len(seen)

Testing

def run_tests():
    s = Solution()

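    # Official example 1: two addresses collapse, one has a different domain.
    assert s.numUniqueEmails([
        "test.email+alex@leetcode.com",
        "test.e.mail+bob.cathy@leetcode.com",
        "testemail+david@lee.tcode.com",
    ]) == 2

    # Official example 2: all addresses are already different.
    assert s.numUniqueEmails([
        "a@leetcode.com",
        "b@leetcode.com",
        "c@leetcode.com",
    ]) == 3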

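    # The addresses below are illustrative, chosen to match each test's purpose.

    # Dot removal: dots in the local name are ignored.
    assert s.numUniqueEmails([
        "alice.z@leetcode.com",
        "alicez@leetcode.com",
    ]) == 1

    # Plus handling: the suffix after + is ignored.
    assert s.numUniqueEmails([
        "my+name@email.com",
        "my@email.com",
    ]) == 1

    # Multiple plus sections: the local name is cut at the first +.
    assert s.numUniqueEmails([
        "a+b+c@leetcode.com",
        "a+x@leetcode.com",
    ]) == 1

    # Different domains: the domain is preserved, so these stay distinct.
    assert s.numUniqueEmails([
        "a@leetcode.com",
        "a@example.com",
    ]) == 2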

    print("all tests passed")

run_tests()

| Test | Why |
| --- | --- |
| Official example | Basic normalization |
| Different addresses | No duplicates |
| Dot removal | Dots ignored |
| Plus handling | Ignore suffix after + |
| Multiple plus sections | Split at first + |
| Different domains | Domain is preserved |