A clear explanation of merging accounts that share emails using union find and sorted email groups.
Problem Restatement
We are given a list of accounts.
Each account has this format:
[name, email1, email2, ...]Two accounts belong to the same person if they share at least one email address.
After merging accounts, each returned account should contain:
- The person’s name.
- All unique emails belonging to that person, sorted lexicographically.
The returned accounts can be in any order.
Accounts with the same name do not automatically belong to the same person. They only merge if they share an email. The official statement notes that different people may have the same name, while all accounts belonging to one person have the same name.
Input and Output
| Item | Meaning |
|---|---|
| Input | accounts, a list of account records |
| Account format | First item is name, remaining items are emails |
| Output | Merged accounts |
| Merge rule | Accounts sharing at least one email belong together |
| Email order | Emails in each merged account must be sorted |
| Account order | Can be any order |
The function shape is:
class Solution:
def accountsMerge(self, accounts: list[list[str]]) -> list[list[str]]:
...Examples
Example:
accounts = [
["John", "[email protected]", "[email protected]"],
["John", "[email protected]", "[email protected]"],
["Mary", "[email protected]"],
["John", "[email protected]"],
]The first two John accounts share:
So they are merged.
The result can be:
[
["John", "[email protected]", "[email protected]", "[email protected]"],
["Mary", "[email protected]"],
["John", "[email protected]"],
]The two separate John groups are not merged because they do not share an email.
First Thought: Compare Every Account Pair
A direct approach is to compare every pair of accounts.
For each pair, check whether their email sets intersect.
If they share an email, merge them.
This works in principle, but merging can be transitive.
For example:
A shares with B
B shares with C
A does not directly share with CAll three still belong to the same person.
Pairwise merging becomes messy because we need to maintain connected groups.
Key Insight
This is a connected components problem.
Accounts are connected if they share an email.
We can model each account as a node.
If account i and account j share an email, union them into the same component.
Union Find is a good fit because it efficiently merges groups and later lets us find the final representative for each group.
Algorithm
Use a hash map:
email_to_accountThis maps each email to the first account index where it appeared.
Use Union Find over account indices.
First pass:
- For each account index
i:- For each email in that account:
- If the email has not been seen, store
email_to_account[email] = i. - If the email has been seen before at account
j, unioniandj.
- If the email has not been seen, store
- For each email in that account:
Second pass:
- For each account index
i:- Find its root account.
- Add all emails from account
iinto the email set for that root.
Final step:
- For each root group:
- Use the name from
accounts[root][0]. - Sort the emails.
- Return
[name] + sorted_emails.
- Use the name from
Correctness
If two accounts share an email, the first pass sees that email in both accounts and unions their indices. Therefore directly connected accounts are placed in the same Union Find component.
If accounts are connected through a chain of shared emails, Union Find merges each adjacent pair in that chain. By transitivity, all accounts in the chain end up with the same root. Therefore every group belonging to the same person becomes one component.
Accounts that do not share any email directly or indirectly are never connected by a union operation, so they remain in separate components.
In the second pass, every account contributes its emails to the set associated with its component root. This collects all unique emails for that merged person.
The final result sorts each email set and prepends the correct name. Since all accounts in one component belong to the same person and the problem guarantees they share the same name, using the root account name is valid.
Therefore the algorithm returns exactly the merged accounts required by the problem.
Complexity
Let:
A = number of accounts
E = total number of email entries| Metric | Value | Why |
|---|---|---|
| Time | O(E * alpha(A) + E log E) | Union/find is near constant; sorting emails dominates within groups |
| Space | O(A + E) | Parent array, email map, and grouped email sets |
alpha(A) is the inverse Ackermann function, which is effectively constant for practical input sizes.
Implementation
from collections import defaultdict
class UnionFind:
def __init__(self, n: int):
self.parent = list(range(n))
self.size = [1] * n
def find(self, x: int) -> int:
while self.parent[x] != x:
self.parent[x] = self.parent[self.parent[x]]
x = self.parent[x]
return x
def union(self, a: int, b: int) -> None:
root_a = self.find(a)
root_b = self.find(b)
if root_a == root_b:
return
if self.size[root_a] < self.size[root_b]:
root_a, root_b = root_b, root_a
self.parent[root_b] = root_a
self.size[root_a] += self.size[root_b]
class Solution:
def accountsMerge(self, accounts: list[list[str]]) -> list[list[str]]:
uf = UnionFind(len(accounts))
email_to_account = {}
for i, account in enumerate(accounts):
for email in account[1:]:
if email in email_to_account:
uf.union(i, email_to_account[email])
else:
email_to_account[email] = i
groups = defaultdict(set)
for i, account in enumerate(accounts):
root = uf.find(i)
for email in account[1:]:
groups[root].add(email)
result = []
for root, emails in groups.items():
name = accounts[root][0]
result.append([name] + sorted(emails))
return resultCode Explanation
Union Find starts with each account in its own group:
self.parent = list(range(n))
self.size = [1] * nThe find method returns the root of a group and compresses paths:
while self.parent[x] != x:
self.parent[x] = self.parent[self.parent[x]]
x = self.parent[x]The union method connects two account groups:
uf.union(i, email_to_account[email])This happens when an email appears in more than one account.
The email map records where each email first appeared:
email_to_account[email] = iAfter all unions are done, we group emails by root:
root = uf.find(i)
groups[root].add(email)Using a set removes duplicate emails.
Finally, each group becomes one merged account:
result.append([name] + sorted(emails))The emails are sorted as required.
Alternative: Graph DFS
We can also solve this by treating emails as graph nodes.
For each account, connect all emails in that account.
Then each connected component of emails forms one merged account.
This is also correct. Union Find is often simpler here because accounts are naturally merged by shared emails.
Testing
def normalize(result):
return sorted(result, key=lambda account: (account[0], account[1:]))
def test_accounts_merge():
s = Solution()
accounts = [
["John", "[email protected]", "[email protected]"],
["John", "[email protected]", "[email protected]"],
["Mary", "[email protected]"],
["John", "[email protected]"],
]
expected = [
["John", "[email protected]", "[email protected]", "[email protected]"],
["Mary", "[email protected]"],
["John", "[email protected]"],
]
assert normalize(s.accountsMerge(accounts)) == normalize(expected)
accounts = [
["Alice", "[email protected]", "[email protected]"],
["Alice", "[email protected]", "[email protected]"],
["Alice", "[email protected]"],
]
expected = [
["Alice", "[email protected]", "[email protected]", "[email protected]"],
["Alice", "[email protected]"],
]
assert normalize(s.accountsMerge(accounts)) == normalize(expected)
accounts = [
["Bob", "[email protected]"],
["Bob", "[email protected]"],
]
expected = [
["Bob", "[email protected]"],
["Bob", "[email protected]"],
]
assert normalize(s.accountsMerge(accounts)) == normalize(expected)
print("all tests passed")
test_accounts_merge()Test coverage:
| Test | Why |
|---|---|
| Shared email merge | Confirms normal merging |
| Transitive merge | Confirms A-B-C connections merge together |
| Same name without shared email | Confirms name alone does not merge accounts |
| Sorted emails | Confirms required output formatting |