A clear explanation of Group Anagrams using a hash map keyed by each word's sorted character signature.
Problem Restatement
We are given an array of strings strs.
We need to group strings that are anagrams of each other.
Two strings are anagrams if they contain the same characters with the same frequencies, but possibly in a different order.
The answer can be returned in any order. LeetCode states the input strings contain lowercase English letters, with 1 <= strs.length <= 10^4 and 0 <= strs[i].length <= 100.
Example:
strs = ["eat", "tea", "tan", "ate", "nat", "bat"]The output can be:
[
["bat"],
["nat", "tan"],
["ate", "eat", "tea"],
]The order of groups does not matter.
The order inside each group also does not matter.
Input and Output
| Item | Meaning |
|---|---|
| Input | A list of lowercase strings strs |
| Output | A list of groups, where each group contains anagrams |
| Group order | Any order is allowed |
| String order inside group | Any order is allowed |
Function shape:
def groupAnagrams(strs: list[str]) -> list[list[str]]:
...First Thought: Compare Every Pair
A direct approach is to compare every string with every other string.
To check whether two strings are anagrams, we can sort both strings and compare them.
For example:
sorted("eat") == sorted("tea")Both become:
["a", "e", "t"]So they are anagrams.
But comparing every pair is too slow. If there are many strings, pairwise comparison wastes work.
We need to compute one stable signature for each string and use that signature to group strings directly.
Key Insight
All anagrams share the same sorted form.
For example:
| String | Sorted Signature |
|---|---|
"eat" | "aet" |
"tea" | "aet" |
"ate" | "aet" |
"tan" | "ant" |
"nat" | "ant" |
"bat" | "abt" |
So we can use the sorted string as a hash map key.
The map stores:
signature -> list of strings with that signatureFor example:
{
"aet": ["eat", "tea", "ate"],
"ant": ["tan", "nat"],
"abt": ["bat"],
}At the end, the values of the map are the required groups.
Algorithm
Create an empty hash map called groups.
For each word in strs:
- Sort the characters in the word.
- Convert the sorted characters into a string key.
- Append the original word to
groups[key].
Return all values from the map.
Walkthrough
Use:
strs = ["eat", "tea", "tan", "ate", "nat", "bat"]Start:
groups = {}Process "eat":
key = "aet"
groups = {
"aet": ["eat"]
}Process "tea":
key = "aet"
groups = {
"aet": ["eat", "tea"]
}Process "tan":
key = "ant"
groups = {
"aet": ["eat", "tea"],
"ant": ["tan"]
}Process "ate":
key = "aet"
groups = {
"aet": ["eat", "tea", "ate"],
"ant": ["tan"]
}Process "nat":
key = "ant"
groups = {
"aet": ["eat", "tea", "ate"],
"ant": ["tan", "nat"]
}Process "bat":
key = "abt"
groups = {
"aet": ["eat", "tea", "ate"],
"ant": ["tan", "nat"],
"abt": ["bat"]
}Return:
[
["eat", "tea", "ate"],
["tan", "nat"],
["bat"],
]Correctness
Two strings are anagrams exactly when they have the same characters with the same frequencies.
Sorting a string puts its characters into a fixed order. Therefore, two strings have the same sorted signature if and only if they contain the same characters with the same frequencies.
The algorithm uses this sorted signature as the hash map key. Every string is placed into the group for its signature.
So all anagrams go into the same group.
Also, strings that are not anagrams have different sorted signatures, so they cannot be placed into the same group.
Since every input string is processed once and added to exactly one group, the returned groups contain all input strings, grouped correctly.
Complexity
Let:
m = len(strs)
k = maximum length of a string| Metric | Value | Why |
|---|---|---|
| Time | O(m * k log k) | We sort each string |
| Space | O(m * k) | The groups store all strings, and keys store sorted signatures |
The auxiliary map stores one list per signature.
Implementation
from collections import defaultdict
class Solution:
def groupAnagrams(self, strs: list[str]) -> list[list[str]]:
groups = defaultdict(list)
for word in strs:
key = "".join(sorted(word))
groups[key].append(word)
return list(groups.values())Code Explanation
We use a defaultdict(list):
groups = defaultdict(list)This lets us append to a group without checking whether the key already exists.
For each word:
for word in strs:we build its sorted signature:
key = "".join(sorted(word))Then we add the original word to the matching group:
groups[key].append(word)We store the original word, not the sorted key, because the output must contain the original strings.
At the end, return all groups:
return list(groups.values())Testing
Because the output order can vary, we normalize the result before comparing.
def normalize(groups):
return sorted(sorted(group) for group in groups)
def run_tests():
s = Solution()
assert normalize(s.groupAnagrams(["eat", "tea", "tan", "ate", "nat", "bat"])) == normalize([
["bat"],
["nat", "tan"],
["ate", "eat", "tea"],
])
assert normalize(s.groupAnagrams([""])) == normalize([
[""],
])
assert normalize(s.groupAnagrams(["a"])) == normalize([
["a"],
])
assert normalize(s.groupAnagrams(["abc", "bca", "cab", "xyz", "zyx"])) == normalize([
["abc", "bca", "cab"],
["xyz", "zyx"],
])
assert normalize(s.groupAnagrams(["a", "b", "c"])) == normalize([
["a"],
["b"],
["c"],
])
print("all tests passed")
run_tests()| Test | Expected Group Count | Reason |
|---|---|---|
["eat", "tea", "tan", "ate", "nat", "bat"] | 3 | Standard example |
[""] | 1 | Empty string is valid |
["a"] | 1 | Single string |
["abc", "bca", "cab", "xyz", "zyx"] | 2 | Multiple groups |
["a", "b", "c"] | 3 | No two strings are anagrams |