LeetCode 49: Group Anagrams

Problem Restatement

We are given an array of strings strs.

We need to group strings that are anagrams of each other.

Two strings are anagrams if they contain the same characters with the same frequencies, but possibly in a different order.

The answer can be returned in any order. LeetCode states the input strings contain lowercase English letters, with 1 <= strs.length <= 10^4 and 0 <= strs[i].length <= 100.

Example:

strs = ["eat", "tea", "tan", "ate", "nat", "bat"]

The output can be:

[
    ["bat"],
    ["nat", "tan"],
    ["ate", "eat", "tea"],
]

The order of groups does not matter.

The order inside each group also does not matter.

Input and Output

Item	Meaning
Input	A list of lowercase strings `strs`
Output	A list of groups, where each group contains anagrams
Group order	Any order is allowed
String order inside group	Any order is allowed

Function shape:

def groupAnagrams(strs: list[str]) -> list[list[str]]:
    ...

First Thought: Compare Every Pair

A direct approach is to compare every string with every other string.

To check whether two strings are anagrams, we can sort both strings and compare them.

For example:

sorted("eat") == sorted("tea")

Both become:

["a", "e", "t"]

So they are anagrams.

But comparing every pair is too slow. If there are many strings, pairwise comparison wastes work.

We need to compute one stable signature for each string and use that signature to group strings directly.

Key Insight

All anagrams share the same sorted form.

For example:

String	Sorted Signature
`"eat"`	`"aet"`
`"tea"`	`"aet"`
`"ate"`	`"aet"`
`"tan"`	`"ant"`
`"nat"`	`"ant"`
`"bat"`	`"abt"`

So we can use the sorted string as a hash map key.

The map stores:

signature -> list of strings with that signature

For example:

{
    "aet": ["eat", "tea", "ate"],
    "ant": ["tan", "nat"],
    "abt": ["bat"],
}

At the end, the values of the map are the required groups.

Algorithm

Create an empty hash map called groups.

For each word in strs:

Sort the characters in the word.
Convert the sorted characters into a string key.
Append the original word to groups[key].

Return all values from the map.

Walkthrough

Use:

strs = ["eat", "tea", "tan", "ate", "nat", "bat"]

Start:

groups = {}

Process "eat":

key = "aet"
groups = {
    "aet": ["eat"]
}

Process "tea":

key = "aet"
groups = {
    "aet": ["eat", "tea"]
}

Process "tan":

key = "ant"
groups = {
    "aet": ["eat", "tea"],
    "ant": ["tan"]
}

Process "ate":

key = "aet"
groups = {
    "aet": ["eat", "tea", "ate"],
    "ant": ["tan"]
}

Process "nat":

key = "ant"
groups = {
    "aet": ["eat", "tea", "ate"],
    "ant": ["tan", "nat"]
}

Process "bat":

key = "abt"
groups = {
    "aet": ["eat", "tea", "ate"],
    "ant": ["tan", "nat"],
    "abt": ["bat"]
}

Return:

[
    ["eat", "tea", "ate"],
    ["tan", "nat"],
    ["bat"],
]

Correctness

Two strings are anagrams exactly when they have the same characters with the same frequencies.

Sorting a string puts its characters into a fixed order. Therefore, two strings have the same sorted signature if and only if they contain the same characters with the same frequencies.

The algorithm uses this sorted signature as the hash map key. Every string is placed into the group for its signature.

So all anagrams go into the same group.

Also, strings that are not anagrams have different sorted signatures, so they cannot be placed into the same group.

Since every input string is processed once and added to exactly one group, the returned groups contain all input strings, grouped correctly.

Complexity

Let:

m = len(strs)
k = maximum length of a string

Metric	Value	Why
Time	`O(m * k log k)`	We sort each string
Space	`O(m * k)`	The groups store all strings, and keys store sorted signatures

The auxiliary map stores one list per signature.

Implementation

from collections import defaultdict

class Solution:
    def groupAnagrams(self, strs: list[str]) -> list[list[str]]:
        groups = defaultdict(list)

        for word in strs:
            key = "".join(sorted(word))
            groups[key].append(word)

        return list(groups.values())

Code Explanation

We use a defaultdict(list):

groups = defaultdict(list)

This lets us append to a group without checking whether the key already exists.

For each word:

for word in strs:

we build its sorted signature:

key = "".join(sorted(word))

Then we add the original word to the matching group:

groups[key].append(word)

We store the original word, not the sorted key, because the output must contain the original strings.

At the end, return all groups:

return list(groups.values())

Testing

Because the output order can vary, we normalize the result before comparing.

def normalize(groups):
    return sorted(sorted(group) for group in groups)

def run_tests():
    s = Solution()

    assert normalize(s.groupAnagrams(["eat", "tea", "tan", "ate", "nat", "bat"])) == normalize([
        ["bat"],
        ["nat", "tan"],
        ["ate", "eat", "tea"],
    ])

    assert normalize(s.groupAnagrams([""])) == normalize([
        [""],
    ])

    assert normalize(s.groupAnagrams(["a"])) == normalize([
        ["a"],
    ])

    assert normalize(s.groupAnagrams(["abc", "bca", "cab", "xyz", "zyx"])) == normalize([
        ["abc", "bca", "cab"],
        ["xyz", "zyx"],
    ])

    assert normalize(s.groupAnagrams(["a", "b", "c"])) == normalize([
        ["a"],
        ["b"],
        ["c"],
    ])

    print("all tests passed")

run_tests()

Test	Expected Group Count	Reason
`["eat", "tea", "tan", "ate", "nat", "bat"]`	`3`	Standard example
`[""]`	`1`	Empty string is valid
`["a"]`	`1`	Single string
`["abc", "bca", "cab", "xyz", "zyx"]`	`2`	Multiple groups
`["a", "b", "c"]`	`3`	No two strings are anagrams