Skip to content

LeetCode 839: Similar String Groups

A clear explanation of the Similar String Groups problem using graph connectivity and union-find.

Problem Restatement

We are given an array of strings strs.

Every string is an anagram of every other string in the array, and all strings have the same length.

Two strings are similar if either:

  1. They are already equal.
  2. We can make them equal by swapping at most two letters in one of the strings.

We need to return the number of similar string groups.

Similarity is transitive for groups. If A is similar to B, and B is similar to C, then A, B, and C belong to the same group, even if A and C are not directly similar.

Input and Output

ItemMeaning
InputArray strs
OutputNumber of similar string groups
Similar stringsEqual, or can become equal with one swap
Group ruleConnected by direct or indirect similarity
ConstraintAll strings are anagrams of each other

Function shape:

class Solution:
    def numSimilarGroups(self, strs: list[str]) -> int:
        ...

Examples

Example 1:

strs = ["tars", "rats", "arts", "star"]

The similar connections are:

"tars" <-> "rats"
"rats" <-> "arts"

So these three strings form one group:

{"tars", "rats", "arts"}

The string "star" is not similar to any of them.

So it forms its own group:

{"star"}

The answer is:

2

Example 2:

strs = ["omv", "ovm"]

The two strings differ at exactly two positions.

Swapping those two positions in "omv" gives "ovm".

So they are in one group.

The answer is:

1

First Thought: Build a Graph

Treat each string as a node.

Draw an edge between two nodes if the two strings are similar.

Then the answer is the number of connected components in this graph.

For example:

["tars", "rats", "arts", "star"]

forms:

tars -- rats -- arts

star

There are two connected components, so the answer is 2.

We could explicitly build the graph and run DFS, but union-find is simpler because the problem is only asking for the number of groups.

Key Insight

Because every string is an anagram of every other string, two strings are similar exactly when they differ at either 0 or 2 positions.

Why?

If they differ at 0 positions, they are equal.

If they differ at 2 positions, one swap can fix both mismatched positions.

If they differ at more than 2 positions, one swap cannot make them equal.

So the similarity check can be:

count mismatched positions
return mismatch_count <= 2

Since all strings are anagrams, mismatch_count == 1 cannot happen for valid different strings.

Algorithm

Use union-find.

  1. Start with each string in its own group.
  2. Compare every pair of strings.
  3. If two strings are similar, union their groups.
  4. Return the remaining number of groups.

Union-find maintains connected components efficiently.

Walkthrough

Use:

strs = ["tars", "rats", "arts", "star"]

Start with four groups:

{0}, {1}, {2}, {3}

Compare "tars" and "rats".

They differ at positions:

0 and 2

They are similar, so union them:

{0, 1}, {2}, {3}

Compare "tars" and "arts".

They differ at more than two positions, so do nothing.

Compare "tars" and "star".

They differ at more than two positions, so do nothing.

Compare "rats" and "arts".

They differ at two positions, so union them:

{0, 1, 2}, {3}

The final number of groups is:

2

Correctness

The algorithm connects two strings exactly when they are similar.

If two strings are equal, they differ at 0 positions, so the algorithm unions them.

If two strings can be made equal by one swap, then the swap fixes exactly two mismatched positions, so they differ at 2 positions, and the algorithm unions them.

If two strings differ at more than 2 positions, one swap can change only two positions, so they cannot be similar, and the algorithm does not union them.

After all pair comparisons, union-find contains exactly the connected components of the similarity graph.

A similar string group is exactly one connected component of that graph, because strings may belong to the same group through direct or indirect similarity.

Therefore, the number of union-find components is exactly the number of similar string groups.

Complexity

Let:

n = len(strs)
m = len(strs[0])
MetricValueWhy
TimeO(n^2 * m)Compare every pair of strings, each comparison scans up to m characters
SpaceO(n)Union-find parent and rank arrays

Union-find operations are effectively constant time because of path compression and union by rank.

Implementation

class Solution:
    def numSimilarGroups(self, strs: list[str]) -> int:
        n = len(strs)
        parent = list(range(n))
        rank = [0] * n
        groups = n

        def find(x: int) -> int:
            if parent[x] != x:
                parent[x] = find(parent[x])
            return parent[x]

        def union(a: int, b: int) -> None:
            nonlocal groups

            root_a = find(a)
            root_b = find(b)

            if root_a == root_b:
                return

            if rank[root_a] < rank[root_b]:
                parent[root_a] = root_b
            elif rank[root_a] > rank[root_b]:
                parent[root_b] = root_a
            else:
                parent[root_b] = root_a
                rank[root_a] += 1

            groups -= 1

        def similar(a: str, b: str) -> bool:
            diff = 0

            for x, y in zip(a, b):
                if x != y:
                    diff += 1
                    if diff > 2:
                        return False

            return True

        for i in range(n):
            for j in range(i + 1, n):
                if similar(strs[i], strs[j]):
                    union(i, j)

        return groups

Code Explanation

Each string starts as its own group:

parent = list(range(n))
rank = [0] * n
groups = n

The find function returns the representative of a group:

def find(x: int) -> int:
    if parent[x] != x:
        parent[x] = find(parent[x])
    return parent[x]

Path compression makes future lookups faster.

The union function merges two groups if they are different:

if root_a == root_b:
    return

When two groups are merged, the group count decreases:

groups -= 1

The similarity check counts mismatched positions:

def similar(a: str, b: str) -> bool:
    diff = 0

    for x, y in zip(a, b):
        if x != y:
            diff += 1
            if diff > 2:
                return False

    return True

Returning True for diff <= 2 works because all strings are anagrams.

Finally, every pair is checked:

for i in range(n):
    for j in range(i + 1, n):
        if similar(strs[i], strs[j]):
            union(i, j)

The remaining groups value is the answer.

Testing

def run_tests():
    s = Solution()

    assert s.numSimilarGroups(["tars", "rats", "arts", "star"]) == 2
    assert s.numSimilarGroups(["omv", "ovm"]) == 1
    assert s.numSimilarGroups(["abc"]) == 1
    assert s.numSimilarGroups(["abc", "abc"]) == 1
    assert s.numSimilarGroups(["abc", "acb", "bac", "bca", "cab", "cba"]) == 1
    assert s.numSimilarGroups(["abcd", "abdc", "acbd", "acdb"]) == 1

    print("all tests passed")

run_tests()

Test meaning:

TestWhy
["tars", "rats", "arts", "star"]Standard case with two groups
["omv", "ovm"]One swap connects both strings
Single stringOne node forms one group
Duplicate stringsEqual strings are similar
Many anagramsConfirms transitive grouping
Multiple connected swapsConfirms indirect similarity