A clear explanation of the Similar String Groups problem using graph connectivity and union-find.
Problem Restatement
We are given an array of strings strs.
Every string is an anagram of every other string in the array, and all strings have the same length.
Two strings are similar if either:
- They are already equal.
- We can make them equal by swapping at most two letters in one of the strings.
We need to return the number of similar string groups.
Similarity is transitive for groups. If A is similar to B, and B is similar to C, then A, B, and C belong to the same group, even if A and C are not directly similar.
Input and Output
| Item | Meaning |
|---|---|
| Input | Array strs |
| Output | Number of similar string groups |
| Similar strings | Equal, or can become equal with one swap |
| Group rule | Connected by direct or indirect similarity |
| Constraint | All strings are anagrams of each other |
Function shape:
class Solution:
def numSimilarGroups(self, strs: list[str]) -> int:
...Examples
Example 1:
strs = ["tars", "rats", "arts", "star"]The similar connections are:
"tars" <-> "rats"
"rats" <-> "arts"So these three strings form one group:
{"tars", "rats", "arts"}The string "star" is not similar to any of them.
So it forms its own group:
{"star"}The answer is:
2Example 2:
strs = ["omv", "ovm"]The two strings differ at exactly two positions.
Swapping those two positions in "omv" gives "ovm".
So they are in one group.
The answer is:
1First Thought: Build a Graph
Treat each string as a node.
Draw an edge between two nodes if the two strings are similar.
Then the answer is the number of connected components in this graph.
For example:
["tars", "rats", "arts", "star"]forms:
tars -- rats -- arts
starThere are two connected components, so the answer is 2.
We could explicitly build the graph and run DFS, but union-find is simpler because the problem is only asking for the number of groups.
Key Insight
Because every string is an anagram of every other string, two strings are similar exactly when they differ at either 0 or 2 positions.
Why?
If they differ at 0 positions, they are equal.
If they differ at 2 positions, one swap can fix both mismatched positions.
If they differ at more than 2 positions, one swap cannot make them equal.
So the similarity check can be:
count mismatched positions
return mismatch_count <= 2Since all strings are anagrams, mismatch_count == 1 cannot happen for valid different strings.
Algorithm
Use union-find.
- Start with each string in its own group.
- Compare every pair of strings.
- If two strings are similar, union their groups.
- Return the remaining number of groups.
Union-find maintains connected components efficiently.
Walkthrough
Use:
strs = ["tars", "rats", "arts", "star"]Start with four groups:
{0}, {1}, {2}, {3}Compare "tars" and "rats".
They differ at positions:
0 and 2They are similar, so union them:
{0, 1}, {2}, {3}Compare "tars" and "arts".
They differ at more than two positions, so do nothing.
Compare "tars" and "star".
They differ at more than two positions, so do nothing.
Compare "rats" and "arts".
They differ at two positions, so union them:
{0, 1, 2}, {3}The final number of groups is:
2Correctness
The algorithm connects two strings exactly when they are similar.
If two strings are equal, they differ at 0 positions, so the algorithm unions them.
If two strings can be made equal by one swap, then the swap fixes exactly two mismatched positions, so they differ at 2 positions, and the algorithm unions them.
If two strings differ at more than 2 positions, one swap can change only two positions, so they cannot be similar, and the algorithm does not union them.
After all pair comparisons, union-find contains exactly the connected components of the similarity graph.
A similar string group is exactly one connected component of that graph, because strings may belong to the same group through direct or indirect similarity.
Therefore, the number of union-find components is exactly the number of similar string groups.
Complexity
Let:
n = len(strs)
m = len(strs[0])| Metric | Value | Why |
|---|---|---|
| Time | O(n^2 * m) | Compare every pair of strings, each comparison scans up to m characters |
| Space | O(n) | Union-find parent and rank arrays |
Union-find operations are effectively constant time because of path compression and union by rank.
Implementation
class Solution:
def numSimilarGroups(self, strs: list[str]) -> int:
n = len(strs)
parent = list(range(n))
rank = [0] * n
groups = n
def find(x: int) -> int:
if parent[x] != x:
parent[x] = find(parent[x])
return parent[x]
def union(a: int, b: int) -> None:
nonlocal groups
root_a = find(a)
root_b = find(b)
if root_a == root_b:
return
if rank[root_a] < rank[root_b]:
parent[root_a] = root_b
elif rank[root_a] > rank[root_b]:
parent[root_b] = root_a
else:
parent[root_b] = root_a
rank[root_a] += 1
groups -= 1
def similar(a: str, b: str) -> bool:
diff = 0
for x, y in zip(a, b):
if x != y:
diff += 1
if diff > 2:
return False
return True
for i in range(n):
for j in range(i + 1, n):
if similar(strs[i], strs[j]):
union(i, j)
return groupsCode Explanation
Each string starts as its own group:
parent = list(range(n))
rank = [0] * n
groups = nThe find function returns the representative of a group:
def find(x: int) -> int:
if parent[x] != x:
parent[x] = find(parent[x])
return parent[x]Path compression makes future lookups faster.
The union function merges two groups if they are different:
if root_a == root_b:
returnWhen two groups are merged, the group count decreases:
groups -= 1The similarity check counts mismatched positions:
def similar(a: str, b: str) -> bool:
diff = 0
for x, y in zip(a, b):
if x != y:
diff += 1
if diff > 2:
return False
return TrueReturning True for diff <= 2 works because all strings are anagrams.
Finally, every pair is checked:
for i in range(n):
for j in range(i + 1, n):
if similar(strs[i], strs[j]):
union(i, j)The remaining groups value is the answer.
Testing
def run_tests():
s = Solution()
assert s.numSimilarGroups(["tars", "rats", "arts", "star"]) == 2
assert s.numSimilarGroups(["omv", "ovm"]) == 1
assert s.numSimilarGroups(["abc"]) == 1
assert s.numSimilarGroups(["abc", "abc"]) == 1
assert s.numSimilarGroups(["abc", "acb", "bac", "bca", "cab", "cba"]) == 1
assert s.numSimilarGroups(["abcd", "abdc", "acbd", "acdb"]) == 1
print("all tests passed")
run_tests()Test meaning:
| Test | Why |
|---|---|
["tars", "rats", "arts", "star"] | Standard case with two groups |
["omv", "ovm"] | One swap connects both strings |
| Single string | One node forms one group |
| Duplicate strings | Equal strings are similar |
| Many anagrams | Confirms transitive grouping |
| Multiple connected swaps | Confirms indirect similarity |