# LeetCode 839: Similar String Groups

## Problem Restatement

We are given an array of strings `strs`.

Every string is an anagram of every other string in the array, and all strings have the same length.

Two strings are similar if either:

1. They are already equal.
2. We can make them equal by swapping at most two letters in one of the strings.

We need to return the number of similar string groups.

Similarity is transitive for groups. If `A` is similar to `B`, and `B` is similar to `C`, then `A`, `B`, and `C` belong to the same group, even if `A` and `C` are not directly similar.

## Input and Output

| Item | Meaning |
|---|---|
| Input | Array `strs` |
| Output | Number of similar string groups |
| Similar strings | Equal, or can become equal with one swap |
| Group rule | Connected by direct or indirect similarity |
| Constraint | All strings are anagrams of each other |

Function shape:

```python
class Solution:
    def numSimilarGroups(self, strs: list[str]) -> int:
        ...
```

## Examples

Example 1:

```python
strs = ["tars", "rats", "arts", "star"]
```

The similar connections are:

```python
"tars" <-> "rats"
"rats" <-> "arts"
```

So these three strings form one group:

```python
{"tars", "rats", "arts"}
```

The string `"star"` is not similar to any of them.

So it forms its own group:

```python
{"star"}
```

The answer is:

```python
2
```

Example 2:

```python
strs = ["omv", "ovm"]
```

The two strings differ at exactly two positions.

Swapping those two positions in `"omv"` gives `"ovm"`.

So they are in one group.

The answer is:

```python
1
```

## First Thought: Build a Graph

Treat each string as a node.

Draw an edge between two nodes if the two strings are similar.

Then the answer is the number of connected components in this graph.

For example:

```python
["tars", "rats", "arts", "star"]
```

forms:

```python
tars -- rats -- arts

star
```

There are two connected components, so the answer is `2`.

We could explicitly build the graph and run DFS, but union-find is simpler because the problem is only asking for the number of groups.

## Key Insight

Because every string is an anagram of every other string, two strings are similar exactly when they differ at either `0` or `2` positions.

Why?

If they differ at `0` positions, they are equal.

If they differ at `2` positions, one swap can fix both mismatched positions.

If they differ at more than `2` positions, one swap cannot make them equal.

So the similarity check can be:

```python
count mismatched positions
return mismatch_count <= 2
```

Since all strings are anagrams, `mismatch_count == 1` cannot happen for valid different strings.

## Algorithm

Use union-find.

1. Start with each string in its own group.
2. Compare every pair of strings.
3. If two strings are similar, union their groups.
4. Return the remaining number of groups.

Union-find maintains connected components efficiently.

## Walkthrough

Use:

```python
strs = ["tars", "rats", "arts", "star"]
```

Start with four groups:

```python
{0}, {1}, {2}, {3}
```

Compare `"tars"` and `"rats"`.

They differ at positions:

```python
0 and 2
```

They are similar, so union them:

```python
{0, 1}, {2}, {3}
```

Compare `"tars"` and `"arts"`.

They differ at more than two positions, so do nothing.

Compare `"tars"` and `"star"`.

They differ at more than two positions, so do nothing.

Compare `"rats"` and `"arts"`.

They differ at two positions, so union them:

```python
{0, 1, 2}, {3}
```

The final number of groups is:

```python
2
```

## Correctness

The algorithm connects two strings exactly when they are similar.

If two strings are equal, they differ at `0` positions, so the algorithm unions them.

If two strings can be made equal by one swap, then the swap fixes exactly two mismatched positions, so they differ at `2` positions, and the algorithm unions them.

If two strings differ at more than `2` positions, one swap can change only two positions, so they cannot be similar, and the algorithm does not union them.

After all pair comparisons, union-find contains exactly the connected components of the similarity graph.

A similar string group is exactly one connected component of that graph, because strings may belong to the same group through direct or indirect similarity.

Therefore, the number of union-find components is exactly the number of similar string groups.

## Complexity

Let:

```python
n = len(strs)
m = len(strs[0])
```

| Metric | Value | Why |
|---|---:|---|
| Time | `O(n^2 * m)` | Compare every pair of strings, each comparison scans up to `m` characters |
| Space | `O(n)` | Union-find parent and rank arrays |

Union-find operations are effectively constant time because of path compression and union by rank.

## Implementation

```python
class Solution:
    def numSimilarGroups(self, strs: list[str]) -> int:
        n = len(strs)
        parent = list(range(n))
        rank = [0] * n
        groups = n

        def find(x: int) -> int:
            if parent[x] != x:
                parent[x] = find(parent[x])
            return parent[x]

        def union(a: int, b: int) -> None:
            nonlocal groups

            root_a = find(a)
            root_b = find(b)

            if root_a == root_b:
                return

            if rank[root_a] < rank[root_b]:
                parent[root_a] = root_b
            elif rank[root_a] > rank[root_b]:
                parent[root_b] = root_a
            else:
                parent[root_b] = root_a
                rank[root_a] += 1

            groups -= 1

        def similar(a: str, b: str) -> bool:
            diff = 0

            for x, y in zip(a, b):
                if x != y:
                    diff += 1
                    if diff > 2:
                        return False

            return True

        for i in range(n):
            for j in range(i + 1, n):
                if similar(strs[i], strs[j]):
                    union(i, j)

        return groups
```

## Code Explanation

Each string starts as its own group:

```python
parent = list(range(n))
rank = [0] * n
groups = n
```

The `find` function returns the representative of a group:

```python
def find(x: int) -> int:
    if parent[x] != x:
        parent[x] = find(parent[x])
    return parent[x]
```

Path compression makes future lookups faster.

The `union` function merges two groups if they are different:

```python
if root_a == root_b:
    return
```

When two groups are merged, the group count decreases:

```python
groups -= 1
```

The similarity check counts mismatched positions:

```python
def similar(a: str, b: str) -> bool:
    diff = 0

    for x, y in zip(a, b):
        if x != y:
            diff += 1
            if diff > 2:
                return False

    return True
```

Returning `True` for `diff <= 2` works because all strings are anagrams.

Finally, every pair is checked:

```python
for i in range(n):
    for j in range(i + 1, n):
        if similar(strs[i], strs[j]):
            union(i, j)
```

The remaining `groups` value is the answer.

## Testing

```python
def run_tests():
    s = Solution()

    assert s.numSimilarGroups(["tars", "rats", "arts", "star"]) == 2
    assert s.numSimilarGroups(["omv", "ovm"]) == 1
    assert s.numSimilarGroups(["abc"]) == 1
    assert s.numSimilarGroups(["abc", "abc"]) == 1
    assert s.numSimilarGroups(["abc", "acb", "bac", "bca", "cab", "cba"]) == 1
    assert s.numSimilarGroups(["abcd", "abdc", "acbd", "acdb"]) == 1

    print("all tests passed")

run_tests()
```

Test meaning:

| Test | Why |
|---|---|
| `["tars", "rats", "arts", "star"]` | Standard case with two groups |
| `["omv", "ovm"]` | One swap connects both strings |
| Single string | One node forms one group |
| Duplicate strings | Equal strings are similar |
| Many anagrams | Confirms transitive grouping |
| Multiple connected swaps | Confirms indirect similarity |