Check sentence similarity with transitive word relationships using union-find.
Problem Restatement
We are given two sentences represented as arrays of words:
sentence1
sentence2We are also given a list of similar word pairs:
similarPairsEach pair:
[x, y]means word x is similar to word y.
Two sentences are similar if:
- They have the same number of words.
- For every index
i,sentence1[i]is similar tosentence2[i].
A word is always similar to itself.
Similarity is symmetric and transitive. If a is similar to b, and b is similar to c, then a is similar to c. This is the main difference from LeetCode 734.
Input and Output
| Item | Meaning |
|---|---|
| Input | sentence1, sentence2, and similarPairs |
| Output | True if the sentences are similar, otherwise False |
| Same word | Always considered similar |
| Symmetric | If x is similar to y, then y is similar to x |
| Transitive | If x ~ y and y ~ z, then x ~ z |
Example function shape:
def areSentencesSimilarTwo(
sentence1: list[str],
sentence2: list[str],
similarPairs: list[list[str]],
) -> bool:
...Examples
Example 1
sentence1 = ["great", "acting", "skills"]
sentence2 = ["fine", "drama", "talent"]
similarPairs = [
["great", "good"],
["fine", "good"],
["drama", "acting"],
["skills", "talent"],
]"great" is similar to "fine" because:
great -> good -> fineThe other aligned pairs are also similar.
So the answer is:
TrueExample 2
sentence1 = ["I", "love", "leetcode"]
sentence2 = ["I", "love", "onepiece"]
similarPairs = [
["manga", "onepiece"],
["platform", "anime"],
["leetcode", "platform"],
["anime", "manga"],
]The first two words are equal.
For the last word:
leetcode -> platform -> anime -> manga -> onepieceSo "leetcode" is similar to "onepiece" by transitivity.
The answer is:
TrueExample 3
sentence1 = ["great"]
sentence2 = ["doubleplus", "good"]
similarPairs = [
["great", "good"],
]The sentence lengths are different.
So the answer is:
FalseFirst Thought: Build a Graph
Each word is a node.
Each similar pair is an undirected edge.
Then two words are similar if they are in the same connected component.
We could run DFS or BFS for every aligned pair. That works, but it may repeat the same graph search many times.
A better fit is union-find.
Key Insight
Union-find keeps connected components.
For every similar pair (a, b), we merge the components of a and b.
After all pairs are processed, two words are similar exactly when they have the same root.
So the problem becomes:
- Union every similar pair.
- Compare every aligned sentence word pair.
- If the words are equal, continue.
- Otherwise, check whether they have the same union-find root.
Algorithm
Create a union-find structure over strings.
For each pair [a, b] in similarPairs:
- Add
aandbif they are not already present. - Union their components.
Then check the two sentences:
- If their lengths differ, return
False. - For every aligned pair
(a, b):- If
a == b, continue. - If either word is missing from union-find, return
False. - If
find(a) != find(b), returnFalse.
- If
- Return
True.
Correctness
Each similar pair says that two words belong to the same similarity group. The union operation merges exactly those two groups. Since union is repeated for every pair, all words connected by a chain of similar pairs end up in the same component.
Because similarity is transitive, every word in the same connected component is similar to every other word in that component. Because similarity is symmetric, the edges can be treated as undirected.
When checking the sentences, equal words are always similar. For different words, the algorithm accepts them only when their union-find roots are equal. That means they belong to the same connected component, so they are similar by the transitive rule.
If any aligned pair has different roots, there is no chain of similar pairs connecting them, so the sentences are not similar.
Therefore the algorithm returns True exactly when the two sentences satisfy the problem definition.
Complexity
Let p = len(similarPairs) and n = len(sentence1).
| Metric | Value | Why |
|---|---|---|
| Time | O((p + n) * α(w)) | Union and find are almost constant time |
| Space | O(w) | Store parent pointers for distinct words |
Here w is the number of distinct words in similarPairs, and α is the inverse Ackermann function.
For practical input sizes, union-find operations are effectively constant time.
Implementation
class Solution:
def areSentencesSimilarTwo(
self,
sentence1: list[str],
sentence2: list[str],
similarPairs: list[list[str]],
) -> bool:
if len(sentence1) != len(sentence2):
return False
parent = {}
def find(x: str) -> str:
if x not in parent:
parent[x] = x
if parent[x] != x:
parent[x] = find(parent[x])
return parent[x]
def union(a: str, b: str) -> None:
root_a = find(a)
root_b = find(b)
if root_a != root_b:
parent[root_b] = root_a
for a, b in similarPairs:
union(a, b)
for a, b in zip(sentence1, sentence2):
if a == b:
continue
if a not in parent or b not in parent:
return False
if find(a) != find(b):
return False
return TrueCode Explanation
We first reject sentences of different lengths:
if len(sentence1) != len(sentence2):
return FalseThe parent dictionary stores the union-find forest:
parent = {}The find function creates a new singleton set when needed:
if x not in parent:
parent[x] = xThen it follows parent links until it reaches the component root.
Path compression makes future lookups faster:
if parent[x] != x:
parent[x] = find(parent[x])The union function connects two components:
if root_a != root_b:
parent[root_b] = root_aAfter processing all similar pairs, every connected group has one representative root.
For each aligned word pair, equal words pass immediately:
if a == b:
continueIf two different words do not appear in the similarity structure, no pair chain can connect them:
if a not in parent or b not in parent:
return FalseOtherwise, they must share the same root:
if find(a) != find(b):
return FalseTesting
def run_tests():
s = Solution()
assert s.areSentencesSimilarTwo(
["great", "acting", "skills"],
["fine", "drama", "talent"],
[
["great", "good"],
["fine", "good"],
["drama", "acting"],
["skills", "talent"],
],
) is True
assert s.areSentencesSimilarTwo(
["I", "love", "leetcode"],
["I", "love", "onepiece"],
[
["manga", "onepiece"],
["platform", "anime"],
["leetcode", "platform"],
["anime", "manga"],
],
) is True
assert s.areSentencesSimilarTwo(
["great"],
["doubleplus", "good"],
[["great", "good"]],
) is False
assert s.areSentencesSimilarTwo(
["great"],
["great"],
[],
) is True
assert s.areSentencesSimilarTwo(
["great"],
["fine"],
[["great", "good"], ["good", "fine"]],
) is True
assert s.areSentencesSimilarTwo(
["great"],
["fine"],
[["great", "good"], ["bad", "fine"]],
) is False
print("all tests passed")
run_tests()| Test | Why |
|---|---|
Similar through "good" | Checks transitive similarity |
Longer chain to "onepiece" | Checks multi-step connectivity |
| Different sentence lengths | Must fail immediately |
| Same word with no pairs | Self-similarity |
| Two-edge chain | Confirms transitivity |
| Separate components | Must fail |