A clear explanation of finding universal words by merging character frequency requirements from words2.
Problem Restatement
We are given two string arrays:
words1
words2A string b is a subset of string a if every letter in b appears in a with at least the same frequency.
For example:
"wrr"is a subset of:
"warrior"because "warrior" has at least one "w" and at least two "r" characters.
But "wrr" is not a subset of:
"world"because "world" has only one "r".
A word from words1 is universal if every word in words2 is a subset of it.
Return all universal words from words1.
The answer can be returned in any order. This matches the official problem statement.
Input and Output
| Item | Meaning |
|---|---|
| Input | Two string arrays words1 and words2 |
| Output | All universal words from words1 |
| Subset rule | Letter multiplicity matters |
| Order | Output order does not matter |
Function shape:
def wordSubsets(words1: list[str], words2: list[str]) -> list[str]:
...Examples
Example 1:
words1 = ["amazon", "apple", "facebook", "google", "leetcode"]
words2 = ["e", "o"]A universal word must contain:
at least one "e"
at least one "o"The universal words are:
["facebook", "google", "leetcode"]Example 2:
words1 = ["amazon", "apple", "facebook", "google", "leetcode"]
words2 = ["l", "e"]A universal word must contain:
at least one "l"
at least one "e"The universal words are:
["apple", "google", "leetcode"]Example 3:
words1 = ["amazon", "apple", "facebook", "google", "leetcode"]
words2 = ["lo", "eo"]A universal word must contain:
at least one "l"
at least one "e"
at least one "o"The universal words are:
["google", "leetcode"]First Thought: Check Every Pair
A direct solution is to compare every word in words1 against every word in words2.
For each pair (a, b):
- Count letters in
a. - Count letters in
b. - Check whether
ahas enough of every letter required byb.
from collections import Counter
class Solution:
def wordSubsets(self, words1: list[str], words2: list[str]) -> list[str]:
answer = []
for a in words1:
count_a = Counter(a)
valid = True
for b in words2:
count_b = Counter(b)
for ch, need in count_b.items():
if count_a[ch] < need:
valid = False
break
if not valid:
break
if valid:
answer.append(a)
return answerThis works, but it repeats similar checks many times.
Problem With Brute Force
If words2 has many words, checking every words1 word against every words2 word is wasteful.
For example:
words2 = ["e", "oo", "lo"]A universal word must satisfy all of them.
Instead of checking those words separately, we can merge their requirements into one frequency requirement:
e -> 1
o -> 2
l -> 1Then each word in words1 only needs to be checked once against this merged requirement.
Key Insight
For each character, we only care about the maximum frequency required by any word in words2.
Example:
words2 = ["e", "oo", "lo"]Requirements:
| Word | Requirement |
|---|---|
"e" | e >= 1 |
"oo" | o >= 2 |
"lo" | l >= 1, o >= 1 |
Merged requirement:
e >= 1
o >= 2
l >= 1The o >= 1 requirement from "lo" is already covered by o >= 2 from "oo".
So a word from words1 is universal exactly when its character counts dominate this merged requirement.
Algorithm
- Create an array
needof size26, initialized to zero. - For every word in
words2:- Count its letters.
- For each letter, update
need[i]with the maximum count seen so far.
- For every word in
words1:- Count its letters.
- Check whether every count is at least the corresponding value in
need. - If yes, add the word to the answer.
- Return the answer.
Correctness
For every character c, need[c] stores the maximum number of times c appears in any word from words2.
If a word a from words1 is universal, then every word in words2 is a subset of a. Therefore, for every character c, a must contain at least as many copies of c as the word in words2 that needs c the most. So a satisfies need.
Conversely, if a word a satisfies need, then for every word b in words2 and every character c, the count of c in b is at most need[c]. Since a has at least need[c], it also has enough characters for b.
Thus every word that satisfies need is universal, and every universal word satisfies need.
Therefore, the algorithm returns exactly all universal words.
Complexity
Let:
N = total number of characters in words1
M = total number of characters in words2| Metric | Value | Why |
|---|---|---|
| Time | O(N + M) | Each word is counted once, and each count array has fixed size 26 |
| Space | O(1) extra | The frequency arrays have fixed size 26, ignoring the output |
Implementation
class Solution:
def wordSubsets(self, words1: list[str], words2: list[str]) -> list[str]:
def count_letters(word: str) -> list[int]:
count = [0] * 26
for ch in word:
index = ord(ch) - ord("a")
count[index] += 1
return count
need = [0] * 26
for word in words2:
count = count_letters(word)
for i in range(26):
need[i] = max(need[i], count[i])
answer = []
for word in words1:
count = count_letters(word)
valid = True
for i in range(26):
if count[i] < need[i]:
valid = False
break
if valid:
answer.append(word)
return answerCode Explanation
The helper function counts lowercase letters:
def count_letters(word: str) -> list[int]:It returns an array of length 26, where index 0 means "a", index 1 means "b", and so on.
index = ord(ch) - ord("a")
count[index] += 1The array need stores the merged requirement from words2:
need = [0] * 26For every word in words2, we count letters and keep the maximum requirement for each letter:
for i in range(26):
need[i] = max(need[i], count[i])Then each word in words1 is checked against need.
if count[i] < need[i]:
valid = False
breakIf the word satisfies every letter requirement, it is universal:
if valid:
answer.append(word)Testing
def run_tests():
s = Solution()
assert s.wordSubsets(
["amazon", "apple", "facebook", "google", "leetcode"],
["e", "o"],
) == ["facebook", "google", "leetcode"]
assert s.wordSubsets(
["amazon", "apple", "facebook", "google", "leetcode"],
["l", "e"],
) == ["apple", "google", "leetcode"]
assert s.wordSubsets(
["amazon", "apple", "facebook", "google", "leetcode"],
["lo", "eo"],
) == ["google", "leetcode"]
assert s.wordSubsets(
["amazon", "apple", "facebook", "google", "leetcode"],
["ec", "oc", "ceo"],
) == ["facebook", "leetcode"]
assert s.wordSubsets(
["a", "aa", "aaa"],
["aa"],
) == ["aa", "aaa"]
print("all tests passed")
run_tests()Test meaning:
| Test | Why |
|---|---|
["e", "o"] | Separate single-letter requirements |
["l", "e"] | Checks two different letters |
["lo", "eo"] | Merges overlapping requirements |
["ec", "oc", "ceo"] | Requires c, e, and o together |
["aa"] | Checks multiplicity |