LeetCode 916: Word Subsets

Problem Restatement

We are given two string arrays:

words1
words2

A string b is a subset of string a if every letter in b appears in a with at least the same frequency.

For example:

"wrr"

is a subset of:

"warrior"

because "warrior" has at least one "w" and at least two "r" characters.

But "wrr" is not a subset of:

"world"

because "world" has only one "r".

A word from words1 is universal if every word in words2 is a subset of it.

Return all universal words from words1.

The answer can be returned in any order. This matches the official problem statement.

Input and Output

Item	Meaning
Input	Two string arrays `words1` and `words2`
Output	All universal words from `words1`
Subset rule	Letter multiplicity matters
Order	Output order does not matter

Function shape:

def wordSubsets(words1: list[str], words2: list[str]) -> list[str]:
    ...

Examples

Example 1:

words1 = ["amazon", "apple", "facebook", "google", "leetcode"]
words2 = ["e", "o"]

A universal word must contain:

at least one "e"
at least one "o"

The universal words are:

["facebook", "google", "leetcode"]

Example 2:

words1 = ["amazon", "apple", "facebook", "google", "leetcode"]
words2 = ["l", "e"]

A universal word must contain:

at least one "l"
at least one "e"

The universal words are:

["apple", "google", "leetcode"]

Example 3:

words1 = ["amazon", "apple", "facebook", "google", "leetcode"]
words2 = ["lo", "eo"]

A universal word must contain:

at least one "l"
at least one "e"
at least one "o"

The universal words are:

["google", "leetcode"]

First Thought: Check Every Pair

A direct solution is to compare every word in words1 against every word in words2.

For each pair (a, b):

Count letters in a.
Count letters in b.
Check whether a has enough of every letter required by b.

from collections import Counter

class Solution:
    def wordSubsets(self, words1: list[str], words2: list[str]) -> list[str]:
        answer = []

        for a in words1:
            count_a = Counter(a)
            valid = True

            for b in words2:
                count_b = Counter(b)

                for ch, need in count_b.items():
                    if count_a[ch] < need:
                        valid = False
                        break

                if not valid:
                    break

            if valid:
                answer.append(a)

        return answer

This works, but it repeats similar checks many times.

Problem With Brute Force

If words2 has many words, checking every words1 word against every words2 word is wasteful.

For example:

words2 = ["e", "oo", "lo"]

A universal word must satisfy all of them.

Instead of checking those words separately, we can merge their requirements into one frequency requirement:

e -> 1
o -> 2
l -> 1

Then each word in words1 only needs to be checked once against this merged requirement.

Key Insight

For each character, we only care about the maximum frequency required by any word in words2.

Example:

words2 = ["e", "oo", "lo"]

Requirements:

Word	Requirement
`"e"`	`e >= 1`
`"oo"`	`o >= 2`
`"lo"`	`l >= 1`, `o >= 1`

Merged requirement:

e >= 1
o >= 2
l >= 1

The o >= 1 requirement from "lo" is already covered by o >= 2 from "oo".

So a word from words1 is universal exactly when its character counts dominate this merged requirement.

Algorithm

Create an array need of size 26, initialized to zero.
For every word in words2:
- Count its letters.
- For each letter, update need[i] with the maximum count seen so far.
For every word in words1:
- Count its letters.
- Check whether every count is at least the corresponding value in need.
- If yes, add the word to the answer.
Return the answer.

Correctness

For every character c, need[c] stores the maximum number of times c appears in any word from words2.

If a word a from words1 is universal, then every word in words2 is a subset of a. Therefore, for every character c, a must contain at least as many copies of c as the word in words2 that needs c the most. So a satisfies need.

Conversely, if a word a satisfies need, then for every word b in words2 and every character c, the count of c in b is at most need[c]. Since a has at least need[c], it also has enough characters for b.

Thus every word that satisfies need is universal, and every universal word satisfies need.

Therefore, the algorithm returns exactly all universal words.

Complexity

Let:

N = total number of characters in words1
M = total number of characters in words2

Metric	Value	Why
Time	`O(N + M)`	Each word is counted once, and each count array has fixed size `26`
Space	`O(1)` extra	The frequency arrays have fixed size `26`, ignoring the output

Implementation

class Solution:
    def wordSubsets(self, words1: list[str], words2: list[str]) -> list[str]:
        def count_letters(word: str) -> list[int]:
            count = [0] * 26

            for ch in word:
                index = ord(ch) - ord("a")
                count[index] += 1

            return count

        need = [0] * 26

        for word in words2:
            count = count_letters(word)

            for i in range(26):
                need[i] = max(need[i], count[i])

        answer = []

        for word in words1:
            count = count_letters(word)

            valid = True

            for i in range(26):
                if count[i] < need[i]:
                    valid = False
                    break

            if valid:
                answer.append(word)

        return answer

Code Explanation

The helper function counts lowercase letters:

def count_letters(word: str) -> list[int]:

It returns an array of length 26, where index 0 means "a", index 1 means "b", and so on.

index = ord(ch) - ord("a")
count[index] += 1

The array need stores the merged requirement from words2:

need = [0] * 26

For every word in words2, we count letters and keep the maximum requirement for each letter:

for i in range(26):
    need[i] = max(need[i], count[i])

Then each word in words1 is checked against need.

if count[i] < need[i]:
    valid = False
    break

If the word satisfies every letter requirement, it is universal:

if valid:
    answer.append(word)

Testing

def run_tests():
    s = Solution()

    assert s.wordSubsets(
        ["amazon", "apple", "facebook", "google", "leetcode"],
        ["e", "o"],
    ) == ["facebook", "google", "leetcode"]

    assert s.wordSubsets(
        ["amazon", "apple", "facebook", "google", "leetcode"],
        ["l", "e"],
    ) == ["apple", "google", "leetcode"]

    assert s.wordSubsets(
        ["amazon", "apple", "facebook", "google", "leetcode"],
        ["lo", "eo"],
    ) == ["google", "leetcode"]

    assert s.wordSubsets(
        ["amazon", "apple", "facebook", "google", "leetcode"],
        ["ec", "oc", "ceo"],
    ) == ["facebook", "leetcode"]

    assert s.wordSubsets(
        ["a", "aa", "aaa"],
        ["aa"],
    ) == ["aa", "aaa"]

    print("all tests passed")

run_tests()

Test meaning:

Test	Why
`["e", "o"]`	Separate single-letter requirements
`["l", "e"]`	Checks two different letters
`["lo", "eo"]`	Merges overlapping requirements
`["ec", "oc", "ceo"]`	Requires `c`, `e`, and `o` together
`["aa"]`	Checks multiplicity