LeetCode 792: Number of Matching Subsequences

Problem Restatement

We are given a string s and an array of strings words.

We need to return how many strings in words are subsequences of s.

A subsequence is formed by deleting zero or more characters from a string without changing the order of the remaining characters. For example, "ace" is a subsequence of "abcde".

Input and Output

Item	Meaning
Input	A string `s` and a list of strings `words`
Output	Number of words that are subsequences of `s`
Subsequence rule	Characters must appear in the same relative order
Important detail	Each word is checked independently

Function shape:

class Solution:
    def numMatchingSubseq(self, s: str, words: list[str]) -> int:
        ...

Examples

Example 1:

s = "abcde"
words = ["a", "bb", "acd", "ace"]

The matching subsequences are:

"a"
"acd"
"ace"

The word "bb" is not a subsequence because s has only one b.

So the answer is:

Example 2:

s = "dsahjpjauf"
words = ["ahjpjau", "ja", "ahbwzgqnuk", "tnmlanowax"]

The matching subsequences are:

"ahjpjau"
"ja"

So the answer is:

First Thought: Check Each Word Separately

A simple approach is to test every word against s.

For each word, use two pointers:

i = 0  # pointer in s
j = 0  # pointer in word

Move through s, and whenever s[i] == word[j], advance j.

If j reaches the end of the word, then the word is a subsequence.

This works, but it scans s once for every word.

Problem With Repeated Scanning

If s is long and there are many words, repeated scanning wastes work.

For example:

len(s) = 50000
len(words) = 5000

Scanning all of s for every word may do:

50000 * 5000

character checks.

We need to process s once and advance all waiting words together.

Key Insight

Each word is always waiting for its next required character.

For example, if the word is:

"ace"

then at first it waits for:

"a"

After matching "a", it waits for:

"c"

After matching "c", it waits for:

"e"

We can group words by the character they are currently waiting for.

Then we scan s from left to right.

When we see character ch, we advance only the words currently waiting for ch.

Algorithm

Create 26 buckets, one for each lowercase letter.

Each bucket stores pairs:

(word_index, next_position)

where next_position is the index of the next character this word needs.

Initialization:

For each word words[i], put (i, 0) into the bucket for words[i][0].

Then scan s.

For each character ch in s:

Take all entries currently waiting for ch.
Clear that bucket.
For each (word_index, position):
1. Advance to position + 1.
2. If this equals the word length, the word is complete.
3. Otherwise, move it to the bucket for its next needed character.

Clearing the bucket before processing is important. It prevents a word from consuming the same character of s more than once.

Correctness

Each bucket represents words waiting for a specific next character.

At the start, every word is waiting for its first character, so the initialization is correct.

When the scan reaches a character ch in s, only words waiting for ch can make progress. The algorithm advances exactly those words by one character. Words waiting for other characters cannot match this position in s, so leaving them unchanged is correct.

If advancing a word reaches the end of that word, then every character of the word has been matched in order against positions already scanned in s. Therefore, the word is a subsequence, and the algorithm counts it.

If the word still has unmatched characters, the algorithm moves it to the bucket for its next required character. This preserves the invariant that every unfinished word is stored under the next character it needs.

Because the scan processes s from left to right and each word advances only when its next character appears, the algorithm counts exactly the words that are subsequences of s.

Complexity

Let:

L = sum(len(word) for word in words)

Metric	Value	Why
Time	`O(len(s) + L)`	Each character of `s` is scanned once, and each word position advances at most once
Space	`O(len(words))`	Buckets store one state per unfinished word

The words themselves are given as input and are not copied.

Implementation

from collections import deque

class Solution:
    def numMatchingSubseq(self, s: str, words: list[str]) -> int:
        buckets = [deque() for _ in range(26)]

        for index, word in enumerate(words):
            first = ord(word[0]) - ord("a")
            buckets[first].append((index, 0))

        answer = 0

        for ch in s:
            bucket_index = ord(ch) - ord("a")
            waiting = buckets[bucket_index]
            buckets[bucket_index] = deque()

            while waiting:
                word_index, position = waiting.popleft()
                next_position = position + 1

                if next_position == len(words[word_index]):
                    answer += 1
                else:
                    next_char = words[word_index][next_position]
                    next_bucket = ord(next_char) - ord("a")
                    buckets[next_bucket].append((word_index, next_position))

        return answer

Code Explanation

We create one queue for each lowercase letter:

buckets = [deque() for _ in range(26)]

Each word starts by waiting for its first character:

for index, word in enumerate(words):
    first = ord(word[0]) - ord("a")
    buckets[first].append((index, 0))

The pair (index, 0) means:

words[index] is waiting for words[index][0]

Then we scan s:

for ch in s:

We take the queue of words waiting for this character:

waiting = buckets[bucket_index]
buckets[bucket_index] = deque()

Replacing the bucket with an empty queue is necessary. A word moved back into the same bucket should wait for a future occurrence of the same character, not reuse the current one.

For each waiting word, we advance one position:

next_position = position + 1

If this reaches the word length, the word is complete:

if next_position == len(words[word_index]):
    answer += 1

Otherwise, move it to the bucket for its next needed character:

next_char = words[word_index][next_position]
next_bucket = ord(next_char) - ord("a")
buckets[next_bucket].append((word_index, next_position))

Alternative: Binary Search on Character Positions

Another valid method is to preprocess positions of each character in s.

Then for each word, use binary search to find each next character after the previous matched index.

This gives:

Metric	Value
Time	`O(len(s) + L log len(s))`
Space	`O(len(s))`

The waiting-queue solution is more direct and avoids binary search.

Testing

def run_tests():
    sol = Solution()

    assert sol.numMatchingSubseq(
        "abcde",
        ["a", "bb", "acd", "ace"],
    ) == 3

    assert sol.numMatchingSubseq(
        "dsahjpjauf",
        ["ahjpjau", "ja", "ahbwzgqnuk", "tnmlanowax"],
    ) == 2

    assert sol.numMatchingSubseq(
        "aaaaa",
        ["a", "aa", "aaa", "aaaa", "aaaaa", "aaaaaa"],
    ) == 5

    assert sol.numMatchingSubseq(
        "abc",
        ["abc", "ac", "bac", "cba"],
    ) == 2

    assert sol.numMatchingSubseq(
        "xyz",
        ["a", "bb", "ccc"],
    ) == 0

    print("all tests passed")

run_tests()

Test coverage:

Test	Why
Standard example	Checks basic subsequence matching
Longer example	Checks multiple non-trivial words
Repeated same character	Ensures one `s` character is not reused
Wrong order words	Confirms order matters
No matching letters	Checks zero result