# LeetCode 792: Number of Matching Subsequences

## Problem Restatement

We are given a string `s` and an array of strings `words`.

We need to return how many strings in `words` are subsequences of `s`.

A subsequence is formed by deleting zero or more characters from a string without changing the order of the remaining characters. For example, `"ace"` is a subsequence of `"abcde"`.

## Input and Output

| Item | Meaning |
|---|---|
| Input | A string `s` and a list of strings `words` |
| Output | Number of words that are subsequences of `s` |
| Subsequence rule | Characters must appear in the same relative order |
| Important detail | Each word is checked independently |

Function shape:

```python
class Solution:
    def numMatchingSubseq(self, s: str, words: list[str]) -> int:
        ...
```

## Examples

Example 1:

```python
s = "abcde"
words = ["a", "bb", "acd", "ace"]
```

The matching subsequences are:

```text
"a"
"acd"
"ace"
```

The word `"bb"` is not a subsequence because `s` has only one `b`.

So the answer is:

```python
3
```

Example 2:

```python
s = "dsahjpjauf"
words = ["ahjpjau", "ja", "ahbwzgqnuk", "tnmlanowax"]
```

The matching subsequences are:

```text
"ahjpjau"
"ja"
```

So the answer is:

```python
2
```

## First Thought: Check Each Word Separately

A simple approach is to test every word against `s`.

For each word, use two pointers:

```python
i = 0  # pointer in s
j = 0  # pointer in word
```

Move through `s`, and whenever `s[i] == word[j]`, advance `j`.

If `j` reaches the end of the word, then the word is a subsequence.

This works, but it scans `s` once for every word.

## Problem With Repeated Scanning

If `s` is long and there are many words, repeated scanning wastes work.

For example:

```python
len(s) = 50000
len(words) = 5000
```

Scanning all of `s` for every word may do:

```text
50000 * 5000
```

character checks.

We need to process `s` once and advance all waiting words together.

## Key Insight

Each word is always waiting for its next required character.

For example, if the word is:

```python
"ace"
```

then at first it waits for:

```python
"a"
```

After matching `"a"`, it waits for:

```python
"c"
```

After matching `"c"`, it waits for:

```python
"e"
```

We can group words by the character they are currently waiting for.

Then we scan `s` from left to right.

When we see character `ch`, we advance only the words currently waiting for `ch`.

## Algorithm

Create 26 buckets, one for each lowercase letter.

Each bucket stores pairs:

```python
(word_index, next_position)
```

where `next_position` is the index of the next character this word needs.

Initialization:

1. For each word `words[i]`, put `(i, 0)` into the bucket for `words[i][0]`.

Then scan `s`.

For each character `ch` in `s`:

1. Take all entries currently waiting for `ch`.
2. Clear that bucket.
3. For each `(word_index, position)`:
   1. Advance to `position + 1`.
   2. If this equals the word length, the word is complete.
   3. Otherwise, move it to the bucket for its next needed character.

Clearing the bucket before processing is important. It prevents a word from consuming the same character of `s` more than once.

## Correctness

Each bucket represents words waiting for a specific next character.

At the start, every word is waiting for its first character, so the initialization is correct.

When the scan reaches a character `ch` in `s`, only words waiting for `ch` can make progress. The algorithm advances exactly those words by one character. Words waiting for other characters cannot match this position in `s`, so leaving them unchanged is correct.

If advancing a word reaches the end of that word, then every character of the word has been matched in order against positions already scanned in `s`. Therefore, the word is a subsequence, and the algorithm counts it.

If the word still has unmatched characters, the algorithm moves it to the bucket for its next required character. This preserves the invariant that every unfinished word is stored under the next character it needs.

Because the scan processes `s` from left to right and each word advances only when its next character appears, the algorithm counts exactly the words that are subsequences of `s`.

## Complexity

Let:

```python
L = sum(len(word) for word in words)
```

| Metric | Value | Why |
|---|---|---|
| Time | `O(len(s) + L)` | Each character of `s` is scanned once, and each word position advances at most once |
| Space | `O(len(words))` | Buckets store one state per unfinished word |

The words themselves are given as input and are not copied.

## Implementation

```python
from collections import deque

class Solution:
    def numMatchingSubseq(self, s: str, words: list[str]) -> int:
        buckets = [deque() for _ in range(26)]

        for index, word in enumerate(words):
            first = ord(word[0]) - ord("a")
            buckets[first].append((index, 0))

        answer = 0

        for ch in s:
            bucket_index = ord(ch) - ord("a")
            waiting = buckets[bucket_index]
            buckets[bucket_index] = deque()

            while waiting:
                word_index, position = waiting.popleft()
                next_position = position + 1

                if next_position == len(words[word_index]):
                    answer += 1
                else:
                    next_char = words[word_index][next_position]
                    next_bucket = ord(next_char) - ord("a")
                    buckets[next_bucket].append((word_index, next_position))

        return answer
```

## Code Explanation

We create one queue for each lowercase letter:

```python
buckets = [deque() for _ in range(26)]
```

Each word starts by waiting for its first character:

```python
for index, word in enumerate(words):
    first = ord(word[0]) - ord("a")
    buckets[first].append((index, 0))
```

The pair `(index, 0)` means:

```text
words[index] is waiting for words[index][0]
```

Then we scan `s`:

```python
for ch in s:
```

We take the queue of words waiting for this character:

```python
waiting = buckets[bucket_index]
buckets[bucket_index] = deque()
```

Replacing the bucket with an empty queue is necessary. A word moved back into the same bucket should wait for a future occurrence of the same character, not reuse the current one.

For each waiting word, we advance one position:

```python
next_position = position + 1
```

If this reaches the word length, the word is complete:

```python
if next_position == len(words[word_index]):
    answer += 1
```

Otherwise, move it to the bucket for its next needed character:

```python
next_char = words[word_index][next_position]
next_bucket = ord(next_char) - ord("a")
buckets[next_bucket].append((word_index, next_position))
```

## Alternative: Binary Search on Character Positions

Another valid method is to preprocess positions of each character in `s`.

Then for each word, use binary search to find each next character after the previous matched index.

This gives:

| Metric | Value |
|---|---|
| Time | `O(len(s) + L log len(s))` |
| Space | `O(len(s))` |

The waiting-queue solution is more direct and avoids binary search.

## Testing

```python
def run_tests():
    sol = Solution()

    assert sol.numMatchingSubseq(
        "abcde",
        ["a", "bb", "acd", "ace"],
    ) == 3

    assert sol.numMatchingSubseq(
        "dsahjpjauf",
        ["ahjpjau", "ja", "ahbwzgqnuk", "tnmlanowax"],
    ) == 2

    assert sol.numMatchingSubseq(
        "aaaaa",
        ["a", "aa", "aaa", "aaaa", "aaaaa", "aaaaaa"],
    ) == 5

    assert sol.numMatchingSubseq(
        "abc",
        ["abc", "ac", "bac", "cba"],
    ) == 2

    assert sol.numMatchingSubseq(
        "xyz",
        ["a", "bb", "ccc"],
    ) == 0

    print("all tests passed")

run_tests()
```

Test coverage:

| Test | Why |
|---|---|
| Standard example | Checks basic subsequence matching |
| Longer example | Checks multiple non-trivial words |
| Repeated same character | Ensures one `s` character is not reused |
| Wrong order words | Confirms order matters |
| No matching letters | Checks zero result |