A clear explanation of counting how many words are subsequences of a string using waiting queues.
Problem Restatement
We are given a string s and an array of strings words.
We need to return how many strings in words are subsequences of s.
A subsequence is formed by deleting zero or more characters from a string without changing the order of the remaining characters. For example, "ace" is a subsequence of "abcde".
Input and Output
| Item | Meaning |
|---|---|
| Input | A string s and a list of strings words |
| Output | Number of words that are subsequences of s |
| Subsequence rule | Characters must appear in the same relative order |
| Important detail | Each word is checked independently |
Function shape:
class Solution:
def numMatchingSubseq(self, s: str, words: list[str]) -> int:
...Examples
Example 1:
s = "abcde"
words = ["a", "bb", "acd", "ace"]The matching subsequences are:
"a"
"acd"
"ace"The word "bb" is not a subsequence because s has only one b.
So the answer is:
3Example 2:
s = "dsahjpjauf"
words = ["ahjpjau", "ja", "ahbwzgqnuk", "tnmlanowax"]The matching subsequences are:
"ahjpjau"
"ja"So the answer is:
2First Thought: Check Each Word Separately
A simple approach is to test every word against s.
For each word, use two pointers:
i = 0 # pointer in s
j = 0 # pointer in wordMove through s, and whenever s[i] == word[j], advance j.
If j reaches the end of the word, then the word is a subsequence.
This works, but it scans s once for every word.
Problem With Repeated Scanning
If s is long and there are many words, repeated scanning wastes work.
For example:
len(s) = 50000
len(words) = 5000Scanning all of s for every word may do:
50000 * 5000character checks.
We need to process s once and advance all waiting words together.
Key Insight
Each word is always waiting for its next required character.
For example, if the word is:
"ace"then at first it waits for:
"a"After matching "a", it waits for:
"c"After matching "c", it waits for:
"e"We can group words by the character they are currently waiting for.
Then we scan s from left to right.
When we see character ch, we advance only the words currently waiting for ch.
Algorithm
Create 26 buckets, one for each lowercase letter.
Each bucket stores pairs:
(word_index, next_position)where next_position is the index of the next character this word needs.
Initialization:
- For each word
words[i], put(i, 0)into the bucket forwords[i][0].
Then scan s.
For each character ch in s:
- Take all entries currently waiting for
ch. - Clear that bucket.
- For each
(word_index, position):- Advance to
position + 1. - If this equals the word length, the word is complete.
- Otherwise, move it to the bucket for its next needed character.
- Advance to
Clearing the bucket before processing is important. It prevents a word from consuming the same character of s more than once.
Correctness
Each bucket represents words waiting for a specific next character.
At the start, every word is waiting for its first character, so the initialization is correct.
When the scan reaches a character ch in s, only words waiting for ch can make progress. The algorithm advances exactly those words by one character. Words waiting for other characters cannot match this position in s, so leaving them unchanged is correct.
If advancing a word reaches the end of that word, then every character of the word has been matched in order against positions already scanned in s. Therefore, the word is a subsequence, and the algorithm counts it.
If the word still has unmatched characters, the algorithm moves it to the bucket for its next required character. This preserves the invariant that every unfinished word is stored under the next character it needs.
Because the scan processes s from left to right and each word advances only when its next character appears, the algorithm counts exactly the words that are subsequences of s.
Complexity
Let:
L = sum(len(word) for word in words)| Metric | Value | Why |
|---|---|---|
| Time | O(len(s) + L) | Each character of s is scanned once, and each word position advances at most once |
| Space | O(len(words)) | Buckets store one state per unfinished word |
The words themselves are given as input and are not copied.
Implementation
from collections import deque
class Solution:
def numMatchingSubseq(self, s: str, words: list[str]) -> int:
buckets = [deque() for _ in range(26)]
for index, word in enumerate(words):
first = ord(word[0]) - ord("a")
buckets[first].append((index, 0))
answer = 0
for ch in s:
bucket_index = ord(ch) - ord("a")
waiting = buckets[bucket_index]
buckets[bucket_index] = deque()
while waiting:
word_index, position = waiting.popleft()
next_position = position + 1
if next_position == len(words[word_index]):
answer += 1
else:
next_char = words[word_index][next_position]
next_bucket = ord(next_char) - ord("a")
buckets[next_bucket].append((word_index, next_position))
return answerCode Explanation
We create one queue for each lowercase letter:
buckets = [deque() for _ in range(26)]Each word starts by waiting for its first character:
for index, word in enumerate(words):
first = ord(word[0]) - ord("a")
buckets[first].append((index, 0))The pair (index, 0) means:
words[index] is waiting for words[index][0]Then we scan s:
for ch in s:We take the queue of words waiting for this character:
waiting = buckets[bucket_index]
buckets[bucket_index] = deque()Replacing the bucket with an empty queue is necessary. A word moved back into the same bucket should wait for a future occurrence of the same character, not reuse the current one.
For each waiting word, we advance one position:
next_position = position + 1If this reaches the word length, the word is complete:
if next_position == len(words[word_index]):
answer += 1Otherwise, move it to the bucket for its next needed character:
next_char = words[word_index][next_position]
next_bucket = ord(next_char) - ord("a")
buckets[next_bucket].append((word_index, next_position))Alternative: Binary Search on Character Positions
Another valid method is to preprocess positions of each character in s.
Then for each word, use binary search to find each next character after the previous matched index.
This gives:
| Metric | Value |
|---|---|
| Time | O(len(s) + L log len(s)) |
| Space | O(len(s)) |
The waiting-queue solution is more direct and avoids binary search.
Testing
def run_tests():
sol = Solution()
assert sol.numMatchingSubseq(
"abcde",
["a", "bb", "acd", "ace"],
) == 3
assert sol.numMatchingSubseq(
"dsahjpjauf",
["ahjpjau", "ja", "ahbwzgqnuk", "tnmlanowax"],
) == 2
assert sol.numMatchingSubseq(
"aaaaa",
["a", "aa", "aaa", "aaaa", "aaaaa", "aaaaaa"],
) == 5
assert sol.numMatchingSubseq(
"abc",
["abc", "ac", "bac", "cba"],
) == 2
assert sol.numMatchingSubseq(
"xyz",
["a", "bb", "ccc"],
) == 0
print("all tests passed")
run_tests()Test coverage:
| Test | Why |
|---|---|
| Standard example | Checks basic subsequence matching |
| Longer example | Checks multiple non-trivial words |
| Repeated same character | Ensures one s character is not reused |
| Wrong order words | Confirms order matters |
| No matching letters | Checks zero result |