LeetCode 30: Substring with Concatenation of All Words

Problem Restatement

We are given a string s and an array words.

Every word in words has the same length.

We need to find every starting index in s where a substring is made by concatenating all words in words exactly once, in any order, with no extra characters between them.

Return the starting indices in any order.

For example, if:

words = ["ab", "cd", "ef"]

then valid concatenations include:

"abcdef"
"abefcd"
"cdabef"
"cdefab"
"efabcd"
"efcdab"

Each word must be used exactly once. The order can change.

The constraints say 1 <= s.length <= 10^4, 1 <= words.length <= 5000, and 1 <= words[i].length <= 30. All strings contain lowercase English letters. All words have the same length.

Input and Output

Item	Meaning
Input	A string `s` and a list of strings `words`
Output	A list of starting indices
Word length	Every word has the same length
Match rule	Every word must appear exactly once
Order	Any order is allowed
Gaps	No gaps between words

Function shape:

def findSubstring(s: str, words: list[str]) -> list[int]:
    ...

Examples

Example 1:

s = "barfoothefoobarman"
words = ["foo", "bar"]

Each word has length 3.

The total substring length is:

2 * 3 = 6

At index 0, the substring is:

"barfoo"

This uses "bar" and "foo" exactly once.

At index 9, the substring is:

"foobar"

This also uses both words exactly once.

Return:

[0, 9]

Example 2:

s = "wordgoodgoodgoodbestword"
words = ["word", "good", "best", "word"]

The word "word" must appear twice, because it appears twice in words.

No substring has exactly this multiset of words.

Return:

[]

Example 3:

s = "barfoofoobarthefoobarman"
words = ["bar", "foo", "the"]

Valid starting indices are:

[6, 9, 12]

At index 6:

"foobarthe"

At index 9:

"barthefoo"

At index 12:

"thefoobar"

Each substring uses "bar", "foo", and "the" exactly once.

First Thought: Check Every Start

The direct approach is to try every starting index.

For each start index, split the next total_len characters into word-sized chunks.

Then count those chunks and compare them with the required word counts.

from collections import Counter

class Solution:
    def findSubstring(self, s: str, words: list[str]) -> list[int]:
        word_len = len(words[0])
        word_count = len(words)
        total_len = word_len * word_count
        need = Counter(words)

        ans = []

        for start in range(len(s) - total_len + 1):
            seen = Counter()
            ok = True

            for offset in range(0, total_len, word_len):
                part = s[start + offset:start + offset + word_len]

                if part not in need:
                    ok = False
                    break

                seen[part] += 1

                if seen[part] > need[part]:
                    ok = False
                    break

            if ok:
                ans.append(start)

        return ans

This is easy to understand.

For every possible start, we check whether the substring contains the exact multiset of words.

Problem With Brute Force

The brute force method repeats too much work.

If s has length n, and there are m words, then each start may inspect m chunks.

So the time complexity is roughly:

O(n * m * word_len)

The slicing cost adds another factor of word_len.

This can still pass in some languages with careful implementation, but the intended method uses a sliding window over word-sized chunks.

Key Insight

All words have the same length.

So a valid substring can be viewed as a sequence of fixed-size blocks.

For example:

s = "barfoofoobarthefoobarman"
words = ["bar", "foo", "the"]
word_len = 3

We should read s in chunks of length 3:

bar | foo | foo | bar | the | foo | bar | man

A valid window contains exactly len(words) chunks, and its chunk counts must match Counter(words).

The tricky part is alignment.

A word can start at index 0, 1, or 2 when word_len = 3.

So we run the sliding window once for each offset:

0, 1, ..., word_len - 1

Each run moves only by word_len.

Algorithm

Let:

word_len = len(words[0])
word_count = len(words)
total_len = word_len * word_count
need = Counter(words)

If total_len > len(s), return an empty list.

Now process each alignment offset from 0 to word_len - 1.

For each offset:

Set left = offset.
Set window = Counter().
Set used = 0, the number of word chunks currently in the window.
Move right by word_len each time.
Read word = s[right:right + word_len].

There are three cases.

Case 1: word is not in need.

The current window can no longer be valid.

Clear the window and move left after this word.

window.clear()
used = 0
left = right + word_len

Case 2: word is needed.

Add it to the window.

window[word] += 1
used += 1

If this word appears too many times, shrink from the left until the count becomes valid again.

while window[word] > need[word]:
    left_word = s[left:left + word_len]
    window[left_word] -= 1
    used -= 1
    left += word_len

Case 3: The window has exactly word_count chunks.

Then left is a valid starting index.

if used == word_count:
    ans.append(left)

Then move the left side forward by one word so the window can search for the next answer.

Correctness

The algorithm processes each alignment separately. This is enough because every valid substring starts at some index whose remainder modulo word_len is one of those offsets.

Within one alignment, every step reads exactly one word-sized chunk. The window always starts and ends on chunk boundaries, so every candidate window represents a sequence of complete word chunks.

If a chunk does not belong to need, then no valid substring crossing that chunk can exist. Clearing the window is correct because any window containing that chunk would contain a word outside words.

If a chunk belongs to need, the algorithm adds it to the current window. When its count becomes too large, the algorithm moves left forward until the count is valid again. This preserves the rule that no word appears more times than required.

Therefore, whenever used == word_count, the window contains exactly word_count chunks and no word count exceeds its required count. Since the total number of chunks equals the required number of words, all word counts must match exactly. So left is a valid answer.

Every valid answer is found because when the right side reaches the end of that valid block, all its chunks are needed and no count exceeds the required count. The window will have exactly word_count chunks, so the algorithm appends its start index.

Complexity

Metric	Value	Why
Time	`O(n * word_len)`	There are `word_len` offsets, and each chunk is added and removed at most once per offset
Space	`O(m)`	The counters store at most the distinct words

More commonly, because word_len <= 30, this behaves like linear time in the length of s.

Here, n = len(s) and m = len(words).

Implementation

from collections import Counter

class Solution:
    def findSubstring(self, s: str, words: list[str]) -> list[int]:
        word_len = len(words[0])
        word_count = len(words)
        total_len = word_len * word_count

        if total_len > len(s):
            return []

        need = Counter(words)
        ans = []

        for offset in range(word_len):
            left = offset
            window = Counter()
            used = 0

            for right in range(offset, len(s) - word_len + 1, word_len):
                word = s[right:right + word_len]

                if word not in need:
                    window.clear()
                    used = 0
                    left = right + word_len
                    continue

                window[word] += 1
                used += 1

                while window[word] > need[word]:
                    left_word = s[left:left + word_len]
                    window[left_word] -= 1
                    used -= 1
                    left += word_len

                if used == word_count:
                    ans.append(left)

                    left_word = s[left:left + word_len]
                    window[left_word] -= 1
                    used -= 1
                    left += word_len

        return ans

Code Explanation

First we compute the sizes:

word_len = len(words[0])
word_count = len(words)
total_len = word_len * word_count

A valid substring must have length total_len.

If this is longer than s, no answer exists.

if total_len > len(s):
    return []

The counter need stores the required number of times each word must appear.

need = Counter(words)

Then we process every alignment.

for offset in range(word_len):

For each alignment, left is the start of the current window.

left = offset

The right pointer moves by whole words.

for right in range(offset, len(s) - word_len + 1, word_len):

We read one chunk:

word = s[right:right + word_len]

If this chunk is not a required word, the current window is useless.

if word not in need:
    window.clear()
    used = 0
    left = right + word_len
    continue

Otherwise, add it to the window.

window[word] += 1
used += 1

If the window has too many copies of this word, shrink it.

while window[word] > need[word]:
    left_word = s[left:left + word_len]
    window[left_word] -= 1
    used -= 1
    left += word_len

When the window has exactly the required number of chunks, record the start.

if used == word_count:
    ans.append(left)

Then remove the leftmost word so the window can continue and find overlapping answers.

left_word = s[left:left + word_len]
window[left_word] -= 1
used -= 1
left += word_len

Testing

def check(s: str, words: list[str], expected: list[int]) -> None:
    actual = Solution().findSubstring(s, words)
    assert sorted(actual) == sorted(expected), (s, words, actual, expected)

def run_tests():
    check("barfoothefoobarman", ["foo", "bar"], [0, 9])
    check("wordgoodgoodgoodbestword", ["word", "good", "best", "word"], [])
    check("barfoofoobarthefoobarman", ["bar", "foo", "the"], [6, 9, 12])

    check("wordgoodgoodgoodbestword", ["word", "good", "best", "good"], [8])
    check("aaaaaa", ["aa", "aa"], [0, 1, 2])
    check("abc", ["abcd"], [])
    check("abababab", ["ab", "ba"], [1, 3])

    print("all tests passed")

run_tests()

Test meaning:

Test	Why
`"barfoothefoobarman"`	Basic two-word case
`"wordgoodgoodgoodbestword"` with duplicate requirement missing	No valid window
`"barfoofoobarthefoobarman"`	Multiple valid windows
Duplicate word required	Counts must match, not just membership
`"aaaaaa"`	Overlapping answers
Word longer than string	Immediate empty result
Offset alignment	Valid starts may occur away from index `0`