A clear explanation of finding all starting indices where a substring is formed by concatenating every word exactly once.
Problem Restatement
We are given a string s and an array words.
Every word in words has the same length.
We need to find every starting index in s where a substring is made by concatenating all words in words exactly once, in any order, with no extra characters between them.
Return the starting indices in any order.
For example, if:
words = ["ab", "cd", "ef"]then valid concatenations include:
"abcdef"
"abefcd"
"cdabef"
"cdefab"
"efabcd"
"efcdab"Each word must be used exactly once. The order can change.
The constraints say 1 <= s.length <= 10^4, 1 <= words.length <= 5000, and 1 <= words[i].length <= 30. All strings contain lowercase English letters. All words have the same length.
Input and Output
| Item | Meaning |
|---|---|
| Input | A string s and a list of strings words |
| Output | A list of starting indices |
| Word length | Every word has the same length |
| Match rule | Every word must appear exactly once |
| Order | Any order is allowed |
| Gaps | No gaps between words |
Function shape:
def findSubstring(s: str, words: list[str]) -> list[int]:
...Examples
Example 1:
s = "barfoothefoobarman"
words = ["foo", "bar"]Each word has length 3.
The total substring length is:
2 * 3 = 6At index 0, the substring is:
"barfoo"This uses "bar" and "foo" exactly once.
At index 9, the substring is:
"foobar"This also uses both words exactly once.
Return:
[0, 9]Example 2:
s = "wordgoodgoodgoodbestword"
words = ["word", "good", "best", "word"]The word "word" must appear twice, because it appears twice in words.
No substring has exactly this multiset of words.
Return:
[]Example 3:
s = "barfoofoobarthefoobarman"
words = ["bar", "foo", "the"]Valid starting indices are:
[6, 9, 12]At index 6:
"foobarthe"At index 9:
"barthefoo"At index 12:
"thefoobar"Each substring uses "bar", "foo", and "the" exactly once.
First Thought: Check Every Start
The direct approach is to try every starting index.
For each start index, split the next total_len characters into word-sized chunks.
Then count those chunks and compare them with the required word counts.
from collections import Counter
class Solution:
def findSubstring(self, s: str, words: list[str]) -> list[int]:
word_len = len(words[0])
word_count = len(words)
total_len = word_len * word_count
need = Counter(words)
ans = []
for start in range(len(s) - total_len + 1):
seen = Counter()
ok = True
for offset in range(0, total_len, word_len):
part = s[start + offset:start + offset + word_len]
if part not in need:
ok = False
break
seen[part] += 1
if seen[part] > need[part]:
ok = False
break
if ok:
ans.append(start)
return ansThis is easy to understand.
For every possible start, we check whether the substring contains the exact multiset of words.
Problem With Brute Force
The brute force method repeats too much work.
If s has length n, and there are m words, then each start may inspect m chunks.
So the time complexity is roughly:
O(n * m * word_len)The slicing cost adds another factor of word_len.
This can still pass in some languages with careful implementation, but the intended method uses a sliding window over word-sized chunks.
Key Insight
All words have the same length.
So a valid substring can be viewed as a sequence of fixed-size blocks.
For example:
s = "barfoofoobarthefoobarman"
words = ["bar", "foo", "the"]
word_len = 3We should read s in chunks of length 3:
bar | foo | foo | bar | the | foo | bar | manA valid window contains exactly len(words) chunks, and its chunk counts must match Counter(words).
The tricky part is alignment.
A word can start at index 0, 1, or 2 when word_len = 3.
So we run the sliding window once for each offset:
0, 1, ..., word_len - 1Each run moves only by word_len.
Algorithm
Let:
word_len = len(words[0])
word_count = len(words)
total_len = word_len * word_count
need = Counter(words)If total_len > len(s), return an empty list.
Now process each alignment offset from 0 to word_len - 1.
For each offset:
- Set
left = offset. - Set
window = Counter(). - Set
used = 0, the number of word chunks currently in the window. - Move
rightbyword_leneach time. - Read
word = s[right:right + word_len].
There are three cases.
Case 1: word is not in need.
The current window can no longer be valid.
Clear the window and move left after this word.
window.clear()
used = 0
left = right + word_lenCase 2: word is needed.
Add it to the window.
window[word] += 1
used += 1If this word appears too many times, shrink from the left until the count becomes valid again.
while window[word] > need[word]:
left_word = s[left:left + word_len]
window[left_word] -= 1
used -= 1
left += word_lenCase 3: The window has exactly word_count chunks.
Then left is a valid starting index.
if used == word_count:
ans.append(left)Then move the left side forward by one word so the window can search for the next answer.
Correctness
The algorithm processes each alignment separately. This is enough because every valid substring starts at some index whose remainder modulo word_len is one of those offsets.
Within one alignment, every step reads exactly one word-sized chunk. The window always starts and ends on chunk boundaries, so every candidate window represents a sequence of complete word chunks.
If a chunk does not belong to need, then no valid substring crossing that chunk can exist. Clearing the window is correct because any window containing that chunk would contain a word outside words.
If a chunk belongs to need, the algorithm adds it to the current window. When its count becomes too large, the algorithm moves left forward until the count is valid again. This preserves the rule that no word appears more times than required.
Therefore, whenever used == word_count, the window contains exactly word_count chunks and no word count exceeds its required count. Since the total number of chunks equals the required number of words, all word counts must match exactly. So left is a valid answer.
Every valid answer is found because when the right side reaches the end of that valid block, all its chunks are needed and no count exceeds the required count. The window will have exactly word_count chunks, so the algorithm appends its start index.
Complexity
| Metric | Value | Why |
|---|---|---|
| Time | O(n * word_len) | There are word_len offsets, and each chunk is added and removed at most once per offset |
| Space | O(m) | The counters store at most the distinct words |
More commonly, because word_len <= 30, this behaves like linear time in the length of s.
Here, n = len(s) and m = len(words).
Implementation
from collections import Counter
class Solution:
def findSubstring(self, s: str, words: list[str]) -> list[int]:
word_len = len(words[0])
word_count = len(words)
total_len = word_len * word_count
if total_len > len(s):
return []
need = Counter(words)
ans = []
for offset in range(word_len):
left = offset
window = Counter()
used = 0
for right in range(offset, len(s) - word_len + 1, word_len):
word = s[right:right + word_len]
if word not in need:
window.clear()
used = 0
left = right + word_len
continue
window[word] += 1
used += 1
while window[word] > need[word]:
left_word = s[left:left + word_len]
window[left_word] -= 1
used -= 1
left += word_len
if used == word_count:
ans.append(left)
left_word = s[left:left + word_len]
window[left_word] -= 1
used -= 1
left += word_len
return ansCode Explanation
First we compute the sizes:
word_len = len(words[0])
word_count = len(words)
total_len = word_len * word_countA valid substring must have length total_len.
If this is longer than s, no answer exists.
if total_len > len(s):
return []The counter need stores the required number of times each word must appear.
need = Counter(words)Then we process every alignment.
for offset in range(word_len):For each alignment, left is the start of the current window.
left = offsetThe right pointer moves by whole words.
for right in range(offset, len(s) - word_len + 1, word_len):We read one chunk:
word = s[right:right + word_len]If this chunk is not a required word, the current window is useless.
if word not in need:
window.clear()
used = 0
left = right + word_len
continueOtherwise, add it to the window.
window[word] += 1
used += 1If the window has too many copies of this word, shrink it.
while window[word] > need[word]:
left_word = s[left:left + word_len]
window[left_word] -= 1
used -= 1
left += word_lenWhen the window has exactly the required number of chunks, record the start.
if used == word_count:
ans.append(left)Then remove the leftmost word so the window can continue and find overlapping answers.
left_word = s[left:left + word_len]
window[left_word] -= 1
used -= 1
left += word_lenTesting
def check(s: str, words: list[str], expected: list[int]) -> None:
actual = Solution().findSubstring(s, words)
assert sorted(actual) == sorted(expected), (s, words, actual, expected)
def run_tests():
check("barfoothefoobarman", ["foo", "bar"], [0, 9])
check("wordgoodgoodgoodbestword", ["word", "good", "best", "word"], [])
check("barfoofoobarthefoobarman", ["bar", "foo", "the"], [6, 9, 12])
check("wordgoodgoodgoodbestword", ["word", "good", "best", "good"], [8])
check("aaaaaa", ["aa", "aa"], [0, 1, 2])
check("abc", ["abcd"], [])
check("abababab", ["ab", "ba"], [1, 3])
print("all tests passed")
run_tests()Test meaning:
| Test | Why |
|---|---|
"barfoothefoobarman" | Basic two-word case |
"wordgoodgoodgoodbestword" with duplicate requirement missing | No valid window |
"barfoofoobarthefoobarman" | Multiple valid windows |
| Duplicate word required | Counts must match, not just membership |
"aaaaaa" | Overlapping answers |
| Word longer than string | Immediate empty result |
| Offset alignment | Valid starts may occur away from index 0 |