# LeetCode 30: Substring with Concatenation of All Words

## Problem Restatement

We are given a string `s` and an array `words`.

Every word in `words` has the same length.

We need to find every starting index in `s` where a substring is made by concatenating all words in `words` exactly once, in any order, with no extra characters between them.

Return the starting indices in any order.

For example, if:

```python
words = ["ab", "cd", "ef"]
```

then valid concatenations include:

```python
"abcdef"
"abefcd"
"cdabef"
"cdefab"
"efabcd"
"efcdab"
```

Each word must be used exactly once. The order can change.

The constraints say `1 <= s.length <= 10^4`, `1 <= words.length <= 5000`, and `1 <= words[i].length <= 30`. All strings contain lowercase English letters. All words have the same length.

## Input and Output

| Item | Meaning |
|---|---|
| Input | A string `s` and a list of strings `words` |
| Output | A list of starting indices |
| Word length | Every word has the same length |
| Match rule | Every word must appear exactly once |
| Order | Any order is allowed |
| Gaps | No gaps between words |

Function shape:

```python
def findSubstring(s: str, words: list[str]) -> list[int]:
    ...
```

## Examples

Example 1:

```python
s = "barfoothefoobarman"
words = ["foo", "bar"]
```

Each word has length `3`.

The total substring length is:

```python
2 * 3 = 6
```

At index `0`, the substring is:

```python
"barfoo"
```

This uses `"bar"` and `"foo"` exactly once.

At index `9`, the substring is:

```python
"foobar"
```

This also uses both words exactly once.

Return:

```python
[0, 9]
```

Example 2:

```python
s = "wordgoodgoodgoodbestword"
words = ["word", "good", "best", "word"]
```

The word `"word"` must appear twice, because it appears twice in `words`.

No substring has exactly this multiset of words.

Return:

```python
[]
```

Example 3:

```python
s = "barfoofoobarthefoobarman"
words = ["bar", "foo", "the"]
```

Valid starting indices are:

```python
[6, 9, 12]
```

At index `6`:

```python
"foobarthe"
```

At index `9`:

```python
"barthefoo"
```

At index `12`:

```python
"thefoobar"
```

Each substring uses `"bar"`, `"foo"`, and `"the"` exactly once.

## First Thought: Check Every Start

The direct approach is to try every starting index.

For each start index, split the next `total_len` characters into word-sized chunks.

Then count those chunks and compare them with the required word counts.

```python
from collections import Counter

class Solution:
    def findSubstring(self, s: str, words: list[str]) -> list[int]:
        word_len = len(words[0])
        word_count = len(words)
        total_len = word_len * word_count
        need = Counter(words)

        ans = []

        for start in range(len(s) - total_len + 1):
            seen = Counter()
            ok = True

            for offset in range(0, total_len, word_len):
                part = s[start + offset:start + offset + word_len]

                if part not in need:
                    ok = False
                    break

                seen[part] += 1

                if seen[part] > need[part]:
                    ok = False
                    break

            if ok:
                ans.append(start)

        return ans
```

This is easy to understand.

For every possible start, we check whether the substring contains the exact multiset of words.

## Problem With Brute Force

The brute force method repeats too much work.

If `s` has length `n`, and there are `m` words, then each start may inspect `m` chunks.

So the time complexity is roughly:

```python
O(n * m * word_len)
```

The slicing cost adds another factor of `word_len`.

This can still pass in some languages with careful implementation, but the intended method uses a sliding window over word-sized chunks.

## Key Insight

All words have the same length.

So a valid substring can be viewed as a sequence of fixed-size blocks.

For example:

```python
s = "barfoofoobarthefoobarman"
words = ["bar", "foo", "the"]
word_len = 3
```

We should read `s` in chunks of length `3`:

```python
bar | foo | foo | bar | the | foo | bar | man
```

A valid window contains exactly `len(words)` chunks, and its chunk counts must match `Counter(words)`.

The tricky part is alignment.

A word can start at index `0`, `1`, or `2` when `word_len = 3`.

So we run the sliding window once for each offset:

```python
0, 1, ..., word_len - 1
```

Each run moves only by `word_len`.

## Algorithm

Let:

```python
word_len = len(words[0])
word_count = len(words)
total_len = word_len * word_count
need = Counter(words)
```

If `total_len > len(s)`, return an empty list.

Now process each alignment offset from `0` to `word_len - 1`.

For each offset:

1. Set `left = offset`.
2. Set `window = Counter()`.
3. Set `used = 0`, the number of word chunks currently in the window.
4. Move `right` by `word_len` each time.
5. Read `word = s[right:right + word_len]`.

There are three cases.

Case 1: `word` is not in `need`.

The current window can no longer be valid.

Clear the window and move `left` after this word.

```python
window.clear()
used = 0
left = right + word_len
```

Case 2: `word` is needed.

Add it to the window.

```python
window[word] += 1
used += 1
```

If this word appears too many times, shrink from the left until the count becomes valid again.

```python
while window[word] > need[word]:
    left_word = s[left:left + word_len]
    window[left_word] -= 1
    used -= 1
    left += word_len
```

Case 3: The window has exactly `word_count` chunks.

Then `left` is a valid starting index.

```python
if used == word_count:
    ans.append(left)
```

Then move the left side forward by one word so the window can search for the next answer.

## Correctness

The algorithm processes each alignment separately. This is enough because every valid substring starts at some index whose remainder modulo `word_len` is one of those offsets.

Within one alignment, every step reads exactly one word-sized chunk. The window always starts and ends on chunk boundaries, so every candidate window represents a sequence of complete word chunks.

If a chunk does not belong to `need`, then no valid substring crossing that chunk can exist. Clearing the window is correct because any window containing that chunk would contain a word outside `words`.

If a chunk belongs to `need`, the algorithm adds it to the current window. When its count becomes too large, the algorithm moves `left` forward until the count is valid again. This preserves the rule that no word appears more times than required.

Therefore, whenever `used == word_count`, the window contains exactly `word_count` chunks and no word count exceeds its required count. Since the total number of chunks equals the required number of words, all word counts must match exactly. So `left` is a valid answer.

Every valid answer is found because when the right side reaches the end of that valid block, all its chunks are needed and no count exceeds the required count. The window will have exactly `word_count` chunks, so the algorithm appends its start index.

## Complexity

| Metric | Value | Why |
|---|---|---|
| Time | `O(n * word_len)` | There are `word_len` offsets, and each chunk is added and removed at most once per offset |
| Space | `O(m)` | The counters store at most the distinct words |

More commonly, because `word_len <= 30`, this behaves like linear time in the length of `s`.

Here, `n = len(s)` and `m = len(words)`.

## Implementation

```python
from collections import Counter

class Solution:
    def findSubstring(self, s: str, words: list[str]) -> list[int]:
        word_len = len(words[0])
        word_count = len(words)
        total_len = word_len * word_count

        if total_len > len(s):
            return []

        need = Counter(words)
        ans = []

        for offset in range(word_len):
            left = offset
            window = Counter()
            used = 0

            for right in range(offset, len(s) - word_len + 1, word_len):
                word = s[right:right + word_len]

                if word not in need:
                    window.clear()
                    used = 0
                    left = right + word_len
                    continue

                window[word] += 1
                used += 1

                while window[word] > need[word]:
                    left_word = s[left:left + word_len]
                    window[left_word] -= 1
                    used -= 1
                    left += word_len

                if used == word_count:
                    ans.append(left)

                    left_word = s[left:left + word_len]
                    window[left_word] -= 1
                    used -= 1
                    left += word_len

        return ans
```

## Code Explanation

First we compute the sizes:

```python
word_len = len(words[0])
word_count = len(words)
total_len = word_len * word_count
```

A valid substring must have length `total_len`.

If this is longer than `s`, no answer exists.

```python
if total_len > len(s):
    return []
```

The counter `need` stores the required number of times each word must appear.

```python
need = Counter(words)
```

Then we process every alignment.

```python
for offset in range(word_len):
```

For each alignment, `left` is the start of the current window.

```python
left = offset
```

The `right` pointer moves by whole words.

```python
for right in range(offset, len(s) - word_len + 1, word_len):
```

We read one chunk:

```python
word = s[right:right + word_len]
```

If this chunk is not a required word, the current window is useless.

```python
if word not in need:
    window.clear()
    used = 0
    left = right + word_len
    continue
```

Otherwise, add it to the window.

```python
window[word] += 1
used += 1
```

If the window has too many copies of this word, shrink it.

```python
while window[word] > need[word]:
    left_word = s[left:left + word_len]
    window[left_word] -= 1
    used -= 1
    left += word_len
```

When the window has exactly the required number of chunks, record the start.

```python
if used == word_count:
    ans.append(left)
```

Then remove the leftmost word so the window can continue and find overlapping answers.

```python
left_word = s[left:left + word_len]
window[left_word] -= 1
used -= 1
left += word_len
```

## Testing

```python
def check(s: str, words: list[str], expected: list[int]) -> None:
    actual = Solution().findSubstring(s, words)
    assert sorted(actual) == sorted(expected), (s, words, actual, expected)

def run_tests():
    check("barfoothefoobarman", ["foo", "bar"], [0, 9])
    check("wordgoodgoodgoodbestword", ["word", "good", "best", "word"], [])
    check("barfoofoobarthefoobarman", ["bar", "foo", "the"], [6, 9, 12])

    check("wordgoodgoodgoodbestword", ["word", "good", "best", "good"], [8])
    check("aaaaaa", ["aa", "aa"], [0, 1, 2])
    check("abc", ["abcd"], [])
    check("abababab", ["ab", "ba"], [1, 3])

    print("all tests passed")

run_tests()
```

Test meaning:

| Test | Why |
|---|---|
| `"barfoothefoobarman"` | Basic two-word case |
| `"wordgoodgoodgoodbestword"` with duplicate requirement missing | No valid window |
| `"barfoofoobarthefoobarman"` | Multiple valid windows |
| Duplicate word required | Counts must match, not just membership |
| `"aaaaaa"` | Overlapping answers |
| Word longer than string | Immediate empty result |
| Offset alignment | Valid starts may occur away from index `0` |