LeetCode 140: Word Break II

Problem Restatement

We are given:

Name	Meaning
`s`	A string without spaces
`wordDict`	A list of valid dictionary words

We need to insert spaces into s so that every piece is a valid dictionary word.

Return all possible valid sentences.

The same dictionary word may be reused multiple times. The answer may be returned in any order. LeetCode gives constraints including 1 <= s.length <= 20, 1 <= wordDict.length <= 1000, unique dictionary words, and generated inputs where the answer length does not exceed 10^5.

Examples

Example 1:

s = "catsanddog"
wordDict = ["cat", "cats", "and", "sand", "dog"]

Valid sentences:

[
    "cats and dog",
    "cat sand dog",
]

Both use only dictionary words.

Example 2:

s = "pineapplepenapple"
wordDict = ["apple", "pen", "applepen", "pine", "pineapple"]

Valid sentences:

[
    "pine apple pen apple",
    "pineapple pen apple",
    "pine applepen apple",
]

The word "apple" is reused.

Example 3:

s = "catsandog"
wordDict = ["cats", "dog", "sand", "and", "cat"]

There is no valid way to segment the whole string.

Output:

[]

Input and Output

Item	Meaning
Input	`s: str`, `wordDict: list[str]`
Output	`list[str]`
Rule	Every word in each sentence must be in `wordDict`
Reuse	Dictionary words may be reused
Order	Any order is accepted

Function shape:

class Solution:
    def wordBreak(self, s: str, wordDict: list[str]) -> list[str]:
        ...

Relation to LeetCode 139

LeetCode 139 asks whether at least one segmentation exists.

LeetCode 140 asks for all valid segmentations.

So a boolean DP alone is not enough. We need to construct actual sentences.

Still, the core idea is similar:

At each index, choose a dictionary word that matches the current suffix, then solve the remaining suffix.

First Thought: Plain Backtracking

For:

s = "catsanddog"

At index 0, possible starting words are:

"cat"
"cats"

If we choose "cat", the remaining suffix is:

"sanddog"

If we choose "cats", the remaining suffix is:

"anddog"

So the search tree branches.

Plain backtracking works, but it recomputes the same suffix many times.

For example, different earlier choices may lead to the same remaining substring. We should cache the result for each start index.

Key Insight

Use DFS with memoization.

Let:

dfs(start)

return all valid sentences that can be formed from:

s[start:]

For example:

dfs(7)

might return:

["dog"]

Then if the current word is "sand", we combine:

"sand" + " " + "dog"

to produce:

"sand dog"

Memoization stores:

memo[start] = list of sentences from s[start:]

So each suffix is solved once.

Base Case

When start == len(s), we have consumed the entire string.

There is one valid completion: the empty sentence.

Return:

[""]

This may look strange, but it makes combination simple.

If the current word is "dog" and the suffix after it is empty, we combine:

word = "dog"
tail = ""

The result should be:

"dog"

not:

"dog "

So the combination rule is:

sentence = word if tail == "" else word + " " + tail

Algorithm

Convert wordDict into a set for fast lookup.
Compute the maximum word length to avoid checking impossible long substrings.
Define dfs(start).
If start == len(s), return [""].
If start is already memoized, return memo[start].
Try every end from start + 1 to min(n, start + max_len).
If s[start:end] is a dictionary word:
- recursively get all sentences from dfs(end)
- prepend the current word to each tail
Store and return the result.

Correctness

For any index start, dfs(start) tries every possible first word s[start:end] that starts at start and belongs to the dictionary.

If a valid sentence exists from s[start:], it has some first word. The algorithm tries exactly that first word, then recursively constructs every valid sentence from the remaining suffix s[end:].

If the recursive suffix returns a valid tail sentence, prepending the current dictionary word creates a valid sentence for s[start:].

The base case returns [""] only when the whole string has been consumed, which represents one complete segmentation.

Thus, dfs(start) returns exactly all valid sentences for s[start:].

Therefore, dfs(0) returns exactly all valid sentences for the full string s.

Complexity

Let:

Symbol	Meaning
`n`	Length of `s`
`L`	Maximum word length in `wordDict`
`A`	Total size of the output strings

The number of valid sentences can be exponential in n. The output itself can be large, so any correct algorithm must spend time proportional to the output size.

Metric	Value	Why
Time	`O(n * L^2 + A)`	Each start checks up to `L` substrings, slicing costs up to `L`, then output construction dominates
Space	`O(n + A)`	Memoization, recursion stack, and returned sentences

A simpler upper bound often used is exponential, because the number of possible segmentations can be exponential.

Implementation

class Solution:
    def wordBreak(self, s: str, wordDict: list[str]) -> list[str]:
        word_set = set(wordDict)
        max_len = max(len(word) for word in wordDict)
        n = len(s)
        memo = {}

        def dfs(start: int) -> list[str]:
            if start == n:
                return [""]

            if start in memo:
                return memo[start]

            sentences = []
            limit = min(n, start + max_len)

            for end in range(start + 1, limit + 1):
                word = s[start:end]

                if word not in word_set:
                    continue

                for tail in dfs(end):
                    if tail == "":
                        sentences.append(word)
                    else:
                        sentences.append(word + " " + tail)

            memo[start] = sentences
            return sentences

        return dfs(0)

Code Explanation

We convert the dictionary to a set:

word_set = set(wordDict)

This makes word lookup fast.

We also compute:

max_len = max(len(word) for word in wordDict)

No valid word can be longer than that, so each DFS call only checks a bounded range of endings.

The memo table stores already computed suffix answers:

memo = {}

The helper returns all sentences from s[start:]:

def dfs(start: int) -> list[str]:

If the whole string has been consumed:

if start == n:
    return [""]

This means the current path formed a complete sentence.

If this suffix was already solved:

if start in memo:
    return memo[start]

we reuse the result.

For each possible ending:

for end in range(start + 1, limit + 1):

we take the current candidate word:

word = s[start:end]

If it is not in the dictionary, skip it:

if word not in word_set:
    continue

Otherwise, solve the remaining suffix:

for tail in dfs(end):

Then combine:

if tail == "":
    sentences.append(word)
else:
    sentences.append(word + " " + tail)

Finally, cache and return:

memo[start] = sentences
return sentences

Optional Pruning with Word Break I DP

Memoized DFS is enough for the constraints, but we can add a boolean DP to avoid exploring suffixes that cannot be segmented.

Let:

can_break[i]

mean:

s[i:] can be segmented into dictionary words

Then DFS only enters a suffix if can_break[end] is true.

This can reduce wasted work on impossible branches.

class Solution:
    def wordBreak(self, s: str, wordDict: list[str]) -> list[str]:
        word_set = set(wordDict)
        max_len = max(len(word) for word in wordDict)
        n = len(s)

        can_break = [False] * (n + 1)
        can_break[n] = True

        for start in range(n - 1, -1, -1):
            limit = min(n, start + max_len)

            for end in range(start + 1, limit + 1):
                if s[start:end] in word_set and can_break[end]:
                    can_break[start] = True
                    break

        memo = {}

        def dfs(start: int) -> list[str]:
            if start == n:
                return [""]

            if start in memo:
                return memo[start]

            sentences = []
            limit = min(n, start + max_len)

            for end in range(start + 1, limit + 1):
                word = s[start:end]

                if word not in word_set:
                    continue

                if not can_break[end]:
                    continue

                for tail in dfs(end):
                    if tail == "":
                        sentences.append(word)
                    else:
                        sentences.append(word + " " + tail)

            memo[start] = sentences
            return sentences

        return dfs(0)

This version keeps the same output but avoids searching suffixes known to be impossible.

Testing

def normalize(ans: list[str]) -> list[str]:
    return sorted(ans)

def run_tests():
    s = Solution()

    assert normalize(
        s.wordBreak(
            "catsanddog",
            ["cat", "cats", "and", "sand", "dog"],
        )
    ) == normalize([
        "cats and dog",
        "cat sand dog",
    ])

    assert normalize(
        s.wordBreak(
            "pineapplepenapple",
            ["apple", "pen", "applepen", "pine", "pineapple"],
        )
    ) == normalize([
        "pine apple pen apple",
        "pineapple pen apple",
        "pine applepen apple",
    ])

    assert s.wordBreak(
        "catsandog",
        ["cats", "dog", "sand", "and", "cat"],
    ) == []

    assert normalize(
        s.wordBreak(
            "aaaa",
            ["a", "aa", "aaa"],
        )
    ) == normalize([
        "a a a a",
        "a a aa",
        "a aa a",
        "a aaa",
        "aa a a",
        "aa aa",
        "aaa a",
    ])

    assert s.wordBreak(
        "leetcode",
        ["leet", "code"],
    ) == ["leet code"]

    print("all tests passed")

run_tests()

Test meaning:

Test	Why
`"catsanddog"`	Standard multiple-answer case
`"pineapplepenapple"`	Reuses words and has overlapping choices
`"catsandog"`	No valid full segmentation
`"aaaa"`	Many valid cuts
`"leetcode"`	Simple one-answer case