LeetCode 472: Concatenated Words

Problem Restatement

We are given an array of unique strings:

words

We need to return all words in the array that can be formed by concatenating at least two shorter words from the same array.

The shorter words do not need to be distinct. The same word may be used more than once. The official statement defines a concatenated word as a word comprised entirely of at least two shorter words from the given array.

For example:

words = ["cat", "cats", "dog", "catsdog"]

The word:

"catsdog"

can be formed as:

"cats" + "dog"

So it is a concatenated word.

Input and Output

Item	Meaning
Input	`words`, a list of unique strings
Output	All concatenated words in any order
Concatenated word	A word made from at least two shorter words
Word reuse	Allowed
Important rule	A word cannot count as concatenated by using only itself

Function shape:

def findAllConcatenatedWordsInADict(words: list[str]) -> list[str]:
    ...

Examples

Example 1:

words = [
    "cat",
    "cats",
    "catsdogcats",
    "dog",
    "dogcatsdog",
    "hippopotamuses",
    "rat",
    "ratcatdogcat",
]

Concatenated words:

"catsdogcats" = "cats" + "dog" + "cats"
"dogcatsdog" = "dog" + "cats" + "dog"
"ratcatdogcat" = "rat" + "cat" + "dog" + "cat"

Answer:

["catsdogcats", "dogcatsdog", "ratcatdogcat"]

Example 2:

words = ["cat", "dog", "catdog"]

The word "catdog" can be formed as:

"cat" + "dog"

Answer:

["catdog"]

Example 3:

words = ["cat", "dog", "bird"]

No word can be formed from at least two shorter words.

Answer:

[]

First Thought: Try Every Split Recursively

For each word, try every split:

prefix + suffix

If the prefix is in the dictionary, recursively check whether the suffix can also be built.

For example:

word = "catsdogcats"

Try:

"c" + "atsdogcats"
"ca" + "tsdogcats"
"cat" + "sdogcats"
"cats" + "dogcats"

When we see "cats" in the dictionary, we recursively check "dogcats".

This is the right idea, but plain recursion can repeat the same suffix many times.

Problem With Plain Recursion

The same substring may be checked repeatedly.

For example:

word = "aaaaaaaaaaaaaaaa"

with dictionary words:

"a", "aa", "aaa", "aaaa"

There are many ways to split the same suffix.

Without dynamic programming, recursion may explore an exponential number of split sequences.

We need to cache or use iterative DP.

Key Insight

For one word, this is the same structure as the Word Break problem.

Let:

dp[i]

mean:

word[:i] can be formed from dictionary words

Initialize:

dp[0] = True

Then for each ending position i, check whether there is a previous position j such that:

dp[j] is True
word[j:i] is in the dictionary

If yes:

dp[i] = True

To avoid letting a word use itself as one piece, process words from shortest to longest.

When checking a word, the dictionary contains only shorter words already seen.

That automatically enforces the rule that a concatenated word must be made from shorter words.

Algorithm

Sort words by length.

Create an empty set:

word_set = set()

Create an empty answer list.

For each word in sorted order:

If word is empty, skip it.
Check whether word can be formed from words currently in word_set.
If yes, append it to the answer.
Add word to word_set.

The helper can_form(word) uses Word Break DP:

dp[0] = True.
For every i from 1 to len(word):
1. Try every j from 0 to i - 1.
2. If dp[j] and word[j:i] in word_set, set dp[i] = True.
Return dp[len(word)].

Walkthrough

Use:

words = ["cat", "cats", "dog", "catsdog"]

Sort by length:

["cat", "dog", "cats", "catsdog"]

Start:

word_set = set()
answer = []

Check "cat".

No shorter words exist, so it cannot be formed.

Add "cat".

Check "dog".

It also cannot be formed.

Add "dog".

Check "cats".

The prefix "cat" exists, but suffix "s" does not.

So "cats" cannot be formed.

Add "cats".

Check "catsdog".

DP finds:

"cats" + "dog"

So append "catsdog".

Final answer:

["catsdog"]

Correctness

Words are processed from shortest to longest.

When checking a word, word_set contains only shorter words. Therefore, if the helper returns True, the word is formed entirely from shorter words in the original array.

The DP helper is correct because dp[i] is set to True exactly when some valid earlier prefix word[:j] can be formed and the remaining segment word[j:i] is a dictionary word.

By induction over i, dp[i] correctly describes whether word[:i] can be formed from the available shorter words.

Thus, dp[len(word)] is True exactly when the whole word can be formed by concatenating available shorter words.

Since the helper is run for every word, the algorithm returns exactly all concatenated words.

Complexity

Let:

n = len(words)
L = maximum word length

Metric	Value	Why
Time	`O(n * L^3)` in Python slicing form	Each word tries `O(L^2)` substrings, and slicing can cost `O(L)`
Space	`O(n * L + L)`	Store words plus one DP array

In practice this passes because word lengths are moderate.

The trie version can avoid substring slicing cost, but the hash-set DP version is simpler and usually preferred for clarity.

Implementation

class Solution:
    def findAllConcatenatedWordsInADict(self, words: list[str]) -> list[str]:
        words.sort(key=len)

        word_set = set()
        answer = []

        def can_form(word: str) -> bool:
            if not word_set:
                return False

            n = len(word)
            dp = [False] * (n + 1)
            dp[0] = True

            for i in range(1, n + 1):
                for j in range(i):
                    if not dp[j]:
                        continue

                    if word[j:i] in word_set:
                        dp[i] = True
                        break

            return dp[n]

        for word in words:
            if word == "":
                continue

            if can_form(word):
                answer.append(word)

            word_set.add(word)

        return answer

Code Explanation

We sort by length:

words.sort(key=len)

This ensures that when we check a word, only shorter words have already been added to word_set.

The set:

word_set = set()

allows fast lookup of dictionary words.

The DP array:

dp = [False] * (n + 1)
dp[0] = True

means the empty prefix can always be formed.

For each end index:

for i in range(1, n + 1):

we try all split positions:

for j in range(i):

If the prefix is formable and the suffix piece is a known shorter word:

if dp[j] and word[j:i] in word_set:

then the prefix ending at i is formable:

dp[i] = True

After checking the word, we add it to the set:

word_set.add(word)

so longer words may use it later.

Alternative: DFS With Memoization

This version checks a word recursively.

class Solution:
    def findAllConcatenatedWordsInADict(self, words: list[str]) -> list[str]:
        word_set = set(words)
        answer = []

        def can_form(word: str, start: int, count: int, memo: dict[tuple[int, int], bool]) -> bool:
            if start == len(word):
                return count >= 2

            key = (start, count)
            if key in memo:
                return memo[key]

            for end in range(start + 1, len(word) + 1):
                part = word[start:end]

                if part not in word_set:
                    continue

                if part == word and count == 0:
                    continue

                if can_form(word, end, count + 1, memo):
                    memo[key] = True
                    return True

            memo[key] = False
            return False

        for word in words:
            if word and can_form(word, 0, 0, {}):
                answer.append(word)

        return answer

The sorted DP version avoids the special case where a word tries to use itself.

Testing

def run_tests():
    s = Solution()

    assert sorted(s.findAllConcatenatedWordsInADict([
        "cat",
        "cats",
        "catsdogcats",
        "dog",
        "dogcatsdog",
        "hippopotamuses",
        "rat",
        "ratcatdogcat",
    ])) == sorted([
        "catsdogcats",
        "dogcatsdog",
        "ratcatdogcat",
    ])

    assert s.findAllConcatenatedWordsInADict([
        "cat",
        "dog",
        "catdog",
    ]) == ["catdog"]

    assert s.findAllConcatenatedWordsInADict([
        "cat",
        "dog",
        "bird",
    ]) == []

    assert sorted(s.findAllConcatenatedWordsInADict([
        "a",
        "aa",
        "aaa",
        "aaaa",
    ])) == sorted([
        "aa",
        "aaa",
        "aaaa",
    ])

    assert s.findAllConcatenatedWordsInADict([
        "constructor",
    ]) == []

    print("all tests passed")

run_tests()

Test meaning:

Test	Why
Standard example	Checks multiple concatenated words
`"catdog"`	Simple two-word concatenation
No concatenated words	Should return empty list
Repeated same word	Same shorter word may be reused
Single word only	A word cannot be formed by itself