LeetCode 471: Encode String with Shortest Length

Problem Restatement

We are given a non-empty lowercase string s.

We need to encode it so that the encoded string has the shortest possible length.

The encoding rule is:

k[encoded_string]

where encoded_string is repeated exactly k times.

If encoding does not make the string shorter, we should keep the original string. If there are multiple shortest answers, returning any one of them is allowed. The input length is at most 160.

Input and Output

Item	Meaning
Input	A lowercase string `s`
Output	The shortest encoded representation
Encoding form	`k[encoded_string]`
Rule	Use encoding only when it shortens the result
Length limit	`len(s) <= 160`

Function shape:

def encode(s: str) -> str:
    ...

Examples

Example 1:

s = "aaa"

Possible encoding:

"3[a]"

But "3[a]" has length 4, while "aaa" has length 3.

So we keep:

"aaa"

Example 2:

s = "aaaaa"

This can be encoded as:

"5[a]"

Length comparison:

String	Length
`"aaaaa"`	5
`"5[a]"`	4

So the answer can be:

"5[a]"

Example 3:

s = "aaaaaaaaaa"

A valid shortest encoding is:

"10[a]"

Example 4:

s = "aabcaabcd"

This is two copies of "aabc" followed by "d".

A shortest encoding is:

"2[aabc]d"

First Thought: Encode Whole String Only

A simple idea is to check whether the whole string is made by repeating a smaller pattern.

For example:

"abababab" = 4["ab"]

So we can return:

"4[ab]"

But this does not handle cases where only part of the string repeats.

For example:

s = "aabcaabcd"

The whole string is not a repetition of one smaller string.

But the prefix is:

"aabcaabc" = 2["aabc"]

So a good answer is:

"2[aabc]d"

This means we need to consider both:

Encoding a substring as one repeated block.
Splitting a substring into two encoded parts.

Key Insight

This is an interval dynamic programming problem.

Let:

dp[i][j]

be the shortest encoded form of substring:

s[i:j + 1]

For every substring, there are two ways to improve its encoding:

Split it into two smaller substrings.
Encode the whole substring if it is made of repeated copies of a smaller pattern.

We compute shorter substrings first, then build longer substrings from them.

Repeated Pattern Detection

For a substring t, we need to know whether it is made by repeating a smaller pattern.

A useful trick is:

pos = (t + t).find(t, 1)

If pos < len(t), then t has a repeated pattern of length pos.

For example:

t = "abababab"
t + t = "abababababababab"

Searching for t starting at index 1 finds it again at index 2.

So the repeated unit is:

"ab"

and:

"abababab" = 4["ab"]

If no repeated pattern exists, the first repeated occurrence is at index len(t), which is just the second copy in t + t.

Algorithm

Let n = len(s).

Create a 2D table:

dp = [[""] * n for _ in range(n)]

Process substrings by increasing length.

For each substring s[i:j + 1]:

Start with the raw substring as the best answer.
Try every split point k.
If dp[i][k] + dp[k + 1][j] is shorter, update the answer.
Check whether the whole substring is a repetition of a smaller pattern.
If it is, encode it as:

repeat_count[encoded_pattern]

where encoded_pattern is itself taken from dp.

Return:

dp[0][n - 1]

Walkthrough

Use:

s = "aaaaa"

For short substrings like "a", "aa", "aaa", and "aaaa", encoding does not help.

So they stay raw.

For "aaaaa":

t = "aaaaa"

The smallest repeated unit is "a".

The repeat count is 5.

Candidate encoding:

"5[a]"

It has length 4, which is shorter than "aaaaa".

So:

dp[0][4] = "5[a]"

Now use:

s = "aabcaabcd"

The substring:

"aabcaabc"

is two copies of:

"aabc"

So its best encoding becomes:

"2[aabc]"

Then the full string can split into:

"aabcaabc" + "d"

So:

dp[0][8] = "2[aabc]d"

Correctness

For every substring s[i:j + 1], the optimal encoding has one of two forms.

It can be a concatenation of two shorter optimal encodings. The algorithm tries every split point, so it considers every possible concatenation.

Or it can encode the entire substring as repeated copies of a smaller pattern. The repeated-pattern check finds the shortest period when such a period exists, and the algorithm uses the already optimal encoding of that repeated unit inside the brackets.

The dynamic program processes substrings from short to long, so every smaller substring needed by a longer substring has already been solved optimally.

Therefore, each dp[i][j] stores the shortest valid encoding of s[i:j + 1].

The final answer dp[0][n - 1] is the shortest valid encoding of the whole string.

Complexity

Let n = len(s).

Metric	Value	Why
Time	`O(n^4)`	There are `O(n^2)` substrings, `O(n)` splits, and repeated checks/substrings cost up to `O(n)`
Space	`O(n^3)` practical	The table stores `O(n^2)` strings whose lengths can be `O(n)`

Since n <= 160, this interval DP is acceptable.

Implementation

class Solution:
    def encode(self, s: str) -> str:
        n = len(s)
        dp = [[""] * n for _ in range(n)]

        for length in range(1, n + 1):
            for i in range(0, n - length + 1):
                j = i + length - 1
                current = s[i:j + 1]
                dp[i][j] = current

                if length < 5:
                    continue

                for k in range(i, j):
                    candidate = dp[i][k] + dp[k + 1][j]

                    if len(candidate) < len(dp[i][j]):
                        dp[i][j] = candidate

                pos = (current + current).find(current, 1)

                if pos < length:
                    repeat_count = length // pos
                    encoded_pattern = dp[i][i + pos - 1]
                    candidate = str(repeat_count) + "[" + encoded_pattern + "]"

                    if len(candidate) < len(dp[i][j]):
                        dp[i][j] = candidate

        return dp[0][n - 1]

Code Explanation

The table:

dp[i][j]

stores the best encoding for s[i:j + 1].

We process by substring length:

for length in range(1, n + 1):

This guarantees that smaller substrings are already computed.

Each substring starts as itself:

current = s[i:j + 1]
dp[i][j] = current

For strings shorter than 5, encoding cannot help. The shortest useful encoding has length at least four, like "2[a]", and for length below 5 it cannot be shorter than the original.

Then we try every split:

candidate = dp[i][k] + dp[k + 1][j]

If the split gives a shorter string, we keep it.

Next we check whether the whole substring repeats:

pos = (current + current).find(current, 1)

If pos < length, then current consists of repeated copies of current[:pos].

We encode the pattern using its own best DP form:

encoded_pattern = dp[i][i + pos - 1]

Then build:

candidate = str(repeat_count) + "[" + encoded_pattern + "]"

Finally:

return dp[0][n - 1]

returns the shortest encoding of the whole string.

Testing

def run_tests():
    s = Solution()

    assert s.encode("aaa") == "aaa"
    assert s.encode("aaaaa") == "5[a]"
    assert s.encode("aaaaaaaaaa") == "10[a]"
    assert s.encode("aabcaabcd") == "2[aabc]d"

    result = s.encode("abbbabbbcabbbabbbc")
    assert result in {
        "2[2[abbb]c]",
        "2[abbbabbbc]",
    }

    assert s.encode("abcdef") == "abcdef"
    assert s.encode("abababab") == "4[ab]"
    assert s.encode("abcabcabc") == "3[abc]"

    print("all tests passed")

run_tests()

Test meaning:

Test	Why
`"aaa"`	Encoding would be longer
`"aaaaa"`	Smallest useful repeat encoding
`"aaaaaaaaaa"`	Multi-digit repeat count
`"aabcaabcd"`	Repeated prefix plus suffix
`"abbbabbbcabbbabbbc"`	Nested or whole-block encoding
`"abcdef"`	No useful repetition
`"abababab"`	Repeated two-character pattern
`"abcabcabc"`	Repeated three-character pattern