A clear explanation of interval dynamic programming for encoding a string into the shortest k[encoded_string] form.
Problem Restatement
We are given a non-empty lowercase string s.
We need to encode it so that the encoded string has the shortest possible length.
The encoding rule is:
k[encoded_string]where encoded_string is repeated exactly k times.
If encoding does not make the string shorter, we should keep the original string. If there are multiple shortest answers, returning any one of them is allowed. The input length is at most 160.
Input and Output
| Item | Meaning |
|---|---|
| Input | A lowercase string s |
| Output | The shortest encoded representation |
| Encoding form | k[encoded_string] |
| Rule | Use encoding only when it shortens the result |
| Length limit | len(s) <= 160 |
Function shape:
def encode(s: str) -> str:
...Examples
Example 1:
s = "aaa"Possible encoding:
"3[a]"But "3[a]" has length 4, while "aaa" has length 3.
So we keep:
"aaa"Example 2:
s = "aaaaa"This can be encoded as:
"5[a]"Length comparison:
| String | Length |
|---|---|
"aaaaa" | 5 |
"5[a]" | 4 |
So the answer can be:
"5[a]"Example 3:
s = "aaaaaaaaaa"A valid shortest encoding is:
"10[a]"Example 4:
s = "aabcaabcd"This is two copies of "aabc" followed by "d".
A shortest encoding is:
"2[aabc]d"First Thought: Encode Whole String Only
A simple idea is to check whether the whole string is made by repeating a smaller pattern.
For example:
"abababab" = 4["ab"]So we can return:
"4[ab]"But this does not handle cases where only part of the string repeats.
For example:
s = "aabcaabcd"The whole string is not a repetition of one smaller string.
But the prefix is:
"aabcaabc" = 2["aabc"]So a good answer is:
"2[aabc]d"This means we need to consider both:
- Encoding a substring as one repeated block.
- Splitting a substring into two encoded parts.
Key Insight
This is an interval dynamic programming problem.
Let:
dp[i][j]be the shortest encoded form of substring:
s[i:j + 1]For every substring, there are two ways to improve its encoding:
- Split it into two smaller substrings.
- Encode the whole substring if it is made of repeated copies of a smaller pattern.
We compute shorter substrings first, then build longer substrings from them.
Repeated Pattern Detection
For a substring t, we need to know whether it is made by repeating a smaller pattern.
A useful trick is:
pos = (t + t).find(t, 1)If pos < len(t), then t has a repeated pattern of length pos.
For example:
t = "abababab"
t + t = "abababababababab"Searching for t starting at index 1 finds it again at index 2.
So the repeated unit is:
"ab"and:
"abababab" = 4["ab"]If no repeated pattern exists, the first repeated occurrence is at index len(t), which is just the second copy in t + t.
Algorithm
Let n = len(s).
Create a 2D table:
dp = [[""] * n for _ in range(n)]Process substrings by increasing length.
For each substring s[i:j + 1]:
- Start with the raw substring as the best answer.
- Try every split point
k. - If
dp[i][k] + dp[k + 1][j]is shorter, update the answer. - Check whether the whole substring is a repetition of a smaller pattern.
- If it is, encode it as:
repeat_count[encoded_pattern]where encoded_pattern is itself taken from dp.
Return:
dp[0][n - 1]Walkthrough
Use:
s = "aaaaa"For short substrings like "a", "aa", "aaa", and "aaaa", encoding does not help.
So they stay raw.
For "aaaaa":
t = "aaaaa"The smallest repeated unit is "a".
The repeat count is 5.
Candidate encoding:
"5[a]"It has length 4, which is shorter than "aaaaa".
So:
dp[0][4] = "5[a]"Now use:
s = "aabcaabcd"The substring:
"aabcaabc"is two copies of:
"aabc"So its best encoding becomes:
"2[aabc]"Then the full string can split into:
"aabcaabc" + "d"So:
dp[0][8] = "2[aabc]d"Correctness
For every substring s[i:j + 1], the optimal encoding has one of two forms.
It can be a concatenation of two shorter optimal encodings. The algorithm tries every split point, so it considers every possible concatenation.
Or it can encode the entire substring as repeated copies of a smaller pattern. The repeated-pattern check finds the shortest period when such a period exists, and the algorithm uses the already optimal encoding of that repeated unit inside the brackets.
The dynamic program processes substrings from short to long, so every smaller substring needed by a longer substring has already been solved optimally.
Therefore, each dp[i][j] stores the shortest valid encoding of s[i:j + 1].
The final answer dp[0][n - 1] is the shortest valid encoding of the whole string.
Complexity
Let n = len(s).
| Metric | Value | Why |
|---|---|---|
| Time | O(n^4) | There are O(n^2) substrings, O(n) splits, and repeated checks/substrings cost up to O(n) |
| Space | O(n^3) practical | The table stores O(n^2) strings whose lengths can be O(n) |
Since n <= 160, this interval DP is acceptable.
Implementation
class Solution:
def encode(self, s: str) -> str:
n = len(s)
dp = [[""] * n for _ in range(n)]
for length in range(1, n + 1):
for i in range(0, n - length + 1):
j = i + length - 1
current = s[i:j + 1]
dp[i][j] = current
if length < 5:
continue
for k in range(i, j):
candidate = dp[i][k] + dp[k + 1][j]
if len(candidate) < len(dp[i][j]):
dp[i][j] = candidate
pos = (current + current).find(current, 1)
if pos < length:
repeat_count = length // pos
encoded_pattern = dp[i][i + pos - 1]
candidate = str(repeat_count) + "[" + encoded_pattern + "]"
if len(candidate) < len(dp[i][j]):
dp[i][j] = candidate
return dp[0][n - 1]Code Explanation
The table:
dp[i][j]stores the best encoding for s[i:j + 1].
We process by substring length:
for length in range(1, n + 1):This guarantees that smaller substrings are already computed.
Each substring starts as itself:
current = s[i:j + 1]
dp[i][j] = currentFor strings shorter than 5, encoding cannot help. The shortest useful encoding has length at least four, like "2[a]", and for length below 5 it cannot be shorter than the original.
Then we try every split:
candidate = dp[i][k] + dp[k + 1][j]If the split gives a shorter string, we keep it.
Next we check whether the whole substring repeats:
pos = (current + current).find(current, 1)If pos < length, then current consists of repeated copies of current[:pos].
We encode the pattern using its own best DP form:
encoded_pattern = dp[i][i + pos - 1]Then build:
candidate = str(repeat_count) + "[" + encoded_pattern + "]"Finally:
return dp[0][n - 1]returns the shortest encoding of the whole string.
Testing
def run_tests():
s = Solution()
assert s.encode("aaa") == "aaa"
assert s.encode("aaaaa") == "5[a]"
assert s.encode("aaaaaaaaaa") == "10[a]"
assert s.encode("aabcaabcd") == "2[aabc]d"
result = s.encode("abbbabbbcabbbabbbc")
assert result in {
"2[2[abbb]c]",
"2[abbbabbbc]",
}
assert s.encode("abcdef") == "abcdef"
assert s.encode("abababab") == "4[ab]"
assert s.encode("abcabcabc") == "3[abc]"
print("all tests passed")
run_tests()Test meaning:
| Test | Why |
|---|---|
"aaa" | Encoding would be longer |
"aaaaa" | Smallest useful repeat encoding |
"aaaaaaaaaa" | Multi-digit repeat count |
"aabcaabcd" | Repeated prefix plus suffix |
"abbbabbbcabbbabbbc" | Nested or whole-block encoding |
"abcdef" | No useful repetition |
"abababab" | Repeated two-character pattern |
"abcabcabc" | Repeated three-character pattern |