# LeetCode 471: Encode String with Shortest Length

## Problem Restatement

We are given a non-empty lowercase string `s`.

We need to encode it so that the encoded string has the shortest possible length.

The encoding rule is:

```python
k[encoded_string]
```

where `encoded_string` is repeated exactly `k` times.

If encoding does not make the string shorter, we should keep the original string. If there are multiple shortest answers, returning any one of them is allowed. The input length is at most `160`.

## Input and Output

| Item | Meaning |
|---|---|
| Input | A lowercase string `s` |
| Output | The shortest encoded representation |
| Encoding form | `k[encoded_string]` |
| Rule | Use encoding only when it shortens the result |
| Length limit | `len(s) <= 160` |

Function shape:

```python
def encode(s: str) -> str:
    ...
```

## Examples

Example 1:

```python
s = "aaa"
```

Possible encoding:

```python
"3[a]"
```

But `"3[a]"` has length `4`, while `"aaa"` has length `3`.

So we keep:

```python
"aaa"
```

Example 2:

```python
s = "aaaaa"
```

This can be encoded as:

```python
"5[a]"
```

Length comparison:

| String | Length |
|---|---:|
| `"aaaaa"` | 5 |
| `"5[a]"` | 4 |

So the answer can be:

```python
"5[a]"
```

Example 3:

```python
s = "aaaaaaaaaa"
```

A valid shortest encoding is:

```python
"10[a]"
```

Example 4:

```python
s = "aabcaabcd"
```

This is two copies of `"aabc"` followed by `"d"`.

A shortest encoding is:

```python
"2[aabc]d"
```

## First Thought: Encode Whole String Only

A simple idea is to check whether the whole string is made by repeating a smaller pattern.

For example:

```python
"abababab" = 4["ab"]
```

So we can return:

```python
"4[ab]"
```

But this does not handle cases where only part of the string repeats.

For example:

```python
s = "aabcaabcd"
```

The whole string is not a repetition of one smaller string.

But the prefix is:

```python
"aabcaabc" = 2["aabc"]
```

So a good answer is:

```python
"2[aabc]d"
```

This means we need to consider both:

1. Encoding a substring as one repeated block.
2. Splitting a substring into two encoded parts.

## Key Insight

This is an interval dynamic programming problem.

Let:

```python
dp[i][j]
```

be the shortest encoded form of substring:

```python
s[i:j + 1]
```

For every substring, there are two ways to improve its encoding:

1. Split it into two smaller substrings.
2. Encode the whole substring if it is made of repeated copies of a smaller pattern.

We compute shorter substrings first, then build longer substrings from them.

## Repeated Pattern Detection

For a substring `t`, we need to know whether it is made by repeating a smaller pattern.

A useful trick is:

```python
pos = (t + t).find(t, 1)
```

If `pos < len(t)`, then `t` has a repeated pattern of length `pos`.

For example:

```python
t = "abababab"
t + t = "abababababababab"
```

Searching for `t` starting at index `1` finds it again at index `2`.

So the repeated unit is:

```python
"ab"
```

and:

```python
"abababab" = 4["ab"]
```

If no repeated pattern exists, the first repeated occurrence is at index `len(t)`, which is just the second copy in `t + t`.

## Algorithm

Let `n = len(s)`.

Create a 2D table:

```python
dp = [[""] * n for _ in range(n)]
```

Process substrings by increasing length.

For each substring `s[i:j + 1]`:

1. Start with the raw substring as the best answer.
2. Try every split point `k`.
3. If `dp[i][k] + dp[k + 1][j]` is shorter, update the answer.
4. Check whether the whole substring is a repetition of a smaller pattern.
5. If it is, encode it as:

```python
repeat_count[encoded_pattern]
```

where `encoded_pattern` is itself taken from `dp`.

Return:

```python
dp[0][n - 1]
```

## Walkthrough

Use:

```python
s = "aaaaa"
```

For short substrings like `"a"`, `"aa"`, `"aaa"`, and `"aaaa"`, encoding does not help.

So they stay raw.

For `"aaaaa"`:

```python
t = "aaaaa"
```

The smallest repeated unit is `"a"`.

The repeat count is `5`.

Candidate encoding:

```python
"5[a]"
```

It has length `4`, which is shorter than `"aaaaa"`.

So:

```python
dp[0][4] = "5[a]"
```

Now use:

```python
s = "aabcaabcd"
```

The substring:

```python
"aabcaabc"
```

is two copies of:

```python
"aabc"
```

So its best encoding becomes:

```python
"2[aabc]"
```

Then the full string can split into:

```python
"aabcaabc" + "d"
```

So:

```python
dp[0][8] = "2[aabc]d"
```

## Correctness

For every substring `s[i:j + 1]`, the optimal encoding has one of two forms.

It can be a concatenation of two shorter optimal encodings. The algorithm tries every split point, so it considers every possible concatenation.

Or it can encode the entire substring as repeated copies of a smaller pattern. The repeated-pattern check finds the shortest period when such a period exists, and the algorithm uses the already optimal encoding of that repeated unit inside the brackets.

The dynamic program processes substrings from short to long, so every smaller substring needed by a longer substring has already been solved optimally.

Therefore, each `dp[i][j]` stores the shortest valid encoding of `s[i:j + 1]`.

The final answer `dp[0][n - 1]` is the shortest valid encoding of the whole string.

## Complexity

Let `n = len(s)`.

| Metric | Value | Why |
|---|---:|---|
| Time | `O(n^4)` | There are `O(n^2)` substrings, `O(n)` splits, and repeated checks/substrings cost up to `O(n)` |
| Space | `O(n^3)` practical | The table stores `O(n^2)` strings whose lengths can be `O(n)` |

Since `n <= 160`, this interval DP is acceptable.

## Implementation

```python
class Solution:
    def encode(self, s: str) -> str:
        n = len(s)
        dp = [[""] * n for _ in range(n)]

        for length in range(1, n + 1):
            for i in range(0, n - length + 1):
                j = i + length - 1
                current = s[i:j + 1]
                dp[i][j] = current

                if length < 5:
                    continue

                for k in range(i, j):
                    candidate = dp[i][k] + dp[k + 1][j]

                    if len(candidate) < len(dp[i][j]):
                        dp[i][j] = candidate

                pos = (current + current).find(current, 1)

                if pos < length:
                    repeat_count = length // pos
                    encoded_pattern = dp[i][i + pos - 1]
                    candidate = str(repeat_count) + "[" + encoded_pattern + "]"

                    if len(candidate) < len(dp[i][j]):
                        dp[i][j] = candidate

        return dp[0][n - 1]
```

## Code Explanation

The table:

```python
dp[i][j]
```

stores the best encoding for `s[i:j + 1]`.

We process by substring length:

```python
for length in range(1, n + 1):
```

This guarantees that smaller substrings are already computed.

Each substring starts as itself:

```python
current = s[i:j + 1]
dp[i][j] = current
```

For strings shorter than `5`, encoding cannot help. The shortest useful encoding has length at least four, like `"2[a]"`, and for length below `5` it cannot be shorter than the original.

Then we try every split:

```python
candidate = dp[i][k] + dp[k + 1][j]
```

If the split gives a shorter string, we keep it.

Next we check whether the whole substring repeats:

```python
pos = (current + current).find(current, 1)
```

If `pos < length`, then `current` consists of repeated copies of `current[:pos]`.

We encode the pattern using its own best DP form:

```python
encoded_pattern = dp[i][i + pos - 1]
```

Then build:

```python
candidate = str(repeat_count) + "[" + encoded_pattern + "]"
```

Finally:

```python
return dp[0][n - 1]
```

returns the shortest encoding of the whole string.

## Testing

```python
def run_tests():
    s = Solution()

    assert s.encode("aaa") == "aaa"
    assert s.encode("aaaaa") == "5[a]"
    assert s.encode("aaaaaaaaaa") == "10[a]"
    assert s.encode("aabcaabcd") == "2[aabc]d"

    result = s.encode("abbbabbbcabbbabbbc")
    assert result in {
        "2[2[abbb]c]",
        "2[abbbabbbc]",
    }

    assert s.encode("abcdef") == "abcdef"
    assert s.encode("abababab") == "4[ab]"
    assert s.encode("abcabcabc") == "3[abc]"

    print("all tests passed")

run_tests()
```

Test meaning:

| Test | Why |
|---|---|
| `"aaa"` | Encoding would be longer |
| `"aaaaa"` | Smallest useful repeat encoding |
| `"aaaaaaaaaa"` | Multi-digit repeat count |
| `"aabcaabcd"` | Repeated prefix plus suffix |
| `"abbbabbbcabbbabbbc"` | Nested or whole-block encoding |
| `"abcdef"` | No useful repetition |
| `"abababab"` | Repeated two-character pattern |
| `"abcabcabc"` | Repeated three-character pattern |