# LeetCode 820: Short Encoding of Words

## Problem Restatement

We are given an array of words.

A valid encoding uses:

| Item | Meaning |
|---|---|
| `s` | A reference string ending with `#` |
| `indices` | Starting positions inside `s` |

For each word, if we start reading from its index in `s` and stop before the next `#`, we must recover that word.

For example:

```python
s = "time#bell#"
indices = [0, 2, 5]
```

This encodes:

```python
"time" from index 0
"me" from index 2
"bell" from index 5
```

The word `"me"` does not need its own `"me#"` because it is already a suffix of `"time"`.

We need to return the length of the shortest possible reference string. The official statement defines a valid encoding this way and asks for the minimum length of `s`.

## Input and Output

| Item | Meaning |
|---|---|
| Input | An array `words` |
| Output | Length of the shortest valid reference string |
| Separator | Each encoded word ends with `#` |
| Main saving rule | A word that is a suffix of another word does not need separate storage |

## Examples

Example 1:

```python
words = ["time", "me", "bell"]
```

The shortest encoding is:

```python
"time#bell#"
```

Its length is:

```python
10
```

So the answer is:

```python
10
```

Example 2:

```python
words = ["t"]
```

The shortest encoding is:

```python
"t#"
```

The answer is:

```python
2
```

## First Thought: Encode Every Word

The simplest valid encoding is to write every word followed by `#`.

For example:

```python
words = ["time", "me", "bell"]
```

could become:

```python
"time#me#bell#"
```

This has length `13`.

But `"me"` is already inside `"time#"` as a suffix ending at the same `#`.

So writing `"me#"` separately is unnecessary.

## Key Insight

Only suffix relationships matter.

If word `a` is a suffix of word `b`, then `a` can be represented by starting inside `b` in the encoded string.

For example:

```python
"time#"
```

can encode both:

```python
"time"
"me"
```

But if a word appears only as a middle substring, it does not help.

For example:

```python
"im"
```

appears inside `"time"`, but it does not end at the same `#`, so it cannot be recovered from `"time#"`.

Therefore, the final encoding only needs words that are not suffixes of any other word.

## Algorithm

Use a set to remove duplicates:

```python
unique = set(words)
```

Then for each word, remove all of its non-empty suffixes from the set.

For a word like:

```python
"time"
```

the removable suffixes are:

```python
"ime"
"me"
"e"
```

The full word itself should stay unless it is removed as a suffix of some longer word.

After processing all words, every remaining word must appear explicitly in the encoding.

The answer is:

```python
sum(len(word) + 1 for word in unique)
```

The `+ 1` counts the trailing `#`.

## Correctness

If a word is a suffix of another word, the algorithm removes it from the set. Such a word does not need its own encoded segment because it can be recovered by starting inside the longer word’s segment and reading until the same `#`.

If a word remains in the set, then it was not found as a suffix of any other word. No other encoded word can represent it, because the only way to recover a word from inside another encoded segment is for it to be a suffix of that segment. Therefore, it must be written explicitly.

The algorithm removes exactly the words that can be represented by longer words and keeps exactly the words that must be written. Summing `len(word) + 1` over the remaining words gives the minimum reference string length.

## Complexity

Let `n = len(words)` and let `L` be the maximum word length.

| Metric | Value | Why |
|---|---|---|
| Time | `O(n * L^2)` | For each word, we create up to `L - 1` suffix strings |
| Space | `O(n * L)` | The set stores unique words |

In this problem, `L <= 7`, so this approach is easily fast enough.

## Implementation

```python
class Solution:
    def minimumLengthEncoding(self, words: list[str]) -> int:
        unique = set(words)

        for word in words:
            for i in range(1, len(word)):
                unique.discard(word[i:])

        answer = 0

        for word in unique:
            answer += len(word) + 1

        return answer
```

## Code Explanation

We start by removing duplicate words:

```python
unique = set(words)
```

Duplicate words only need to be encoded once.

Then we scan every word and remove its suffixes:

```python
for word in words:
    for i in range(1, len(word)):
        unique.discard(word[i:])
```

The loop starts at `1` so we do not remove the whole word from itself.

The method `discard` is useful because it does nothing if the suffix is not in the set.

After suffix removal, the set contains only words that need their own encoded segment.

Each such segment contributes:

```python
len(word) + 1
```

because of the final `#`.

## Testing

```python
def run_tests():
    s = Solution()

    assert s.minimumLengthEncoding(["time", "me", "bell"]) == 10
    assert s.minimumLengthEncoding(["t"]) == 2
    assert s.minimumLengthEncoding(["time", "time", "me"]) == 5
    assert s.minimumLengthEncoding(["me", "time"]) == 5
    assert s.minimumLengthEncoding(["time", "atime", "btime"]) == 12
    assert s.minimumLengthEncoding(["a", "ba", "cba"]) == 4

    print("all tests passed")

run_tests()
```

| Test | Why |
|---|---|
| `["time", "me", "bell"]` | Official suffix-sharing example |
| `["t"]` | Single word |
| Duplicate words | Encodes duplicates once |
| Suffix appears before full word | Order should not matter |
| Several words sharing same suffix | Keeps only non-suffix words |
| Chain of suffixes | Only longest word needs encoding |