# LeetCode 316: Remove Duplicate Letters

## Problem Restatement

We are given a string `s` containing lowercase English letters.

We need to remove duplicate letters so that every distinct letter appears exactly once.

Among all valid results, return the lexicographically smallest one.

The official constraints are:

| Constraint | Value |
|---|---:|
| `s.length` | `1 <= s.length <= 10^4` |
| Characters | Lowercase English letters |

This problem is also the same as LeetCode 1081, Smallest Subsequence of Distinct Characters. ([github.com](https://github.com/doocs/leetcode/blob/main/solution/0300-0399/0316.Remove%20Duplicate%20Letters/README_EN.md))

## Input and Output

| Item | Meaning |
|---|---|
| Input | A lowercase string `s` |
| Output | Smallest lexicographical string containing each distinct letter once |
| Operation | Remove characters only |
| Important rule | Relative order of kept characters must come from the original string |

Function shape:

```python
def removeDuplicateLetters(s: str) -> str:
    ...
```

## Examples

Example 1:

```python
s = "bcabc"
```

Output:

```python
"abc"
```

We can keep the later `b` and later `c`, giving `"abc"`.

Another possible result is `"bca"`, but `"abc"` is lexicographically smaller.

Example 2:

```python
s = "cbacdcbc"
```

Output:

```python
"acdb"
```

The result must contain:

```text
a, b, c, d
```

exactly once.

The smallest valid subsequence is:

```python
"acdb"
```

We cannot simply sort the letters into `"abcd"`, because `"abcd"` is not a subsequence of `"cbacdcbc"`.

## First Thought: Sort Unique Characters

A tempting idea is:

```python
return "".join(sorted(set(s)))
```

For:

```python
s = "bcabc"
```

this gives:

```python
"abc"
```

That works for this case.

But for:

```python
s = "cbacdcbc"
```

it gives:

```python
"abcd"
```

This is invalid, because the letters must remain in an order possible by deleting characters from the original string.

We can delete characters, but we cannot reorder them.

## Key Insight

We build the answer from left to right using a stack.

The stack stores the current best answer prefix.

When we see a new character `ch`, we want to place it as early as possible if it makes the result smaller.

If the top of the stack is lexicographically larger than `ch`, then replacing that top character with `ch` would make the result smaller.

But we are allowed to pop the top character only if it appears again later.

If it does not appear later, popping it would lose that required character forever.

So the pop condition is:

```python
stack[-1] > ch and last[stack[-1]] > i
```

Meaning:

1. The stack top is larger than the current character.
2. The stack top appears again later.
3. So we can safely remove it now and use its later occurrence.

## Last Occurrence Map

Before scanning, record the last index of every character.

For:

```python
s = "cbacdcbc"
```

the last occurrence map is:

| Character | Last Index |
|---|---:|
| `c` | `7` |
| `b` | `6` |
| `a` | `2` |
| `d` | `4` |

This tells us whether a character can be safely removed from the stack.

For example, at index `6`, the current character is `b`.

If the stack top is `d`, we cannot pop `d`, because its last occurrence was index `4`.

That means there is no later `d`.

## Algorithm

1. Record the last occurrence index of each character.
2. Create an empty stack.
3. Create a set `used` for letters already in the stack.
4. Scan `s` from left to right.
5. If the current character is already in `used`, skip it.
6. Otherwise, while:
   - stack is not empty,
   - stack top is greater than current character,
   - stack top appears again later,
   
   pop the stack top and remove it from `used`.
7. Push the current character and mark it as used.
8. Return `"".join(stack)`.

## Walkthrough

Use:

```python
s = "bcabc"
```

Last positions:

| Character | Last Index |
|---|---:|
| `b` | `3` |
| `c` | `4` |
| `a` | `2` |

Start:

```python
stack = []
used = set()
```

Index `0`, character `b`.

`b` is unused, so push it.

```python
stack = ["b"]
```

Index `1`, character `c`.

`c` is larger than `b`, so push it.

```python
stack = ["b", "c"]
```

Index `2`, character `a`.

Stack top `c` is greater than `a`, and `c` appears again later at index `4`, so pop `c`.

```python
stack = ["b"]
```

Now stack top `b` is greater than `a`, and `b` appears again later at index `3`, so pop `b`.

```python
stack = []
```

Push `a`.

```python
stack = ["a"]
```

Index `3`, character `b`.

Push `b`.

```python
stack = ["a", "b"]
```

Index `4`, character `c`.

Push `c`.

```python
stack = ["a", "b", "c"]
```

Final answer:

```python
"abc"
```

## Correctness

The stack always contains distinct characters because we skip any character already in `used`.

When processing a character `ch`, the algorithm removes larger characters from the end of the current result only when those characters appear again later. This makes the prefix lexicographically smaller while preserving the ability to include every required character.

If a larger stack-top character does not appear later, the algorithm keeps it. Removing it would make it impossible to produce a valid answer containing every distinct letter.

So every pop is safe, and every refused pop is necessary.

Because the algorithm applies this rule at every position from left to right, the earliest possible position is always given to the smallest character that can safely appear there. This is exactly the greedy condition needed for the lexicographically smallest valid subsequence.

Therefore, the final stack contains every distinct character exactly once and forms the smallest valid result.

## Complexity

| Metric | Value | Why |
|---|---:|---|
| Time | `O(n)` | Each character is pushed and popped at most once |
| Space | `O(1)` | Only lowercase English letters are used |

If we write space in terms of the alphabet size `k`, it is `O(k)`.

## Implementation

```python
class Solution:
    def removeDuplicateLetters(self, s: str) -> str:
        last = {
            ch: i
            for i, ch in enumerate(s)
        }

        stack = []
        used = set()

        for i, ch in enumerate(s):
            if ch in used:
                continue

            while (
                stack
                and stack[-1] > ch
                and last[stack[-1]] > i
            ):
                removed = stack.pop()
                used.remove(removed)

            stack.append(ch)
            used.add(ch)

        return "".join(stack)
```

## Code Explanation

First, record where each character appears last.

```python
last = {
    ch: i
    for i, ch in enumerate(s)
}
```

The stack stores the answer being built.

```python
stack = []
```

The `used` set tells us which characters are already in the stack.

```python
used = set()
```

Scan the string:

```python
for i, ch in enumerate(s):
```

If `ch` already exists in the stack, skip it.

```python
if ch in used:
    continue
```

Now remove larger characters from the stack when safe.

```python
while (
    stack
    and stack[-1] > ch
    and last[stack[-1]] > i
):
```

This condition says:

1. There is something to remove.
2. The current character gives a smaller lexicographical prefix.
3. The removed character can still be used later.

When we pop, we must also remove it from `used`.

```python
removed = stack.pop()
used.remove(removed)
```

Finally, add the current character.

```python
stack.append(ch)
used.add(ch)
```

The answer is the stack joined into one string.

```python
return "".join(stack)
```

## Testing

```python
def run_tests():
    s = Solution()

    assert s.removeDuplicateLetters("bcabc") == "abc"
    assert s.removeDuplicateLetters("cbacdcbc") == "acdb"
    assert s.removeDuplicateLetters("abacb") == "abc"
    assert s.removeDuplicateLetters("bbcaac") == "bac"
    assert s.removeDuplicateLetters("a") == "a"
    assert s.removeDuplicateLetters("aaaa") == "a"
    assert s.removeDuplicateLetters("edebbed") == "bed"

    print("all tests passed")

run_tests()
```

Test meaning:

| Test | Why |
|---|---|
| `"bcabc"` | Basic lexicographical improvement |
| `"cbacdcbc"` | Standard example |
| `"abacb"` | Skips duplicates after characters are already used |
| `"bbcaac"` | Cannot always sort distinct letters |
| `"a"` | Single character |
| `"aaaa"` | All duplicates collapse to one |
| `"edebbed"` | Tests popping only when a later copy exists |