LeetCode 316: Remove Duplicate Letters

Problem Restatement

We are given a string s containing lowercase English letters.

We need to remove duplicate letters so that every distinct letter appears exactly once.

Among all valid results, return the lexicographically smallest one.

The official constraints are:

Constraint	Value
`s.length`	`1 <= s.length <= 10^4`
Characters	Lowercase English letters

This problem is also the same as LeetCode 1081, Smallest Subsequence of Distinct Characters. (github.com)

Input and Output

Item	Meaning
Input	A lowercase string `s`
Output	Smallest lexicographical string containing each distinct letter once
Operation	Remove characters only
Important rule	Relative order of kept characters must come from the original string

Function shape:

def removeDuplicateLetters(s: str) -> str:
    ...

Examples

Example 1:

s = "bcabc"

Output:

"abc"

We can keep the later b and later c, giving "abc".

Another possible result is "bca", but "abc" is lexicographically smaller.

Example 2:

s = "cbacdcbc"

Output:

"acdb"

The result must contain:

a, b, c, d

exactly once.

The smallest valid subsequence is:

"acdb"

We cannot simply sort the letters into "abcd", because "abcd" is not a subsequence of "cbacdcbc".

First Thought: Sort Unique Characters

A tempting idea is:

return "".join(sorted(set(s)))

For:

s = "bcabc"

this gives:

"abc"

That works for this case.

But for:

s = "cbacdcbc"

it gives:

"abcd"

This is invalid, because the letters must remain in an order possible by deleting characters from the original string.

We can delete characters, but we cannot reorder them.

Key Insight

We build the answer from left to right using a stack.

The stack stores the current best answer prefix.

When we see a new character ch, we want to place it as early as possible if it makes the result smaller.

If the top of the stack is lexicographically larger than ch, then replacing that top character with ch would make the result smaller.

But we are allowed to pop the top character only if it appears again later.

If it does not appear later, popping it would lose that required character forever.

So the pop condition is:

stack[-1] > ch and last[stack[-1]] > i

Meaning:

The stack top is larger than the current character.
The stack top appears again later.
So we can safely remove it now and use its later occurrence.

Last Occurrence Map

Before scanning, record the last index of every character.

For:

s = "cbacdcbc"

the last occurrence map is:

Character	Last Index
`c`	`7`
`b`	`6`
`a`	`2`
`d`	`4`

This tells us whether a character can be safely removed from the stack.

For example, at index 6, the current character is b.

If the stack top is d, we cannot pop d, because its last occurrence was index 4.

That means there is no later d.

Algorithm

Record the last occurrence index of each character.
Create an empty stack.
Create a set used for letters already in the stack.
Scan s from left to right.
If the current character is already in used, skip it.
Otherwise, while:
- stack is not empty,
- stack top is greater than current character,
- stack top appears again later,
pop the stack top and remove it from used.
Push the current character and mark it as used.
Return "".join(stack).

Walkthrough

Use:

s = "bcabc"

Last positions:

Character	Last Index
`b`	`3`
`c`	`4`
`a`	`2`

Start:

stack = []
used = set()

Index 0, character b.

b is unused, so push it.

stack = ["b"]

Index 1, character c.

c is larger than b, so push it.

stack = ["b", "c"]

Index 2, character a.

Stack top c is greater than a, and c appears again later at index 4, so pop c.

stack = ["b"]

Now stack top b is greater than a, and b appears again later at index 3, so pop b.

stack = []

Push a.

stack = ["a"]

Index 3, character b.

Push b.

stack = ["a", "b"]

Index 4, character c.

Push c.

stack = ["a", "b", "c"]

Final answer:

"abc"

Correctness

The stack always contains distinct characters because we skip any character already in used.

When processing a character ch, the algorithm removes larger characters from the end of the current result only when those characters appear again later. This makes the prefix lexicographically smaller while preserving the ability to include every required character.

If a larger stack-top character does not appear later, the algorithm keeps it. Removing it would make it impossible to produce a valid answer containing every distinct letter.

So every pop is safe, and every refused pop is necessary.

Because the algorithm applies this rule at every position from left to right, the earliest possible position is always given to the smallest character that can safely appear there. This is exactly the greedy condition needed for the lexicographically smallest valid subsequence.

Therefore, the final stack contains every distinct character exactly once and forms the smallest valid result.

Complexity

Metric	Value	Why
Time	`O(n)`	Each character is pushed and popped at most once
Space	`O(1)`	Only lowercase English letters are used

If we write space in terms of the alphabet size k, it is O(k).

Implementation

class Solution:
    def removeDuplicateLetters(self, s: str) -> str:
        last = {
            ch: i
            for i, ch in enumerate(s)
        }

        stack = []
        used = set()

        for i, ch in enumerate(s):
            if ch in used:
                continue

            while (
                stack
                and stack[-1] > ch
                and last[stack[-1]] > i
            ):
                removed = stack.pop()
                used.remove(removed)

            stack.append(ch)
            used.add(ch)

        return "".join(stack)

Code Explanation

First, record where each character appears last.

last = {
    ch: i
    for i, ch in enumerate(s)
}

The stack stores the answer being built.

stack = []

The used set tells us which characters are already in the stack.

used = set()

Scan the string:

for i, ch in enumerate(s):

If ch already exists in the stack, skip it.

if ch in used:
    continue

Now remove larger characters from the stack when safe.

while (
    stack
    and stack[-1] > ch
    and last[stack[-1]] > i
):

This condition says:

There is something to remove.
The current character gives a smaller lexicographical prefix.
The removed character can still be used later.

When we pop, we must also remove it from used.

removed = stack.pop()
used.remove(removed)

Finally, add the current character.

stack.append(ch)
used.add(ch)

The answer is the stack joined into one string.

return "".join(stack)

Testing

def run_tests():
    s = Solution()

    assert s.removeDuplicateLetters("bcabc") == "abc"
    assert s.removeDuplicateLetters("cbacdcbc") == "acdb"
    assert s.removeDuplicateLetters("abacb") == "abc"
    assert s.removeDuplicateLetters("bbcaac") == "bac"
    assert s.removeDuplicateLetters("a") == "a"
    assert s.removeDuplicateLetters("aaaa") == "a"
    assert s.removeDuplicateLetters("edebbed") == "bed"

    print("all tests passed")

run_tests()

Test meaning:

Test	Why
`"bcabc"`	Basic lexicographical improvement
`"cbacdcbc"`	Standard example
`"abacb"`	Skips duplicates after characters are already used
`"bbcaac"`	Cannot always sort distinct letters
`"a"`	Single character
`"aaaa"`	All duplicates collapse to one
`"edebbed"`	Tests popping only when a later copy exists