A clear explanation of Remove Duplicate Letters using a greedy monotonic stack.
Problem Restatement
We are given a string s containing lowercase English letters.
We need to remove duplicate letters so that every distinct letter appears exactly once.
Among all valid results, return the lexicographically smallest one.
The official constraints are:
| Constraint | Value |
|---|---|
s.length | 1 <= s.length <= 10^4 |
| Characters | Lowercase English letters |
This problem is also the same as LeetCode 1081, Smallest Subsequence of Distinct Characters. (github.com)
Input and Output
| Item | Meaning |
|---|---|
| Input | A lowercase string s |
| Output | Smallest lexicographical string containing each distinct letter once |
| Operation | Remove characters only |
| Important rule | Relative order of kept characters must come from the original string |
Function shape:
def removeDuplicateLetters(s: str) -> str:
...Examples
Example 1:
s = "bcabc"Output:
"abc"We can keep the later b and later c, giving "abc".
Another possible result is "bca", but "abc" is lexicographically smaller.
Example 2:
s = "cbacdcbc"Output:
"acdb"The result must contain:
a, b, c, dexactly once.
The smallest valid subsequence is:
"acdb"We cannot simply sort the letters into "abcd", because "abcd" is not a subsequence of "cbacdcbc".
First Thought: Sort Unique Characters
A tempting idea is:
return "".join(sorted(set(s)))For:
s = "bcabc"this gives:
"abc"That works for this case.
But for:
s = "cbacdcbc"it gives:
"abcd"This is invalid, because the letters must remain in an order possible by deleting characters from the original string.
We can delete characters, but we cannot reorder them.
Key Insight
We build the answer from left to right using a stack.
The stack stores the current best answer prefix.
When we see a new character ch, we want to place it as early as possible if it makes the result smaller.
If the top of the stack is lexicographically larger than ch, then replacing that top character with ch would make the result smaller.
But we are allowed to pop the top character only if it appears again later.
If it does not appear later, popping it would lose that required character forever.
So the pop condition is:
stack[-1] > ch and last[stack[-1]] > iMeaning:
- The stack top is larger than the current character.
- The stack top appears again later.
- So we can safely remove it now and use its later occurrence.
Last Occurrence Map
Before scanning, record the last index of every character.
For:
s = "cbacdcbc"the last occurrence map is:
| Character | Last Index |
|---|---|
c | 7 |
b | 6 |
a | 2 |
d | 4 |
This tells us whether a character can be safely removed from the stack.
For example, at index 6, the current character is b.
If the stack top is d, we cannot pop d, because its last occurrence was index 4.
That means there is no later d.
Algorithm
Record the last occurrence index of each character.
Create an empty stack.
Create a set
usedfor letters already in the stack.Scan
sfrom left to right.If the current character is already in
used, skip it.Otherwise, while:
- stack is not empty,
- stack top is greater than current character,
- stack top appears again later,
pop the stack top and remove it from
used.Push the current character and mark it as used.
Return
"".join(stack).
Walkthrough
Use:
s = "bcabc"Last positions:
| Character | Last Index |
|---|---|
b | 3 |
c | 4 |
a | 2 |
Start:
stack = []
used = set()Index 0, character b.
b is unused, so push it.
stack = ["b"]Index 1, character c.
c is larger than b, so push it.
stack = ["b", "c"]Index 2, character a.
Stack top c is greater than a, and c appears again later at index 4, so pop c.
stack = ["b"]Now stack top b is greater than a, and b appears again later at index 3, so pop b.
stack = []Push a.
stack = ["a"]Index 3, character b.
Push b.
stack = ["a", "b"]Index 4, character c.
Push c.
stack = ["a", "b", "c"]Final answer:
"abc"Correctness
The stack always contains distinct characters because we skip any character already in used.
When processing a character ch, the algorithm removes larger characters from the end of the current result only when those characters appear again later. This makes the prefix lexicographically smaller while preserving the ability to include every required character.
If a larger stack-top character does not appear later, the algorithm keeps it. Removing it would make it impossible to produce a valid answer containing every distinct letter.
So every pop is safe, and every refused pop is necessary.
Because the algorithm applies this rule at every position from left to right, the earliest possible position is always given to the smallest character that can safely appear there. This is exactly the greedy condition needed for the lexicographically smallest valid subsequence.
Therefore, the final stack contains every distinct character exactly once and forms the smallest valid result.
Complexity
| Metric | Value | Why |
|---|---|---|
| Time | O(n) | Each character is pushed and popped at most once |
| Space | O(1) | Only lowercase English letters are used |
If we write space in terms of the alphabet size k, it is O(k).
Implementation
class Solution:
def removeDuplicateLetters(self, s: str) -> str:
last = {
ch: i
for i, ch in enumerate(s)
}
stack = []
used = set()
for i, ch in enumerate(s):
if ch in used:
continue
while (
stack
and stack[-1] > ch
and last[stack[-1]] > i
):
removed = stack.pop()
used.remove(removed)
stack.append(ch)
used.add(ch)
return "".join(stack)Code Explanation
First, record where each character appears last.
last = {
ch: i
for i, ch in enumerate(s)
}The stack stores the answer being built.
stack = []The used set tells us which characters are already in the stack.
used = set()Scan the string:
for i, ch in enumerate(s):If ch already exists in the stack, skip it.
if ch in used:
continueNow remove larger characters from the stack when safe.
while (
stack
and stack[-1] > ch
and last[stack[-1]] > i
):This condition says:
- There is something to remove.
- The current character gives a smaller lexicographical prefix.
- The removed character can still be used later.
When we pop, we must also remove it from used.
removed = stack.pop()
used.remove(removed)Finally, add the current character.
stack.append(ch)
used.add(ch)The answer is the stack joined into one string.
return "".join(stack)Testing
def run_tests():
s = Solution()
assert s.removeDuplicateLetters("bcabc") == "abc"
assert s.removeDuplicateLetters("cbacdcbc") == "acdb"
assert s.removeDuplicateLetters("abacb") == "abc"
assert s.removeDuplicateLetters("bbcaac") == "bac"
assert s.removeDuplicateLetters("a") == "a"
assert s.removeDuplicateLetters("aaaa") == "a"
assert s.removeDuplicateLetters("edebbed") == "bed"
print("all tests passed")
run_tests()Test meaning:
| Test | Why |
|---|---|
"bcabc" | Basic lexicographical improvement |
"cbacdcbc" | Standard example |
"abacb" | Skips duplicates after characters are already used |
"bbcaac" | Cannot always sort distinct letters |
"a" | Single character |
"aaaa" | All duplicates collapse to one |
"edebbed" | Tests popping only when a later copy exists |