14.16 String Greedy

String problems often invite dynamic programming, tries, automata, and hashing.

14.16 String Greedy

String problems often invite dynamic programming, tries, automata, and hashing. But a substantial class of string problems has a simpler structure: decisions can be made greedily from left to right, from right to left, or from a stack-maintained frontier.

The difficulty is recognizing when a character decision is permanent. A greedy string algorithm usually depends on a monotonic property: once a character is placed, removed, or skipped, the remaining suffix can still be solved optimally.

This recipe introduces common greedy patterns for strings and shows how to prove that local character decisions are safe.

Problem

You are given one or more strings and need to construct, reduce, partition, or compare them optimally.

Common objectives include:

  • Build the lexicographically smallest result.
  • Remove characters while preserving order.
  • Partition a string under constraints.
  • Match subsequences.
  • Minimize edits under a restricted operation model.
  • Choose the earliest or latest feasible character.

The input is sequential, but the decision may depend on future characters. Greedy algorithms work when that future information can be summarized cheaply.

Why Strings Are Tricky

Consider the problem:

Remove duplicate letters so that every letter appears once and the result is lexicographically smallest.

Input:

cbacdcbc

A naive left-to-right rule might keep the first occurrence of each character:

cbad

But the optimal result is:

acdb

The first occurrence of c should not be kept because another c appears later. This means a correct greedy algorithm must know whether a discarded character can be recovered.

String greedy algorithms often need two pieces of state:

  1. What has already been chosen.
  2. Whether a future copy still exists.

Pattern 1: Lexicographically Smallest Stack

A common pattern is:

Maintain a stack, and remove larger previous characters if they can appear again later.

This solves problems such as:

  • Remove duplicate letters.
  • Smallest subsequence of distinct characters.
  • Remove k digits.
  • Build a minimal lexicographic subsequence.

Example: Remove Duplicate Letters

Input:

cbacdcbc

Goal:

Return the lexicographically smallest string containing each distinct character exactly once.

Expected result:

acdb

Greedy Insight

When processing a character ch, compare it with the end of the current answer.

If the previous character is larger than ch, then replacing it with ch makes the result lexicographically smaller.

But removal is allowed only if the previous character appears later again.

That gives the rule:

While stack is non-empty,
and top > ch,
and top appears later:
    pop top
Push ch if not already used

Walkthrough

Input:

c b a c d c b c

Track remaining counts.

Process c:

stack = c

Process b.

c > b, and c appears later, so pop c.

stack = b

Process a.

b > a, and b appears later, so pop b.

stack = a

Process c.

stack = a c

Process d.

stack = a c d

Process later c.

Already used, skip.

Process b.

d > b, but d does not appear later, so cannot pop d.

Push b.

stack = a c d b

Final result:

acdb

Go Implementation

func RemoveDuplicateLetters(s string) string {
	count := make([]int, 26)
	for _, r := range s {
		count[r-'a']++
	}

	used := make([]bool, 26)
	stack := make([]rune, 0, len(s))

	for _, ch := range s {
		idx := ch - 'a'
		count[idx]--

		if used[idx] {
			continue
		}

		for len(stack) > 0 {
			top := stack[len(stack)-1]
			topIdx := top - 'a'

			if top <= ch || count[topIdx] == 0 {
				break
			}

			stack = stack[:len(stack)-1]
			used[topIdx] = false
		}

		stack = append(stack, ch)
		used[idx] = true
	}

	return string(stack)
}

Why the Greedy Choice Works

The stack represents the best prefix constructed so far.

When the top character is larger than the current character, replacing it improves lexicographic order at the earliest differing position. This is always desirable.

However, the replacement is safe only if the removed character occurs later. If it does not, removing it would violate the requirement that every character appear once.

The algorithm therefore makes the smallest safe prefix at every step.

Pattern 2: Remove k Digits

Given a numeric string and an integer k, remove exactly k digits to make the smallest possible number.

Input:

num = "1432219"
k = 3

Expected result:

1219

The greedy rule is similar:

While stack top > current digit and k > 0:
    pop stack
    k--
Push current digit

The intuition is that a larger digit before a smaller digit is expensive. Removing the larger digit improves the number at the earliest possible position.

Implementation

func RemoveKdigits(num string, k int) string {
	stack := make([]byte, 0, len(num))

	for i := 0; i < len(num); i++ {
		d := num[i]

		for len(stack) > 0 && k > 0 && stack[len(stack)-1] > d {
			stack = stack[:len(stack)-1]
			k--
		}

		stack = append(stack, d)
	}

	for k > 0 && len(stack) > 0 {
		stack = stack[:len(stack)-1]
		k--
	}

	i := 0
	for i < len(stack) && stack[i] == '0' {
		i++
	}

	if i == len(stack) {
		return "0"
	}

	return string(stack[i:])
}

Pattern 3: Earliest Feasible Match

Some string problems ask whether one string is a subsequence of another.

Given:

s = "ace"
t = "abcde"

A greedy scan matches each character of s at the earliest possible position in t.

Why earliest?

Because matching earlier leaves the largest suffix for future characters.

Algorithm:

i = 0

For each character ch in t:
    If i < len(s) and s[i] == ch:
        i++

Return i == len(s)

Go Implementation

func IsSubsequence(s string, t string) bool {
	i := 0

	for j := 0; j < len(t) && i < len(s); j++ {
		if s[i] == t[j] {
			i++
		}
	}

	return i == len(s)
}

Pattern 4: Partition by Last Occurrence

Some greedy string problems require building maximal or minimal partitions.

Example:

Partition a string so that each character appears in at most one part.

Input:

ababcbacadefegdehijhklij

Expected partition lengths:

9, 7, 8

The key is the last occurrence of each character.

While scanning, maintain the farthest last occurrence of any character seen so far.

When the current index reaches that boundary, a partition can close safely.

Implementation

func PartitionLabels(s string) []int {
	last := make([]int, 26)

	for i := 0; i < len(s); i++ {
		last[s[i]-'a'] = i
	}

	result := []int{}
	start := 0
	end := 0

	for i := 0; i < len(s); i++ {
		if last[s[i]-'a'] > end {
			end = last[s[i]-'a']
		}

		if i == end {
			result = append(result, end-start+1)
			start = i + 1
		}
	}

	return result
}

Why Partitioning Works

A partition cannot end before the last occurrence of every character inside it.

As the scan proceeds, each new character may extend the required boundary.

Once the scan reaches the current boundary, every character in the partition has been fully contained. Closing the partition is safe and earliest possible.

This is a greedy decision because the algorithm commits to the smallest valid prefix partition.

Pattern 5: Greedy Matching with Counts

Some string construction problems use frequency counts.

Example:

Reorganize a string so no two adjacent characters are equal.

A common greedy strategy is:

  1. Always choose the most frequent valid character.
  2. Avoid choosing the same character as the previous output.
  3. Use a max-heap of remaining counts.

This is a priority-queue greedy pattern applied to strings.

The local choice is safe because the most frequent character is the most dangerous one to postpone. If it is not placed often enough, it may become impossible to separate later.

Common String Greedy Signals

String greedy is likely when the problem has these features:

Signal Typical Technique
Need lexicographically smallest result Monotonic stack
Need preserve subsequence order Earliest feasible match
Need remove characters Stack with future availability
Need partition by containment Last occurrence boundary
Need arrange by frequency Max-heap
Need minimize prefix Left-to-right greedy

The proof usually depends on showing that the chosen prefix is at least as good as any other feasible prefix.

Common Mistakes

Ignoring Future Occurrences

In duplicate-removal problems, you may only discard a character if it appears later.

Comparing Whole Strings Too Late

Lexicographic optimality is decided at the first differing character. Greedy proofs should focus on the earliest position.

Failing to Preserve Order

Many string problems require subsequences, not arbitrary rearrangements.

Treating Bytes as Characters

In Go, indexing a string gives bytes, not Unicode code points. For ASCII algorithm problems this is fine. For general text, use rune.

Forgetting Cleanup

Problems such as remove-k-digits may require removing extra trailing digits after the main scan.

Complexity Analysis

Most string greedy algorithms run in linear time after small preprocessing.

Typical costs:

Pattern Time Space
Monotonic stack O(n) O(n)
Subsequence scan O(n) O(1)
Last occurrence partition O(n) O(1) for fixed alphabet
Frequency heap O(n log σ) O(σ)

Here σ is the alphabet size.

For lowercase English letters, σ = 26, so heap or count operations are effectively constant.

Takeaway

String greedy algorithms work when local character decisions produce the best safe prefix. The main tools are monotonic stacks, earliest feasible matching, last-occurrence boundaries, and frequency-based priority queues. The implementation is usually compact, but the proof must account for future characters. A character can be removed, skipped, or committed only when the remaining suffix still preserves feasibility.