14.17 Greedy Data Structure Design

Many greedy algorithms are remembered by their choice rule: - Choose the earliest finishing interval.

14.17 Greedy Data Structure Design

Many greedy algorithms are remembered by their choice rule:

  • Choose the earliest finishing interval.
  • Choose the highest-density item.
  • Choose the cheapest edge.
  • Choose the largest available fuel station.

In practice, however, the success of a greedy algorithm often depends just as much on the supporting data structure as on the greedy rule itself.

A theoretically correct greedy strategy can become unusable if each decision requires scanning millions of candidates. Conversely, the right data structure can transform an elegant but impractical algorithm into a production-ready solution.

This recipe examines how greedy algorithms and data structures evolve together, and how to choose a structure that preserves the intended greedy behavior efficiently.

Problem

Suppose a greedy algorithm repeatedly performs operations such as:

Find the largest candidate.
Find the smallest candidate.
Insert a new candidate.
Delete an expired candidate.
Find the next available slot.
Merge two sets.

A naive implementation may require:

O(n)

per operation.

If the operation occurs:

n

times, total complexity becomes:

$$ O(n^2) $$

The challenge is to find a data structure that supports the greedy rule efficiently.

The Greedy Loop

Many greedy algorithms reduce to:

Initialize candidates

While work remains:
    Select best candidate
    Update state
    Add or remove candidates

The algorithm itself may be only a few lines long.

The data structure determines whether the implementation runs in milliseconds or hours.

Pattern 1: Priority Queues

Perhaps the most common greedy data structure is the heap.

Whenever the algorithm repeatedly asks:

Largest?
Smallest?
Best?
Cheapest?

a heap is often the correct choice.

Example: Huffman Coding

At every step:

Extract two smallest frequencies.
Merge them.
Insert the merged node.

A naive implementation:

Find minimum twice

requires:

$$ O(n) $$

per merge.

Total complexity:

$$ O(n^2) $$

Using a min-heap:

Extract minimum
Insert

becomes:

$$ O(\log n) $$

Total complexity:

$$ O(n \log n) $$

The greedy rule remains unchanged.

Only the data structure changes.

Example: Minimum Refueling Stops

Greedy rule:

Choose largest reachable fuel station.

A max-heap stores all reachable stations.

Operations:

Push reachable station
Pop largest fuel amount

Complexity:

$$ O(\log n) $$

Without the heap, every refueling step requires rescanning all previously passed stations.

Pattern 2: Disjoint Sets

Some greedy algorithms repeatedly ask:

Are these two components already connected?

This is common in graph optimization.

Example: Kruskal's Algorithm

Greedy rule:

Choose smallest edge that does not create a cycle.

Cycle detection could be expensive.

A disjoint-set structure maintains connected components efficiently.

Operations:

Find
Union

With path compression:

$$ O(\alpha(n)) $$

which is effectively constant.

Without disjoint sets:

Cycle checks may require graph traversals.

Complexity increases dramatically.

Go Example

func Find(x int) int {
	if parent[x] != x {
		parent[x] = Find(parent[x])
	}
	return parent[x]
}

Path compression ensures future operations become faster.

Pattern 3: Balanced Search Trees

Some greedy algorithms need:

Largest value <= X
Smallest value >= X

rather than simply minimum or maximum.

Heaps cannot support these queries efficiently.

Balanced trees become useful.

Applications include:

  • Resource allocation
  • Interval management
  • Dynamic scheduling
  • Ordered matching

Example: Assigning Tasks to Machines

Suppose machines have capacities:

10
15
20
30

A task requires:

18

units.

Greedy rule:

Assign the smallest machine that can handle the task.

This requires a successor query:

Find first capacity >= 18

Balanced trees support this naturally.

Pattern 4: Monotonic Stacks

String and array greedy algorithms often maintain a frontier of candidates.

Examples:

  • Remove duplicate letters
  • Remove K digits
  • Next greater element
  • Largest rectangle in histogram

The stack stores candidates that remain potentially useful.

Whenever a stronger candidate appears, weaker candidates are removed.

Example

Input:

1432219

Goal:

Remove digits to minimize the number.

Stack evolution:

1
14
143

Current digit:

2

Since:

3 > 2

and removals remain available:

Pop 3

The stack encodes the greedy frontier.

Pattern 5: Two Heaps

Some problems need both extremes simultaneously.

Examples:

  • Streaming median
  • Dynamic balancing
  • Online scheduling

Structure:

Max-heap for lower half
Min-heap for upper half

Greedy decisions occur at both ends.

Example: Streaming Median

Input:

5 2 8 1 9

Maintain:

Lower half
Upper half

Balance sizes after every insertion.

Median becomes available immediately.

This is a data-structure-driven greedy solution.

Pattern 6: Frequency Tables

When the universe of elements is small, elaborate structures may be unnecessary.

Example:

a-z

contains only 26 symbols.

A simple array:

count := make([]int, 26)

often outperforms heaps, maps, and trees.

Many string greedy algorithms rely on this observation.

Example: Partition Labels

Need:

Last occurrence of each character

Array lookup:

last[ch-'a']

is constant time.

A more sophisticated structure would only add overhead.

Pattern 7: Sorted Arrays

Sometimes a full dynamic structure is unnecessary.

Sort once.

Scan once.

Finish.

Examples:

  • Interval scheduling
  • Activity selection
  • Merge intervals
  • Boat rescue
  • Cookie assignment

The sort performs all ordering work up front.

The greedy scan then becomes linear.

Choosing the Right Structure

A useful design question is:

What operation defines the greedy choice?

Once identified, the data structure often becomes obvious.

Operation Data Structure
Smallest candidate Min-heap
Largest candidate Max-heap
Connectivity test Union-find
Ordered successor Balanced tree
Frontier maintenance Stack
Prefix statistics Fenwick tree
Static ordering Sorted array

The greedy rule determines the required operation.

The operation determines the data structure.

Data Structures as Proof Aids

An interesting observation is that data structures often mirror the proof.

Consider interval scheduling.

Proof:

Keep earliest finish.

Implementation:

Sort by finish time.

The data structure directly represents the invariant.

Similarly:

Minimum refueling:

Keep largest fuel station available.

Implementation:

Max-heap.

Again, the structure encodes the proof.

When the Wrong Structure Breaks the Algorithm

A common mistake is choosing a structure that does not naturally support the greedy rule.

Example:

Using a queue for Huffman coding.

The algorithm requires:

Extract smallest frequency

A queue provides:

Oldest element

The implementation no longer matches the proof.

Correctness may remain intact, but performance collapses.

In other cases, correctness itself may disappear.

Designing Greedy Algorithms Backwards

A practical workflow is:

Step 1

Identify the greedy choice.

Example:

Choose largest available value.

Step 2

List required operations.

Insert
Remove maximum

Step 3

Choose the data structure.

Max-heap

Step 4

Analyze complexity.

This process is often easier than designing the algorithm and data structure independently.

Real-World Examples

Network Routing

Greedy rule:

Choose shortest tentative path.

Structure:

Min-heap

Cloud Scheduling

Greedy rule:

Choose highest-priority task.

Structure:

Priority queue

Data Compression

Greedy rule:

Merge smallest frequencies.

Structure:

Min-heap

Graph Connectivity

Greedy rule:

Choose smallest safe edge.

Structure:

Union-find

The combination of greedy logic and efficient structures appears everywhere in production systems.

Common Mistakes

Choosing the Structure First

The greedy rule should determine the structure, not the reverse.

Ignoring Operation Costs

A correct algorithm with expensive operations may be unusable.

Overengineering

Sometimes a sorted array is sufficient.

Not every problem needs a heap or tree.

Breaking the Invariant

The data structure must preserve the ordering assumed by the proof.

Changing the structure may invalidate the reasoning.

Relationship to Previous Recipes

Earlier recipes focused on identifying safe local choices.

This recipe focuses on implementing those choices efficiently.

The progression is:

Greedy proof
→
Required operation
→
Data structure
→
Efficient algorithm

Understanding all four layers is essential for practical algorithm engineering.

Takeaway

Greedy algorithms and data structures are deeply connected. The greedy proof identifies the operation that must be performed repeatedly, and the data structure provides an efficient implementation of that operation. Heaps, union-find structures, balanced trees, stacks, frequency tables, and sorted arrays each support specific classes of greedy choices. In practice, designing a successful greedy algorithm often means designing the right data structure alongside it. The proof establishes what choice is safe; the data structure determines whether that choice can be made efficiently at scale.