14.17 Greedy Data Structure Design
Many greedy algorithms are remembered by their choice rule: - Choose the earliest finishing interval.
14.17 Greedy Data Structure Design
Many greedy algorithms are remembered by their choice rule:
- Choose the earliest finishing interval.
- Choose the highest-density item.
- Choose the cheapest edge.
- Choose the largest available fuel station.
In practice, however, the success of a greedy algorithm often depends just as much on the supporting data structure as on the greedy rule itself.
A theoretically correct greedy strategy can become unusable if each decision requires scanning millions of candidates. Conversely, the right data structure can transform an elegant but impractical algorithm into a production-ready solution.
This recipe examines how greedy algorithms and data structures evolve together, and how to choose a structure that preserves the intended greedy behavior efficiently.
Problem
Suppose a greedy algorithm repeatedly performs operations such as:
Find the largest candidate.
Find the smallest candidate.
Insert a new candidate.
Delete an expired candidate.
Find the next available slot.
Merge two sets.
A naive implementation may require:
O(n)
per operation.
If the operation occurs:
n
times, total complexity becomes:
$$ O(n^2) $$
The challenge is to find a data structure that supports the greedy rule efficiently.
The Greedy Loop
Many greedy algorithms reduce to:
Initialize candidates
While work remains:
Select best candidate
Update state
Add or remove candidates
The algorithm itself may be only a few lines long.
The data structure determines whether the implementation runs in milliseconds or hours.
Pattern 1: Priority Queues
Perhaps the most common greedy data structure is the heap.
Whenever the algorithm repeatedly asks:
Largest?
Smallest?
Best?
Cheapest?
a heap is often the correct choice.
Example: Huffman Coding
At every step:
Extract two smallest frequencies.
Merge them.
Insert the merged node.
A naive implementation:
Find minimum twice
requires:
$$ O(n) $$
per merge.
Total complexity:
$$ O(n^2) $$
Using a min-heap:
Extract minimum
Insert
becomes:
$$ O(\log n) $$
Total complexity:
$$ O(n \log n) $$
The greedy rule remains unchanged.
Only the data structure changes.
Example: Minimum Refueling Stops
Greedy rule:
Choose largest reachable fuel station.
A max-heap stores all reachable stations.
Operations:
Push reachable station
Pop largest fuel amount
Complexity:
$$ O(\log n) $$
Without the heap, every refueling step requires rescanning all previously passed stations.
Pattern 2: Disjoint Sets
Some greedy algorithms repeatedly ask:
Are these two components already connected?
This is common in graph optimization.
Example: Kruskal's Algorithm
Greedy rule:
Choose smallest edge that does not create a cycle.
Cycle detection could be expensive.
A disjoint-set structure maintains connected components efficiently.
Operations:
Find
Union
With path compression:
$$ O(\alpha(n)) $$
which is effectively constant.
Without disjoint sets:
Cycle checks may require graph traversals.
Complexity increases dramatically.
Go Example
func Find(x int) int {
if parent[x] != x {
parent[x] = Find(parent[x])
}
return parent[x]
}
Path compression ensures future operations become faster.
Pattern 3: Balanced Search Trees
Some greedy algorithms need:
Largest value <= X
Smallest value >= X
rather than simply minimum or maximum.
Heaps cannot support these queries efficiently.
Balanced trees become useful.
Applications include:
- Resource allocation
- Interval management
- Dynamic scheduling
- Ordered matching
Example: Assigning Tasks to Machines
Suppose machines have capacities:
10
15
20
30
A task requires:
18
units.
Greedy rule:
Assign the smallest machine that can handle the task.
This requires a successor query:
Find first capacity >= 18
Balanced trees support this naturally.
Pattern 4: Monotonic Stacks
String and array greedy algorithms often maintain a frontier of candidates.
Examples:
- Remove duplicate letters
- Remove K digits
- Next greater element
- Largest rectangle in histogram
The stack stores candidates that remain potentially useful.
Whenever a stronger candidate appears, weaker candidates are removed.
Example
Input:
1432219
Goal:
Remove digits to minimize the number.
Stack evolution:
1
14
143
Current digit:
2
Since:
3 > 2
and removals remain available:
Pop 3
The stack encodes the greedy frontier.
Pattern 5: Two Heaps
Some problems need both extremes simultaneously.
Examples:
- Streaming median
- Dynamic balancing
- Online scheduling
Structure:
Max-heap for lower half
Min-heap for upper half
Greedy decisions occur at both ends.
Example: Streaming Median
Input:
5 2 8 1 9
Maintain:
Lower half
Upper half
Balance sizes after every insertion.
Median becomes available immediately.
This is a data-structure-driven greedy solution.
Pattern 6: Frequency Tables
When the universe of elements is small, elaborate structures may be unnecessary.
Example:
a-z
contains only 26 symbols.
A simple array:
count := make([]int, 26)
often outperforms heaps, maps, and trees.
Many string greedy algorithms rely on this observation.
Example: Partition Labels
Need:
Last occurrence of each character
Array lookup:
last[ch-'a']
is constant time.
A more sophisticated structure would only add overhead.
Pattern 7: Sorted Arrays
Sometimes a full dynamic structure is unnecessary.
Sort once.
Scan once.
Finish.
Examples:
- Interval scheduling
- Activity selection
- Merge intervals
- Boat rescue
- Cookie assignment
The sort performs all ordering work up front.
The greedy scan then becomes linear.
Choosing the Right Structure
A useful design question is:
What operation defines the greedy choice?
Once identified, the data structure often becomes obvious.
| Operation | Data Structure |
|---|---|
| Smallest candidate | Min-heap |
| Largest candidate | Max-heap |
| Connectivity test | Union-find |
| Ordered successor | Balanced tree |
| Frontier maintenance | Stack |
| Prefix statistics | Fenwick tree |
| Static ordering | Sorted array |
The greedy rule determines the required operation.
The operation determines the data structure.
Data Structures as Proof Aids
An interesting observation is that data structures often mirror the proof.
Consider interval scheduling.
Proof:
Keep earliest finish.
Implementation:
Sort by finish time.
The data structure directly represents the invariant.
Similarly:
Minimum refueling:
Keep largest fuel station available.
Implementation:
Max-heap.
Again, the structure encodes the proof.
When the Wrong Structure Breaks the Algorithm
A common mistake is choosing a structure that does not naturally support the greedy rule.
Example:
Using a queue for Huffman coding.
The algorithm requires:
Extract smallest frequency
A queue provides:
Oldest element
The implementation no longer matches the proof.
Correctness may remain intact, but performance collapses.
In other cases, correctness itself may disappear.
Designing Greedy Algorithms Backwards
A practical workflow is:
Step 1
Identify the greedy choice.
Example:
Choose largest available value.
Step 2
List required operations.
Insert
Remove maximum
Step 3
Choose the data structure.
Max-heap
Step 4
Analyze complexity.
This process is often easier than designing the algorithm and data structure independently.
Real-World Examples
Network Routing
Greedy rule:
Choose shortest tentative path.
Structure:
Min-heap
Cloud Scheduling
Greedy rule:
Choose highest-priority task.
Structure:
Priority queue
Data Compression
Greedy rule:
Merge smallest frequencies.
Structure:
Min-heap
Graph Connectivity
Greedy rule:
Choose smallest safe edge.
Structure:
Union-find
The combination of greedy logic and efficient structures appears everywhere in production systems.
Common Mistakes
Choosing the Structure First
The greedy rule should determine the structure, not the reverse.
Ignoring Operation Costs
A correct algorithm with expensive operations may be unusable.
Overengineering
Sometimes a sorted array is sufficient.
Not every problem needs a heap or tree.
Breaking the Invariant
The data structure must preserve the ordering assumed by the proof.
Changing the structure may invalidate the reasoning.
Relationship to Previous Recipes
Earlier recipes focused on identifying safe local choices.
This recipe focuses on implementing those choices efficiently.
The progression is:
Greedy proof
→
Required operation
→
Data structure
→
Efficient algorithm
Understanding all four layers is essential for practical algorithm engineering.
Takeaway
Greedy algorithms and data structures are deeply connected. The greedy proof identifies the operation that must be performed repeatedly, and the data structure provides an efficient implementation of that operation. Heaps, union-find structures, balanced trees, stacks, frequency tables, and sorted arrays each support specific classes of greedy choices. In practice, designing a successful greedy algorithm often means designing the right data structure alongside it. The proof establishes what choice is safe; the data structure determines whether that choice can be made efficiently at scale.