13.22 State Compression and Memory Optimization
One of the most common mistakes in dynamic programming is focusing exclusively on time complexity while ignoring memory consumption.
13.22 State Compression and Memory Optimization
One of the most common mistakes in dynamic programming is focusing exclusively on time complexity while ignoring memory consumption. A solution may be theoretically correct and fast enough, yet fail in practice because the DP table is too large.
Fortunately, many dynamic programming algorithms store far more information than they actually need. States are often retained long after they cease to influence future computations. By analyzing dependencies carefully, memory requirements can frequently be reduced by an order of magnitude or more without changing the recurrence itself.
State compression is the process of storing only the information required to continue computation. It is one of the most valuable optimization skills because it transforms elegant theoretical solutions into practical implementations.
Problem
How do we reduce the memory usage of dynamic programming algorithms while preserving correctness?
Understanding Dependency Windows
The key question is:
Which previously computed states are actually needed?
Consider Fibonacci:
$$ dp[i] = dp[i-1] + dp[i-2] $$
A naive implementation stores:
dp[0]
dp[1]
...
dp[n]
Yet the recurrence only depends on the previous two values.
Everything older than:
i-2
is irrelevant.
This observation immediately reduces memory usage.
Example: Fibonacci Compression
Naive implementation:
dp := make([]int, n+1)
dp[0] = 0
dp[1] = 1
for i := 2; i <= n; i++ {
dp[i] = dp[i-1] + dp[i-2]
}
Memory:
$$ O(n) $$
Compressed version:
a := 0
b := 1
for i := 2; i <= n; i++ {
c := a + b
a = b
b = c
}
Memory:
$$ O(1) $$
The recurrence remains unchanged.
Only storage changes.
The Rolling Array Pattern
Two-dimensional dynamic programming often depends on only the previous row.
Example:
$$ dp[i][j] = f(dp[i-1][j], dp[i-1][j-1]) $$
The current row depends only on:
previous row
Store two rows:
previous := make([]int, m)
current := make([]int, m)
After finishing a row:
previous, current =
current, previous
Memory decreases from:
$$ O(nm) $$
to:
$$ O(m) $$
This technique is called a rolling array.
Example: Longest Common Subsequence
The classical recurrence:
$$ dp[i][j] $$
depends on:
- left
- above
- diagonal
Only the current row and previous row are needed.
Implementation:
previous := make([]int, m+1)
current := make([]int, m+1)
for i := 1; i <= n; i++ {
for j := 1; j <= m; j++ {
if a[i-1] == b[j-1] {
current[j] =
previous[j-1] + 1
} else {
current[j] =
max(
previous[j],
current[j-1],
)
}
}
previous, current =
current, previous
}
Memory becomes:
$$ O(m) $$
instead of:
$$ O(nm) $$
Visualizing Row Compression
Original table:
| Row 0 | • | • | • | • |
| Row 1 | • | • | • | • |
| Row 2 | • | • | • | • |
| Row 3 | • | • | • | • |
Suppose computation is currently processing:
Row 3
Rows:
0
1
are already irrelevant.
Only:
Row 2
Row 3
remain necessary.
The rolling array stores exactly these rows.
Single-Row Compression
Sometimes the previous row can be merged into the current row.
Consider:
$$ dp[i][j] = dp[i][j] + dp[i-1][j] $$
or similar recurrences.
Careful iteration order allows one-dimensional storage.
This technique appears frequently in knapsack problems.
Example: 0/1 Knapsack
Classical state:
$$ dp[i][w] $$
Meaning:
best value
using first i items
with capacity w
Recurrence:
$$ dp[i][w] = \max ( dp[i-1][w], dp[i-1][w-weight]+value ) $$
Observe that only row:
$$ i-1 $$
is required.
Compress to:
$$ dp[w] $$
Implementation:
for _, item := range items {
for w := capacity;
w >= item.Weight;
w-- {
dp[w] = max(
dp[w],
dp[w-item.Weight] +
item.Value,
)
}
}
Memory becomes:
$$ O(capacity) $$
instead of:
$$ O(n \cdot capacity) $$
Why Reverse Iteration Matters
In 0/1 knapsack:
for w := capacity; w >= weight; w--
must run backward.
Forward iteration would allow the same item to be reused multiple times.
Reverse traversal preserves the original recurrence semantics.
This detail is one of the most common sources of bugs in compressed knapsack implementations.
State Compression Through Observation
Sometimes an entire state dimension is unnecessary.
Consider:
$$ dp[i][selected] $$
Suppose:
selected
can be derived from:
i
and another variable.
Then:
selected
does not need to be stored explicitly.
Eliminating redundant dimensions often yields larger savings than rolling arrays.
Example: Assignment DP
State:
$$ dp[mask] $$
Instead of:
$$ dp[worker][mask] $$
The current worker equals:
bits.OnesCount(mask)
Therefore:
worker
is implicit.
One entire dimension disappears.
Memory decreases dramatically.
Bitset Compression
Some boolean dynamic programs can be represented as bits.
Example:
reachable sums
Traditional DP:
dp[target+1]
Boolean values.
Bitset version:
bitset |= bitset << value
One machine word updates many states simultaneously.
Benefits:
- lower memory usage
- fewer cache misses
- substantial speed improvements
Subset-sum algorithms often use this technique.
Sparse Dynamic Programming
Many DP tables contain mostly unreachable states.
Example:
large coordinate range
few reachable positions
Instead of storing all states:
map[State]int
stores only active states.
This technique trades memory for hash-map overhead.
Sparse representations are common in graph DP and probabilistic state spaces.
Memoization as Compression
Top-down DP naturally stores only visited states.
Suppose:
1,000,000 possible states
but:
5,000 reachable states
Memoization automatically compresses storage.
Bottom-up approaches often allocate all states regardless of reachability.
This is one reason memoization remains attractive for irregular state spaces.
Cache Locality Considerations
Memory optimization is not only about space.
Smaller structures often execute faster.
Reasons:
- fewer cache misses
- better memory bandwidth utilization
- reduced allocation overhead
For large dynamic programs, cache behavior can dominate theoretical complexity.
A compressed (O(n)) memory solution may outperform a larger table by a substantial margin.
Compression and Reconstruction
Compression sometimes removes information needed to reconstruct the solution.
Example:
rolling arrays
preserve values but discard history.
If reconstruction is required:
- store parent pointers separately
- recompute decisions
- store checkpoints
The correct strategy depends on memory constraints.
Optimization should not destroy required outputs.
Example: Edit Distance
Compressed edit distance stores only two rows:
$$ O(m) $$
memory.
However:
actual edit operations
can no longer be reconstructed directly.
To recover the edit script:
- keep the full table
- store parent directions
- use Hirschberg's algorithm
Space optimization and reconstruction often compete.
Common Compression Patterns
Fixed Dependency Window
Examples:
- Fibonacci
- linear recurrences
Memory:
$$ O(1) $$
Rolling Rows
Examples:
- LCS
- edit distance
- grid DP
Memory:
$$ O(m) $$
Dimension Elimination
Examples:
- bitmask DP
- combinatorial DP
Remove redundant state variables.
Bitset Compression
Examples:
- subset sum
- reachability
Pack multiple states into machine words.
Sparse Storage
Examples:
- graph DP
- large coordinate spaces
Store only reachable states.
Common Mistakes
Compressing Too Early
First obtain a correct solution.
Then optimize memory.
Debugging compressed DP is much harder.
Wrong Iteration Direction
Forward versus backward iteration can completely change recurrence semantics.
Verify dependencies carefully.
Losing Reconstruction Information
A compressed table may no longer support solution recovery.
Understand output requirements before optimizing.
Removing Necessary State
Not every dimension is redundant.
If future decisions depend on it, the dimension must remain.
Ignoring Read-After-Write Hazards
Compressed arrays often reuse memory.
Overwriting values too early produces subtle errors.
A Compression Checklist
Before optimizing memory, ask:
- Which states are used by future transitions?
- How many previous rows are required?
- Can a dimension be derived?
- Are states sparse?
- Can boolean states become bitsets?
- Is reconstruction required?
The answers usually reveal the best compression strategy.
Recognizing Compression Opportunities
Strong indicators include:
depends only on previous row
depends only on previous k states
large DP table
memory limit issues
redundant state dimensions
boolean reachability
Whenever a recurrence references only a small neighborhood of states, memory compression should be considered.
Complexity Analysis
State compression usually changes:
$$ \text{Memory} $$
without changing:
$$ \text{Time} $$
Examples:
| Problem | Original Memory | Compressed Memory |
|---|---|---|
| Fibonacci | (O(n)) | (O(1)) |
| LCS | (O(nm)) | (O(m)) |
| Edit Distance | (O(nm)) | (O(m)) |
| 0/1 Knapsack | (O(nW)) | (O(W)) |
The recurrence remains identical.
Only storage changes.
Takeaway
State compression is the art of storing only the information required for future computation. By analyzing dependency windows, eliminating redundant dimensions, rolling rows, exploiting sparsity, and using bit-level representations, many dynamic programming algorithms can reduce memory usage dramatically without changing their fundamental recurrence. Mastering state compression is an essential skill because practical constraints often make memory optimization just as important as time optimization. In many real-world systems, the difference between an elegant theoretical solution and a deployable implementation is the ability to store only what truly matters.