13.22 State Compression and Memory Optimization

One of the most common mistakes in dynamic programming is focusing exclusively on time complexity while ignoring memory consumption. A solution may be theoretically correct and fast enough, yet fail in practice because the DP table is too large.

Fortunately, many dynamic programming algorithms store far more information than they actually need. States are often retained long after they cease to influence future computations. By analyzing dependencies carefully, memory requirements can frequently be reduced by an order of magnitude or more without changing the recurrence itself.

State compression is the process of storing only the information required to continue computation. It is one of the most valuable optimization skills because it transforms elegant theoretical solutions into practical implementations.

Problem

How do we reduce the memory usage of dynamic programming algorithms while preserving correctness?

Understanding Dependency Windows

The key question is:

Which previously computed states are actually needed?

Consider Fibonacci:

$$ dp[i] = dp[i-1] + dp[i-2] $$

A naive implementation stores:

dp[0]
dp[1]
...
dp[n]

Yet the recurrence only depends on the previous two values.

Everything older than:

i-2

is irrelevant.

This observation immediately reduces memory usage.

Example: Fibonacci Compression

Naive implementation:

dp := make([]int, n+1)

dp[0] = 0
dp[1] = 1

for i := 2; i <= n; i++ {
    dp[i] = dp[i-1] + dp[i-2]
}

Memory:

$$ O(n) $$

Compressed version:

a := 0
b := 1

for i := 2; i <= n; i++ {

    c := a + b

    a = b
    b = c
}

Memory:

$$ O(1) $$

The recurrence remains unchanged.

Only storage changes.

The Rolling Array Pattern

Two-dimensional dynamic programming often depends on only the previous row.

Example:

$$ dp[i][j] = f(dp[i-1][j], dp[i-1][j-1]) $$

The current row depends only on:

previous row

Store two rows:

previous := make([]int, m)
current := make([]int, m)

After finishing a row:

previous, current =
    current, previous

Memory decreases from:

$$ O(nm) $$

to:

$$ O(m) $$

This technique is called a rolling array.

Example: Longest Common Subsequence

The classical recurrence:

$$ dp[i][j] $$

depends on:

left
above
diagonal

Only the current row and previous row are needed.

Implementation:

previous := make([]int, m+1)
current := make([]int, m+1)

for i := 1; i <= n; i++ {

    for j := 1; j <= m; j++ {

        if a[i-1] == b[j-1] {

            current[j] =
                previous[j-1] + 1

        } else {

            current[j] =
                max(
                    previous[j],
                    current[j-1],
                )
        }
    }

    previous, current =
        current, previous
}

Memory becomes:

$$ O(m) $$

instead of:

$$ O(nm) $$

Visualizing Row Compression

Original table:


Row 0	•	•	•	•
Row 1	•	•	•	•
Row 2	•	•	•	•
Row 3	•	•	•	•

Suppose computation is currently processing:

Row 3

Rows:

0
1

are already irrelevant.

Only:

Row 2
Row 3

remain necessary.

The rolling array stores exactly these rows.

Single-Row Compression

Sometimes the previous row can be merged into the current row.

Consider:

$$ dp[i][j] = dp[i][j] + dp[i-1][j] $$

or similar recurrences.

Careful iteration order allows one-dimensional storage.

This technique appears frequently in knapsack problems.

Example: 0/1 Knapsack

Classical state:

$$ dp[i][w] $$

Meaning:

best value
using first i items
with capacity w

Recurrence:

$$ dp[i][w] = \max ( dp[i-1][w], dp[i-1][w-weight]+value ) $$

Observe that only row:

$$ i-1 $$

is required.

Compress to:

$$ dp[w] $$

Implementation:

for _, item := range items {

    for w := capacity;
        w >= item.Weight;
        w-- {

        dp[w] = max(
            dp[w],
            dp[w-item.Weight] +
                item.Value,
        )
    }
}

Memory becomes:

$$ O(capacity) $$

instead of:

$$ O(n \cdot capacity) $$

Why Reverse Iteration Matters

In 0/1 knapsack:

for w := capacity; w >= weight; w--

must run backward.

Forward iteration would allow the same item to be reused multiple times.

Reverse traversal preserves the original recurrence semantics.

This detail is one of the most common sources of bugs in compressed knapsack implementations.

State Compression Through Observation

Sometimes an entire state dimension is unnecessary.

Consider:

$$ dp[i][selected] $$

Suppose:

selected

can be derived from:

and another variable.

Then:

selected

does not need to be stored explicitly.

Eliminating redundant dimensions often yields larger savings than rolling arrays.

Example: Assignment DP

State:

$$ dp[mask] $$

Instead of:

$$ dp[worker][mask] $$

The current worker equals:

bits.OnesCount(mask)

Therefore:

worker

is implicit.

One entire dimension disappears.

Memory decreases dramatically.

Bitset Compression

Some boolean dynamic programs can be represented as bits.

Example:

reachable sums

Traditional DP:

dp[target+1]

Boolean values.

Bitset version:

bitset |= bitset << value

One machine word updates many states simultaneously.

Benefits:

lower memory usage
fewer cache misses
substantial speed improvements

Subset-sum algorithms often use this technique.

Sparse Dynamic Programming

Many DP tables contain mostly unreachable states.

Example:

large coordinate range

few reachable positions

Instead of storing all states:

map[State]int

stores only active states.

This technique trades memory for hash-map overhead.

Sparse representations are common in graph DP and probabilistic state spaces.

Memoization as Compression

Top-down DP naturally stores only visited states.

Suppose:

1,000,000 possible states

but:

5,000 reachable states

Memoization automatically compresses storage.

Bottom-up approaches often allocate all states regardless of reachability.

This is one reason memoization remains attractive for irregular state spaces.

Cache Locality Considerations

Memory optimization is not only about space.

Smaller structures often execute faster.

Reasons:

fewer cache misses
better memory bandwidth utilization
reduced allocation overhead

For large dynamic programs, cache behavior can dominate theoretical complexity.

A compressed (O(n)) memory solution may outperform a larger table by a substantial margin.

Compression and Reconstruction

Compression sometimes removes information needed to reconstruct the solution.

Example:

rolling arrays

preserve values but discard history.

If reconstruction is required:

store parent pointers separately
recompute decisions
store checkpoints

The correct strategy depends on memory constraints.

Optimization should not destroy required outputs.

Example: Edit Distance

Compressed edit distance stores only two rows:

$$ O(m) $$

memory.

However:

actual edit operations

can no longer be reconstructed directly.

To recover the edit script:

keep the full table
store parent directions
use Hirschberg's algorithm

Space optimization and reconstruction often compete.

Common Compression Patterns

Fixed Dependency Window

Examples:

Fibonacci
linear recurrences

Memory:

$$ O(1) $$

Rolling Rows

Examples:

LCS
edit distance
grid DP

Memory:

$$ O(m) $$

Dimension Elimination

Examples:

bitmask DP
combinatorial DP

Remove redundant state variables.

Bitset Compression

Examples:

subset sum
reachability

Pack multiple states into machine words.

Sparse Storage

Examples:

graph DP
large coordinate spaces

Store only reachable states.

Common Mistakes

Compressing Too Early

First obtain a correct solution.

Then optimize memory.

Debugging compressed DP is much harder.

Wrong Iteration Direction

Forward versus backward iteration can completely change recurrence semantics.

Verify dependencies carefully.

Losing Reconstruction Information

A compressed table may no longer support solution recovery.

Understand output requirements before optimizing.

Removing Necessary State

Not every dimension is redundant.

If future decisions depend on it, the dimension must remain.

Ignoring Read-After-Write Hazards

Compressed arrays often reuse memory.

Overwriting values too early produces subtle errors.

A Compression Checklist

Before optimizing memory, ask:

Which states are used by future transitions?
How many previous rows are required?
Can a dimension be derived?
Are states sparse?
Can boolean states become bitsets?
Is reconstruction required?

The answers usually reveal the best compression strategy.

Recognizing Compression Opportunities

Strong indicators include:

depends only on previous row

depends only on previous k states

large DP table

memory limit issues

redundant state dimensions

boolean reachability

Whenever a recurrence references only a small neighborhood of states, memory compression should be considered.

Complexity Analysis

State compression usually changes:

$$ \text{Memory} $$

without changing:

$$ \text{Time} $$

Examples:

Problem	Original Memory	Compressed Memory
Fibonacci	(O(n))	(O(1))
LCS	(O(nm))	(O(m))
Edit Distance	(O(nm))	(O(m))
0/1 Knapsack	(O(nW))	(O(W))

The recurrence remains identical.

Only storage changes.

Takeaway

State compression is the art of storing only the information required for future computation. By analyzing dependency windows, eliminating redundant dimensions, rolling rows, exploiting sparsity, and using bit-level representations, many dynamic programming algorithms can reduce memory usage dramatically without changing their fundamental recurrence. Mastering state compression is an essential skill because practical constraints often make memory optimization just as important as time optimization. In many real-world systems, the difference between an elegant theoretical solution and a deployable implementation is the ability to store only what truly matters.