13.14 Probability Dynamic Programming

Probability dynamic programming appears when each transition has uncertainty. Instead of computing the best value, the number of ways, or the feasibility of a state, we compute a probability, an expected value, or a distribution over outcomes.

The structure is still dynamic programming. We define states, derive transitions, initialize base cases, and compute answers in dependency order. The difference is the algebra used to combine transitions. Counting DP uses addition. Optimization DP uses min or max. Probability DP uses weighted sums.

This technique is common in games of chance, randomized algorithms, Markov chains, reliability analysis, inventory systems, queueing models, and expected-cost planning. It is also useful in interview and contest problems where the word "random" appears in the statement.

Problem

How do we design dynamic programming algorithms when transitions happen with known probabilities?

Understanding Probabilistic States

A typical probability DP state has the form:

$$ dp[state] $$

with one of several meanings:

probability of reaching this state

probability of eventually winning from this state

expected number of steps from this state

expected reward starting from this state

The state definition must say exactly which quantity is stored. A probability and an expectation behave differently, even when the state space is identical.

For example:

$$ dp[i] $$

may mean:

probability of being at position i after k moves

or:

expected number of moves needed to reach position i

Those are different problems and lead to different recurrences.

Probability Mass DP

The simplest form tracks how probability mass moves through states.

Suppose a token starts at position 0. At each step, it moves either:

+1 with probability 1/2

+2 with probability 1/2

We want the probability of reaching each position after exactly t moves.

State:

$$ dp[step][pos] $$

Meaning:

probability of being at position pos
after exactly step moves

Base case:

$$ dp[0][0] = 1 $$

Transition:

$$ dp[step+1][pos+1] += dp[step][pos] \times \frac12 $$

$$ dp[step+1][pos+2] += dp[step][pos] \times \frac12 $$

This style pushes probability forward.

Implementation

func PositionProbabilities(steps int, maxPos int) [][]float64 {

    dp := make([][]float64, steps+1)

    for i := range dp {
        dp[i] = make([]float64, maxPos+1)
    }

    dp[0][0] = 1.0

    for step := 0; step < steps; step++ {

        for pos := 0; pos <= maxPos; pos++ {

            p := dp[step][pos]

            if p == 0 {
                continue
            }

            if pos+1 <= maxPos {
                dp[step+1][pos+1] += p * 0.5
            }

            if pos+2 <= maxPos {
                dp[step+1][pos+2] += p * 0.5
            }
        }
    }

    return dp
}

This is the probability analogue of path counting.

Instead of adding integer counts, we add fractional probability mass.

Expected Value DP

Expected value problems ask for an average outcome.

Examples:

expected number of dice rolls

expected cost to finish a process

expected reward from a randomized game

The recurrence usually follows:

$$ E[state] = baseCost + \sum p_i \times E[next_i] $$

Each possible transition contributes according to its probability.

Example: Expected Dice Rolls to Reach N

Suppose a counter starts at 0. Each move rolls a fair six-sided die and advances by the result. We want the expected number of rolls needed to reach or pass n.

State:

$$ dp[i] $$

Meaning:

expected rolls needed
starting from position i

If:

$$ i \ge n $$

then:

$$ dp[i] = 0 $$

because the target has already been reached.

Otherwise, one roll is made, then the process continues from one of six possible positions:

$$ dp[i] = 1 + \frac16 \sum_{d=1}^{6} dp[i+d] $$

This recurrence computes the expected remaining work.

Implementation

func ExpectedRolls(n int) float64 {

    dp := make([]float64, n+6)

    for i := n - 1; i >= 0; i-- {

        dp[i] = 1.0

        for d := 1; d <= 6; d++ {
            dp[i] += dp[i+d] / 6.0
        }
    }

    return dp[0]
}

The loop runs backward because dp[i] depends on larger positions.

Probability of Winning

Many game problems ask:

What is the probability of winning from this state?

Suppose a game ends when the score reaches n. From score i, the player gains either 1 or 2 points with equal probability.

State:

$$ dp[i] $$

Meaning:

probability of eventually reaching exactly n
starting from score i

Base cases:

$$ dp[n] = 1 $$

$$ dp[i] = 0 \quad \text{for } i > n $$

Transition:

$$ dp[i] = \frac12 dp[i+1] + \frac12 dp[i+2] $$

Implementation:

func WinProbability(n int) float64 {

    dp := make([]float64, n+2)

    dp[n] = 1.0

    for i := n - 1; i >= 0; i-- {
        dp[i] = 0.5*dp[i+1] + 0.5*dp[i+2]
    }

    return dp[0]
}

The recurrence is nearly identical to a counting recurrence, but the coefficients are probabilities.

Deriving Probability Recurrences

A useful method is:

Define the random variable or probability.
List every possible next event.
Assign each event a probability.
Write the contribution of each event.
Sum all contributions.

For expected value:

expected value = weighted average of future expected values

For probability of success:

success probability = weighted average of future success probabilities

This process is mechanical once the state is correct.

Absorbing States

Many probability DP problems have absorbing states.

An absorbing state is a state where the process stops.

Examples:

game won

game lost

target reached

system failed

These states become base cases.

For example:

$$ dp[win] = 1 $$

$$ dp[lose] = 0 $$

For expected steps:

$$ dp[finished] = 0 $$

Incorrect absorbing states are one of the most common causes of wrong answers.

Forward Versus Backward Probability DP

There are two common styles.

Forward DP distributes probability mass:

current state -> next states

Backward DP computes value from future states:

state <- future states

Forward style is natural for questions like:

Where can we be after k steps?

Backward style is natural for questions like:

What happens eventually from here?

Choosing the right direction usually makes the implementation simpler.

Example: Random Walk with Boundaries

A token starts at position i on a line from 0 to n.

At each step:

move left with probability p

move right with probability 1-p

States 0 and n are absorbing.

Question:

What is the probability of reaching n before 0?

State:

$$ dp[i] $$

Meaning:

probability of reaching n
before 0
starting from i

Base cases:

$$ dp[0] = 0 $$

$$ dp[n] = 1 $$

Transition:

$$ dp[i] = p \cdot dp[i-1] + (1-p) \cdot dp[i+1] $$

This recurrence contains both previous and next positions. It defines a system of equations rather than a simple acyclic DP.

For many contest problems, boundaries or step limits make the dependency graph acyclic. When cycles remain, solving linear equations may be required.

When DP Becomes a Linear System

Expected value recurrences can contain self-dependencies.

Example:

With probability p, stay in the same state.

Then:

$$ E[i] = 1 + pE[i] + (1-p)E[i+1] $$

This can be rearranged:

$$ E[i] = \frac{1 + (1-p)E[i+1]}{1-p} $$

If many states depend cyclically on one another, the problem becomes a system of linear equations.

This is still related to dynamic programming, but implementation requires algebraic solving rather than a simple loop.

Numerical Precision

Probability DP usually uses floating-point numbers.

Practical issues include:

rounding error

accumulated error

comparison tolerance

underflow for tiny probabilities

Avoid exact equality comparisons on floating-point results.

Use a tolerance:

const eps = 1e-9

when comparing computed probabilities.

For modular arithmetic problems, probabilities are often represented as fractions under a prime modulus, using modular inverses instead of floating-point numbers.

Common Mistakes

Forgetting Probabilities Must Sum to One

Transitions from a state should usually form a complete probability distribution.

If probabilities sum to less than one or more than one, the recurrence is suspect unless the missing probability intentionally represents termination.

Mixing Probability and Count

Counting paths and computing probabilities look similar, but they are not the same.

For probability, each transition must be multiplied by its likelihood.

Wrong Base Case Semantics

For success probability:

win = 1
lose = 0

For expected steps:

finished = 0

Do not reuse base cases across different meanings.

Incorrect Iteration Order

If a state depends on future states, iterate backward.

If probability mass is pushed forward over time, iterate forward.

Ignoring Cycles

Not all probabilistic processes produce acyclic recurrences.

Cycles require algebraic treatment or iterative approximation.

Complexity Analysis

Let:

$$ S $$

be the number of states.

Let:

$$ K $$

be the number of possible transitions per state.

For acyclic probability DP:

$$ O(SK) $$

time.

Memory:

$$ O(S) $$

or less when rolling arrays apply.

For cyclic systems, complexity depends on the method used to solve the equations.

Recognizing Probability DP

Strong indicators include:

random choice

expected number of steps

probability of success

dice rolls

coin flips

Markov process

absorbing states

weighted transitions

If the process has repeated states and known transition probabilities, probability DP is likely applicable.

Takeaway

Probability dynamic programming extends standard DP by replacing deterministic transitions with weighted outcomes. States store probabilities, expectations, or distributions, and recurrences combine future values using transition probabilities. The same core discipline still applies: define the state precisely, identify base cases, derive transitions from the process rules, and compute states in a valid order. When the state graph is acyclic, the result is a clean dynamic program. When cycles appear, the recurrence may become a system of equations, but the modeling principles remain the same.