13.14 Probability Dynamic Programming
Probability dynamic programming appears when each transition has uncertainty.
13.14 Probability Dynamic Programming
Probability dynamic programming appears when each transition has uncertainty. Instead of computing the best value, the number of ways, or the feasibility of a state, we compute a probability, an expected value, or a distribution over outcomes.
The structure is still dynamic programming. We define states, derive transitions, initialize base cases, and compute answers in dependency order. The difference is the algebra used to combine transitions. Counting DP uses addition. Optimization DP uses min or max. Probability DP uses weighted sums.
This technique is common in games of chance, randomized algorithms, Markov chains, reliability analysis, inventory systems, queueing models, and expected-cost planning. It is also useful in interview and contest problems where the word "random" appears in the statement.
Problem
How do we design dynamic programming algorithms when transitions happen with known probabilities?
Understanding Probabilistic States
A typical probability DP state has the form:
$$ dp[state] $$
with one of several meanings:
probability of reaching this state
probability of eventually winning from this state
expected number of steps from this state
expected reward starting from this state
The state definition must say exactly which quantity is stored. A probability and an expectation behave differently, even when the state space is identical.
For example:
$$ dp[i] $$
may mean:
probability of being at position i after k moves
or:
expected number of moves needed to reach position i
Those are different problems and lead to different recurrences.
Probability Mass DP
The simplest form tracks how probability mass moves through states.
Suppose a token starts at position 0. At each step, it moves either:
+1 with probability 1/2
+2 with probability 1/2
We want the probability of reaching each position after exactly t moves.
State:
$$ dp[step][pos] $$
Meaning:
probability of being at position pos
after exactly step moves
Base case:
$$ dp[0][0] = 1 $$
Transition:
$$ dp[step+1][pos+1] += dp[step][pos] \times \frac12 $$
$$ dp[step+1][pos+2] += dp[step][pos] \times \frac12 $$
This style pushes probability forward.
Implementation
func PositionProbabilities(steps int, maxPos int) [][]float64 {
dp := make([][]float64, steps+1)
for i := range dp {
dp[i] = make([]float64, maxPos+1)
}
dp[0][0] = 1.0
for step := 0; step < steps; step++ {
for pos := 0; pos <= maxPos; pos++ {
p := dp[step][pos]
if p == 0 {
continue
}
if pos+1 <= maxPos {
dp[step+1][pos+1] += p * 0.5
}
if pos+2 <= maxPos {
dp[step+1][pos+2] += p * 0.5
}
}
}
return dp
}
This is the probability analogue of path counting.
Instead of adding integer counts, we add fractional probability mass.
Expected Value DP
Expected value problems ask for an average outcome.
Examples:
expected number of dice rolls
expected cost to finish a process
expected reward from a randomized game
The recurrence usually follows:
$$ E[state] = baseCost + \sum p_i \times E[next_i] $$
Each possible transition contributes according to its probability.
Example: Expected Dice Rolls to Reach N
Suppose a counter starts at 0. Each move rolls a fair six-sided die and advances by the result. We want the expected number of rolls needed to reach or pass n.
State:
$$ dp[i] $$
Meaning:
expected rolls needed
starting from position i
If:
$$ i \ge n $$
then:
$$ dp[i] = 0 $$
because the target has already been reached.
Otherwise, one roll is made, then the process continues from one of six possible positions:
$$ dp[i] = 1 + \frac16 \sum_{d=1}^{6} dp[i+d] $$
This recurrence computes the expected remaining work.
Implementation
func ExpectedRolls(n int) float64 {
dp := make([]float64, n+6)
for i := n - 1; i >= 0; i-- {
dp[i] = 1.0
for d := 1; d <= 6; d++ {
dp[i] += dp[i+d] / 6.0
}
}
return dp[0]
}
The loop runs backward because dp[i] depends on larger positions.
Probability of Winning
Many game problems ask:
What is the probability of winning from this state?
Suppose a game ends when the score reaches n. From score i, the player gains either 1 or 2 points with equal probability.
State:
$$ dp[i] $$
Meaning:
probability of eventually reaching exactly n
starting from score i
Base cases:
$$ dp[n] = 1 $$
$$ dp[i] = 0 \quad \text{for } i > n $$
Transition:
$$ dp[i] = \frac12 dp[i+1] + \frac12 dp[i+2] $$
Implementation:
func WinProbability(n int) float64 {
dp := make([]float64, n+2)
dp[n] = 1.0
for i := n - 1; i >= 0; i-- {
dp[i] = 0.5*dp[i+1] + 0.5*dp[i+2]
}
return dp[0]
}
The recurrence is nearly identical to a counting recurrence, but the coefficients are probabilities.
Deriving Probability Recurrences
A useful method is:
- Define the random variable or probability.
- List every possible next event.
- Assign each event a probability.
- Write the contribution of each event.
- Sum all contributions.
For expected value:
expected value = weighted average of future expected values
For probability of success:
success probability = weighted average of future success probabilities
This process is mechanical once the state is correct.
Absorbing States
Many probability DP problems have absorbing states.
An absorbing state is a state where the process stops.
Examples:
game won
game lost
target reached
system failed
These states become base cases.
For example:
$$ dp[win] = 1 $$
$$ dp[lose] = 0 $$
For expected steps:
$$ dp[finished] = 0 $$
Incorrect absorbing states are one of the most common causes of wrong answers.
Forward Versus Backward Probability DP
There are two common styles.
Forward DP distributes probability mass:
current state -> next states
Backward DP computes value from future states:
state <- future states
Forward style is natural for questions like:
Where can we be after k steps?
Backward style is natural for questions like:
What happens eventually from here?
Choosing the right direction usually makes the implementation simpler.
Example: Random Walk with Boundaries
A token starts at position i on a line from 0 to n.
At each step:
move left with probability p
move right with probability 1-p
States 0 and n are absorbing.
Question:
What is the probability of reaching n before 0?
State:
$$ dp[i] $$
Meaning:
probability of reaching n
before 0
starting from i
Base cases:
$$ dp[0] = 0 $$
$$ dp[n] = 1 $$
Transition:
$$ dp[i] = p \cdot dp[i-1] + (1-p) \cdot dp[i+1] $$
This recurrence contains both previous and next positions. It defines a system of equations rather than a simple acyclic DP.
For many contest problems, boundaries or step limits make the dependency graph acyclic. When cycles remain, solving linear equations may be required.
When DP Becomes a Linear System
Expected value recurrences can contain self-dependencies.
Example:
With probability p, stay in the same state.
Then:
$$ E[i] = 1 + pE[i] + (1-p)E[i+1] $$
This can be rearranged:
$$ E[i] = \frac{1 + (1-p)E[i+1]}{1-p} $$
If many states depend cyclically on one another, the problem becomes a system of linear equations.
This is still related to dynamic programming, but implementation requires algebraic solving rather than a simple loop.
Numerical Precision
Probability DP usually uses floating-point numbers.
Practical issues include:
rounding error
accumulated error
comparison tolerance
underflow for tiny probabilities
Avoid exact equality comparisons on floating-point results.
Use a tolerance:
const eps = 1e-9
when comparing computed probabilities.
For modular arithmetic problems, probabilities are often represented as fractions under a prime modulus, using modular inverses instead of floating-point numbers.
Common Mistakes
Forgetting Probabilities Must Sum to One
Transitions from a state should usually form a complete probability distribution.
If probabilities sum to less than one or more than one, the recurrence is suspect unless the missing probability intentionally represents termination.
Mixing Probability and Count
Counting paths and computing probabilities look similar, but they are not the same.
For probability, each transition must be multiplied by its likelihood.
Wrong Base Case Semantics
For success probability:
win = 1
lose = 0
For expected steps:
finished = 0
Do not reuse base cases across different meanings.
Incorrect Iteration Order
If a state depends on future states, iterate backward.
If probability mass is pushed forward over time, iterate forward.
Ignoring Cycles
Not all probabilistic processes produce acyclic recurrences.
Cycles require algebraic treatment or iterative approximation.
Complexity Analysis
Let:
$$ S $$
be the number of states.
Let:
$$ K $$
be the number of possible transitions per state.
For acyclic probability DP:
$$ O(SK) $$
time.
Memory:
$$ O(S) $$
or less when rolling arrays apply.
For cyclic systems, complexity depends on the method used to solve the equations.
Recognizing Probability DP
Strong indicators include:
random choice
expected number of steps
probability of success
dice rolls
coin flips
Markov process
absorbing states
weighted transitions
If the process has repeated states and known transition probabilities, probability DP is likely applicable.
Takeaway
Probability dynamic programming extends standard DP by replacing deterministic transitions with weighted outcomes. States store probabilities, expectations, or distributions, and recurrences combine future values using transition probabilities. The same core discipline still applies: define the state precisely, identify base cases, derive transitions from the process rules, and compute states in a valid order. When the state graph is acyclic, the result is a clean dynamic program. When cycles appear, the recurrence may become a system of equations, but the modeling principles remain the same.