19.1 Random Variables in Algorithms

Randomized algorithms differ from deterministic algorithms because their behavior depends on random choices made during execution. A deterministic algorithm always follows the same sequence of operations for a given input. A randomized algorithm may produce different execution paths, different running times, or even different outputs when executed multiple times on the same input.

To analyze such algorithms rigorously, we need a mathematical framework that describes uncertainty. Random variables provide that framework. They allow us to convert random events into numerical quantities that can be measured, analyzed, and compared.

This section introduces the fundamental concepts of random variables, expectation, variance, and probability distributions from an algorithmic perspective. These concepts form the foundation for every topic in this chapter, including randomized sorting, probabilistic data structures, streaming algorithms, approximation algorithms, and online decision-making.

Problem

You need to analyze an algorithm whose behavior depends on random choices.

Questions typically include:

What is the expected running time?
How likely is a failure?
How much memory will be used on average?
How far can the result deviate from the average?
How often does the worst case occur?

Random variables provide precise answers to these questions.

Recipe: Modeling Random Outcomes

Consider a simple experiment.

A fair six-sided die is rolled once.

Possible outcomes are:

Outcome
1
2
3
4
5
6

Define a random variable:

X = value shown on the die

Now the random experiment becomes a numerical quantity.

Outcome	X
1	1
2	2
3	3
4	4
5	5
6	6

Each value occurs with probability:

P(X = k) = 1/6

for:

k ∈ {1,2,3,4,5,6}

The random variable converts uncertainty into numbers that can be analyzed mathematically.

Random Variables in Algorithms

Suppose we randomly choose an element from an array.

import random

array = [10, 20, 30, 40, 50]

x = random.choice(array)

Define:

X = selected value

Possible values:

10, 20, 30, 40, 50

Each value occurs with probability:

1/5

Now questions about the algorithm become questions about the random variable.

Examples:

What is the average selected value?

How often is a value greater than 30 selected?

How much variation exists?

These are probabilistic questions.

Indicator Random Variables

Indicator variables are among the most useful tools in algorithm analysis.

An indicator variable represents whether an event occurs.

I =
    1 if event occurs
    0 otherwise

Example:

A coin is flipped.

Define:

I =
    1 if heads
    0 if tails

Then:

Outcome	I
Heads	1
Tails	0

Indicators simplify many proofs because counting becomes summation.

Suppose we flip 100 coins.

Let:

Ii =
    1 if coin i is heads
    0 otherwise

Total heads:

H = I1 + I2 + ... + I100

A counting problem becomes an algebra problem.

This technique appears constantly in randomized algorithm analysis.

Expected Value

Expectation measures the long-run average value of a random variable.

For discrete variables:

E[X] = Σ x · P(X=x)

For a fair die:

E[X]
=
1·(1/6)
+
2·(1/6)
+
3·(1/6)
+
4·(1/6)
+
5·(1/6)
+
6·(1/6)

Result:

E[X] = 3.5

The die never shows 3.5.

Expectation describes the average over many repetitions.

In algorithm analysis, expectation is often more important than worst-case behavior.

Example: Random Array Access

Suppose:

array = [2, 4, 6, 8]

A random index is selected.

Define:

X = selected value

Expected value:

E[X]
=
(2+4+6+8)/4
=
5

The average selected value equals 5.

Linearity of Expectation

This is one of the most powerful tools in algorithm analysis.

For any random variables:

E[X + Y]
=
E[X] + E[Y]

More generally:

E[X1 + X2 + ... + Xn]
=
E[X1]
+
E[X2]
+
...
+
E[Xn]

No independence assumption is required.

This fact makes many difficult counting problems easy.

Example

Flip 100 fair coins.

Define:

Ii =
    1 if coin i is heads

Expected value:

E[Ii] = 1/2

Total heads:

H = Σ Ii

Therefore:

E[H]
=
Σ E[Ii]
=
100 × 1/2
=
50

No complicated probability calculations are needed.

Variance

Expectation measures the center of a distribution.

Variance measures spread.

Definition:

Var(X)
=
E[(X-E[X])²]

Small variance:

Values stay near the average.

Large variance:

Values fluctuate significantly.

Example:

Consider two algorithms.

Algorithm A:

Always takes 100 ms.

Algorithm B:

50 ms half the time
150 ms half the time

Both have:

Expected runtime = 100 ms

Their variances differ significantly.

Variance captures this difference.

Standard Deviation

Standard deviation is:

σ = √Var(X)

The value has the same units as the original measurement.

For running time analysis, standard deviation often provides more intuitive information than variance.

Example: Randomized Quicksort

Randomized quicksort chooses a pivot uniformly at random.

Define:

X = running time

The running time varies across executions.

Some pivots produce balanced partitions.

Others produce poor partitions.

Analysis shows:

E[X] = O(n log n)

This expected value is the primary reason randomized quicksort is widely used.

The worst case remains:

O(n²)

but occurs with very low probability.

Probability Distribution

A probability distribution specifies how probabilities are assigned to values.

Common distributions in algorithm analysis include:

Distribution	Typical Use
Bernoulli	Success or failure
Binomial	Repeated independent events
Uniform	Random selection
Geometric	Repeated trials until success
Poisson	Rare event counts
Normal	Aggregated random effects

Each appears naturally in different algorithmic contexts.

Independence

Two random variables are independent when one does not affect the other.

Example:

Two separate coin flips

are independent.

In contrast:

Drawing cards without replacement

creates dependence.

Many algorithm analyses become simpler when independence exists.

However, modern randomized algorithms often require techniques that work even when variables are dependent.

Recipe: Analyzing a Randomized Algorithm

When facing a randomized algorithm:

Step 1

Identify random choices.

Example:

Random pivot
Random sample
Random hash function

Step 2

Define random variables.

Example:

X = running time

X = number of collisions

Step 3

Compute expectation.

E[X]

Step 4

Compute variance if stability matters.

Var(X)

Step 5

Bound failure probability.

P(X > threshold)

Step 6

Interpret the result in algorithmic terms.

Complexity Perspective

Randomized algorithms often replace deterministic guarantees with probabilistic guarantees.

Instead of:

Runtime ≤ O(n log n)

we may prove:

Expected runtime = O(n log n)

Instead of:

Always correct

we may prove:

Correct with probability 0.999999

This shift from certainty to probability allows algorithms that are often simpler, faster, and more scalable.

Discussion

Random variables are the language of randomized algorithm analysis. They transform uncertain behavior into measurable quantities. Expectation predicts average performance. Variance measures stability. Probability distributions describe possible outcomes. Indicator variables simplify counting arguments. Linearity of expectation turns many complex analyses into straightforward calculations.

Nearly every randomized algorithm introduced later in this chapter can be understood as a sequence of random variables whose expectations, variances, and probabilities determine the algorithm's practical behavior. Before studying randomized quicksort, Bloom filters, MinHash, HyperLogLog, streaming sketches, approximation algorithms, or online algorithms, a firm grasp of random variables provides the mathematical foundation required to analyze them rigorously.