19.22 Probability Bounds

Randomized algorithms rarely stop at computing an expected value. An expected value tells us what happens on average, but it does not say how often the algorithm deviates from that average. For production systems, that missing information matters.

If a randomized algorithm has expected runtime:

O(n log n)

we may still ask:

How likely is a much slower run?

If a sketch estimates frequency with expected error:

ε

we may still ask:

How often does the error exceed the tolerance?

Probability bounds answer these questions. They convert randomness into operational guarantees.

Problem

Suppose a randomized algorithm has runtime variable:

and expected runtime:

E[T] = 100 ms

That does not mean every run takes close to 100 ms.

Possible outcomes:

Probability	Runtime
0.99	50 ms
0.01	5050 ms

Expected runtime is still about:

100 ms

but one percent of requests are very slow.

A useful analysis needs tail bounds:

P(T ≥ threshold)

or:

P(|X - E[X]| ≥ threshold)

Markov's Inequality

Markov's inequality is the simplest probability bound.

If X is a nonnegative random variable, then:

P(X ≥ a) ≤ E[X] / a

for any:

a > 0

Example:

E[T] = 100 ms

Then:

P(T ≥ 1000 ms) ≤ 100 / 1000 = 0.1

So the probability of taking at least one second is at most 10%.

This bound is often loose, but it requires very little information.

When to Use Markov

Use Markov when you know only:

X ≥ 0

and:

E[X]

It is useful for first-pass analysis and for algorithms where stronger assumptions are unavailable.

It is usually not the final word. If you know variance or independence structure, stronger bounds are available.

Chebyshev's Inequality

Chebyshev's inequality uses variance.

For a random variable X with finite mean and variance:

P(|X - E[X]| ≥ t) ≤ Var(X) / t²

Example:

E[X] = 100
Var(X) = 25

Question:

P(|X - 100| ≥ 20)

Bound:

25 / 20² = 25 / 400 = 0.0625

So the probability of being at least 20 away from the mean is at most:

6.25%

Chebyshev is stronger than Markov when variance is small.

When to Use Chebyshev

Use Chebyshev when you know:

E[X]

and:

Var(X)

but cannot assume independence or a specific distribution.

This appears in:

Variance analysis of estimators
Hashing analyses
Sampling estimates
Randomized load measurements

Chebyshev is general, but often conservative.

Chernoff Bounds

Chernoff bounds are much sharper for sums of independent random variables.

Suppose:

X = X₁ + X₂ + ... + Xₙ

where each Xᵢ is an independent 0/1 random variable.

Let:

μ = E[X]

A common upper-tail Chernoff bound is:

P(X ≥ (1 + δ)μ) ≤ exp(-δ²μ / 3)

for:

0 < δ ≤ 1

A common lower-tail bound is:

P(X ≤ (1 - δ)μ) ≤ exp(-δ²μ / 2)

for:

0 < δ < 1

These bounds shrink exponentially with the expectation.

Example: Sampling Error

Suppose we sample users and define:

Xᵢ = 1 if sampled user clicked
Xᵢ = 0 otherwise

Let:

X = total clicks in sample

If:

E[X] = 1000

What is the probability of observing at least 20% more than expected?

Here:

δ = 0.2

Chernoff upper-tail bound:

P(X ≥ 1200) ≤ exp(-(0.2)² · 1000 / 3)

That is:

exp(-13.333...)
≈ 0.00000162

This is far stronger than Markov or Chebyshev for independent Bernoulli trials.

Hoeffding's Inequality

Hoeffding's inequality applies to bounded independent random variables.

If:

X₁, X₂, ..., Xₙ

are independent and each lies in:

[aᵢ, bᵢ]

then it bounds deviation of the sample mean from the true mean.

A common version for variables in [0,1] is:

P(|sample_mean - true_mean| ≥ ε)
≤
2 exp(-2nε²)

This is useful for estimating averages from samples.

Example: Estimating a Click Rate

Suppose we estimate a click rate from:

n = 10,000

samples.

We want error at most:

ε = 0.02

Hoeffding gives:

P(error ≥ 0.02)
≤
2 exp(-2 · 10000 · 0.02²)

Compute:

2 exp(-8)
≈ 0.000671

So the estimate is within 2 percentage points with probability at least approximately:

99.93%

under the independence and boundedness assumptions.

Union Bound

The union bound is simple and frequently useful.

For events:

A₁, A₂, ..., Aₙ

we have:

P(A₁ ∪ A₂ ∪ ... ∪ Aₙ)
≤
P(A₁) + P(A₂) + ... + P(Aₙ)

No independence is required.

Example:

If 100 failure events each have probability at most:

10⁻⁶

then probability that at least one occurs is at most:

100 · 10⁻⁶ = 10⁻⁴

The bound may be loose, but it is robust.

Why the Union Bound Matters

Many algorithms have many possible failure events.

Examples:

One hash bucket becomes too large.
One sampled estimate deviates too far.
One constraint remains unsatisfied.
One server becomes overloaded.
One sketch query exceeds its error tolerance.

The union bound lets you control the probability of any bad event occurring.

Example: Randomized Rounding

Suppose randomized rounding has:

1000 constraints

Each constraint fails with probability at most:

10⁻⁶

Then:

P(any constraint fails) ≤ 1000 · 10⁻⁶ = 0.001

So all constraints hold with probability at least:

0.999

This kind of reasoning is standard in randomized approximation algorithms.

Median Trick

Some estimators have a small probability of large error.

A common amplification technique:

Run the estimator several independent times.
Return the median result.

This reduces the probability that outliers dominate.

Suppose each estimator is correct with probability at least:

2/3

Run it:

times independently.

The median fails only if more than half the runs fail.

Chernoff bounds can show that failure probability decreases exponentially in k.

This is widely used in sketching and randomized numerical algorithms.

Hashing Example

Suppose a randomized hash table has expected chain length:

α

Markov gives a basic bound:

P(chain length ≥ 10α) ≤ 1/10

This may be enough for a rough argument.

If the hashing scheme provides stronger independence, Chernoff-style arguments can prove much sharper bounds on bucket sizes.

The strength of the probability bound depends on the strength of the independence assumption.

Independence Assumptions

Many mistakes in randomized analysis come from assuming independence where it does not exist.

Independent events:

Two separate fair coin flips

Dependent events:

Drawing two cards without replacement

Pairwise independent:

Any two variables are independent,
but larger groups may not be.

Some hash families provide only pairwise independence. That may be enough for expectation and variance, but not enough for Chernoff bounds.

Always check the assumption required by the bound.

Comparison of Bounds

Bound	Requires	Best For
Markov	Nonnegative variable and expectation	Very general rough bound
Chebyshev	Mean and variance	Deviations with known variance
Chernoff	Independent 0/1 variables	Sharp bounds for sums
Hoeffding	Independent bounded variables	Sample means
Union Bound	Individual event probabilities	Many failure events

The art is choosing the weakest bound that gives a useful guarantee.

Practical Error Budgets

In engineering, probability bounds become error budgets.

Example:

Sketch query error above tolerance:
≤ 10⁻⁶

Daily number of sketch queries:
10⁵

Union bound:

Daily probability of at least one bad query
≤ 10⁵ · 10⁻⁶ = 0.1

That may be too high.

To reduce it, lower per-query failure probability, reduce query volume, or change the guarantee being measured.

Common Mistakes

Do not use expected value as if it were a high-probability guarantee.

Do not apply Chernoff bounds without independence or the required boundedness assumptions.

Do not ignore the number of failure events when many queries or constraints are involved.

Do not mistake a loose upper bound for the true probability.

Do not mix average-case input assumptions with algorithmic randomness unless the model is explicit.

Takeaway

Probability bounds turn randomized algorithms from intuitive techniques into rigorous tools. Expectation tells you the average. Variance tells you spread. Tail bounds tell you how often bad deviations occur. Markov, Chebyshev, Chernoff, Hoeffding, and the union bound form a practical toolkit for proving that randomized algorithms are not merely fast or accurate on average, but reliable with quantifiable probability.