19.22 Probability Bounds
Randomized algorithms rarely stop at computing an expected value.
19.22 Probability Bounds
Randomized algorithms rarely stop at computing an expected value. An expected value tells us what happens on average, but it does not say how often the algorithm deviates from that average. For production systems, that missing information matters.
If a randomized algorithm has expected runtime:
O(n log n)
we may still ask:
How likely is a much slower run?
If a sketch estimates frequency with expected error:
ε
we may still ask:
How often does the error exceed the tolerance?
Probability bounds answer these questions. They convert randomness into operational guarantees.
Problem
Suppose a randomized algorithm has runtime variable:
T
and expected runtime:
E[T] = 100 ms
That does not mean every run takes close to 100 ms.
Possible outcomes:
| Probability | Runtime |
|---|---|
| 0.99 | 50 ms |
| 0.01 | 5050 ms |
Expected runtime is still about:
100 ms
but one percent of requests are very slow.
A useful analysis needs tail bounds:
P(T ≥ threshold)
or:
P(|X - E[X]| ≥ threshold)
Markov's Inequality
Markov's inequality is the simplest probability bound.
If X is a nonnegative random variable, then:
P(X ≥ a) ≤ E[X] / a
for any:
a > 0
Example:
E[T] = 100 ms
Then:
P(T ≥ 1000 ms) ≤ 100 / 1000 = 0.1
So the probability of taking at least one second is at most 10%.
This bound is often loose, but it requires very little information.
When to Use Markov
Use Markov when you know only:
X ≥ 0
and:
E[X]
It is useful for first-pass analysis and for algorithms where stronger assumptions are unavailable.
It is usually not the final word. If you know variance or independence structure, stronger bounds are available.
Chebyshev's Inequality
Chebyshev's inequality uses variance.
For a random variable X with finite mean and variance:
P(|X - E[X]| ≥ t) ≤ Var(X) / t²
Example:
E[X] = 100
Var(X) = 25
Question:
P(|X - 100| ≥ 20)
Bound:
25 / 20² = 25 / 400 = 0.0625
So the probability of being at least 20 away from the mean is at most:
6.25%
Chebyshev is stronger than Markov when variance is small.
When to Use Chebyshev
Use Chebyshev when you know:
E[X]
and:
Var(X)
but cannot assume independence or a specific distribution.
This appears in:
- Variance analysis of estimators
- Hashing analyses
- Sampling estimates
- Randomized load measurements
Chebyshev is general, but often conservative.
Chernoff Bounds
Chernoff bounds are much sharper for sums of independent random variables.
Suppose:
X = X₁ + X₂ + ... + Xₙ
where each Xᵢ is an independent 0/1 random variable.
Let:
μ = E[X]
A common upper-tail Chernoff bound is:
P(X ≥ (1 + δ)μ) ≤ exp(-δ²μ / 3)
for:
0 < δ ≤ 1
A common lower-tail bound is:
P(X ≤ (1 - δ)μ) ≤ exp(-δ²μ / 2)
for:
0 < δ < 1
These bounds shrink exponentially with the expectation.
Example: Sampling Error
Suppose we sample users and define:
Xᵢ = 1 if sampled user clicked
Xᵢ = 0 otherwise
Let:
X = total clicks in sample
If:
E[X] = 1000
What is the probability of observing at least 20% more than expected?
Here:
δ = 0.2
Chernoff upper-tail bound:
P(X ≥ 1200) ≤ exp(-(0.2)² · 1000 / 3)
That is:
exp(-13.333...)
≈ 0.00000162
This is far stronger than Markov or Chebyshev for independent Bernoulli trials.
Hoeffding's Inequality
Hoeffding's inequality applies to bounded independent random variables.
If:
X₁, X₂, ..., Xₙ
are independent and each lies in:
[aᵢ, bᵢ]
then it bounds deviation of the sample mean from the true mean.
A common version for variables in [0,1] is:
P(|sample_mean - true_mean| ≥ ε)
≤
2 exp(-2nε²)
This is useful for estimating averages from samples.
Example: Estimating a Click Rate
Suppose we estimate a click rate from:
n = 10,000
samples.
We want error at most:
ε = 0.02
Hoeffding gives:
P(error ≥ 0.02)
≤
2 exp(-2 · 10000 · 0.02²)
Compute:
2 exp(-8)
≈ 0.000671
So the estimate is within 2 percentage points with probability at least approximately:
99.93%
under the independence and boundedness assumptions.
Union Bound
The union bound is simple and frequently useful.
For events:
A₁, A₂, ..., Aₙ
we have:
P(A₁ ∪ A₂ ∪ ... ∪ Aₙ)
≤
P(A₁) + P(A₂) + ... + P(Aₙ)
No independence is required.
Example:
If 100 failure events each have probability at most:
10⁻⁶
then probability that at least one occurs is at most:
100 · 10⁻⁶ = 10⁻⁴
The bound may be loose, but it is robust.
Why the Union Bound Matters
Many algorithms have many possible failure events.
Examples:
- One hash bucket becomes too large.
- One sampled estimate deviates too far.
- One constraint remains unsatisfied.
- One server becomes overloaded.
- One sketch query exceeds its error tolerance.
The union bound lets you control the probability of any bad event occurring.
Example: Randomized Rounding
Suppose randomized rounding has:
1000 constraints
Each constraint fails with probability at most:
10⁻⁶
Then:
P(any constraint fails) ≤ 1000 · 10⁻⁶ = 0.001
So all constraints hold with probability at least:
0.999
This kind of reasoning is standard in randomized approximation algorithms.
Median Trick
Some estimators have a small probability of large error.
A common amplification technique:
- Run the estimator several independent times.
- Return the median result.
This reduces the probability that outliers dominate.
Suppose each estimator is correct with probability at least:
2/3
Run it:
k
times independently.
The median fails only if more than half the runs fail.
Chernoff bounds can show that failure probability decreases exponentially in k.
This is widely used in sketching and randomized numerical algorithms.
Hashing Example
Suppose a randomized hash table has expected chain length:
α
Markov gives a basic bound:
P(chain length ≥ 10α) ≤ 1/10
This may be enough for a rough argument.
If the hashing scheme provides stronger independence, Chernoff-style arguments can prove much sharper bounds on bucket sizes.
The strength of the probability bound depends on the strength of the independence assumption.
Independence Assumptions
Many mistakes in randomized analysis come from assuming independence where it does not exist.
Independent events:
Two separate fair coin flips
Dependent events:
Drawing two cards without replacement
Pairwise independent:
Any two variables are independent,
but larger groups may not be.
Some hash families provide only pairwise independence. That may be enough for expectation and variance, but not enough for Chernoff bounds.
Always check the assumption required by the bound.
Comparison of Bounds
| Bound | Requires | Best For |
|---|---|---|
| Markov | Nonnegative variable and expectation | Very general rough bound |
| Chebyshev | Mean and variance | Deviations with known variance |
| Chernoff | Independent 0/1 variables | Sharp bounds for sums |
| Hoeffding | Independent bounded variables | Sample means |
| Union Bound | Individual event probabilities | Many failure events |
The art is choosing the weakest bound that gives a useful guarantee.
Practical Error Budgets
In engineering, probability bounds become error budgets.
Example:
Sketch query error above tolerance:
≤ 10⁻⁶
Daily number of sketch queries:
10⁵
Union bound:
Daily probability of at least one bad query
≤ 10⁵ · 10⁻⁶ = 0.1
That may be too high.
To reduce it, lower per-query failure probability, reduce query volume, or change the guarantee being measured.
Common Mistakes
Do not use expected value as if it were a high-probability guarantee.
Do not apply Chernoff bounds without independence or the required boundedness assumptions.
Do not ignore the number of failure events when many queries or constraints are involved.
Do not mistake a loose upper bound for the true probability.
Do not mix average-case input assumptions with algorithmic randomness unless the model is explicit.
Takeaway
Probability bounds turn randomized algorithms from intuitive techniques into rigorous tools. Expectation tells you the average. Variance tells you spread. Tail bounds tell you how often bad deviations occur. Markov, Chebyshev, Chernoff, Hoeffding, and the union bound form a practical toolkit for proving that randomized algorithms are not merely fast or accurate on average, but reliable with quantifiable probability.