1.16 Randomization

Randomized algorithms use random choices during execution. The output or running time may depend on those choices. Randomization is useful when it simplifies design, avoids adversarial patterns, or gives good expected performance with a small amount of code.

A randomized algorithm still needs a correctness contract. The contract must say whether the algorithm is always correct, probably correct, or correct in expectation.

Problem

You want to use randomness to make an algorithm simpler or faster, but you need to describe what guarantee the algorithm provides.

Solution

Classify the algorithm by its guarantee.

Las Vegas algorithm:
  Always returns a correct answer.
  Running time is random.

Monte Carlo algorithm:
  Running time is bounded or controlled.
  Answer may be wrong with small probability.

Then state the probability model clearly.

Random choices:
  How random values are generated.

Success probability:
  Probability that the algorithm returns a correct answer.

Expected time:
  Average running time over the random choices.

Example: Randomized Quicksort

Deterministic quicksort can degrade to quadratic time when pivot choices are consistently poor.

Randomized quicksort chooses a random pivot.

randomized_quicksort(A):
  if length(A) <= 1:
    return A

  pivot = random element of A

  L = elements less than pivot
  E = elements equal to pivot
  R = elements greater than pivot

  return randomized_quicksort(L) + E + randomized_quicksort(R)

The algorithm always returns a sorted array. The randomness affects the partition balance and therefore the running time.

Expected time:

O(n log n)

Worst-case time:

O(n^2)

This is a Las Vegas algorithm: correctness is certain, but running time is random.

Example: Random Sampling

Suppose you want to estimate the average value of a very large array.

estimate_mean(A, k):
  total = 0

  repeat k times:
    i = random index from 0 to n - 1
    total = total + A[i]

  return total / k

This does not always return the exact mean. It returns an estimate whose accuracy depends on the number of samples and the distribution of values.

This is a Monte Carlo algorithm. Its running time is fixed at O(k), but the answer has sampling error.

Randomization as Symmetry Breaking

Randomness is often useful when deterministic choices may be exploited.

Examples:

Problem	Randomized Choice
Quicksort	Random pivot
Hash table	Randomized hash seed
Load balancing	Random server choice
Sampling	Random subset of data
Graph algorithms	Random contraction or walk

The purpose is not to be mysterious. Randomness prevents the algorithm from always making the same bad choice on structured input.

Expected Running Time

Expected time averages over the algorithm’s random choices, not necessarily over random input.

For randomized quicksort, the input can be fixed. The expectation is over the random pivot sequence.

Write this explicitly:

For every fixed input of length n,
the expected running time over the algorithm's random choices is O(n log n).

This is stronger than saying the algorithm is fast only on average input.

Error Probability

For Monte Carlo algorithms, the error probability must be stated.

Example:

The algorithm returns a correct answer with probability at least 1 - δ.

Here δ is the failure probability.

Many randomized algorithms can reduce error by repetition.

repeat the algorithm r times
combine the results by majority vote or best candidate

If each independent run fails with probability at most p, repeated independent trials can make the total failure probability much smaller.

Random Seeds

In implementation, randomness usually comes from a pseudo-random generator. A seed determines the sequence.

For testing, fixed seeds are useful:

seed = 12345

This makes failures reproducible. For production, randomized seeds may reduce exposure to adversarial inputs.

Common Pitfalls

Do not say “randomized” without saying what is random.

Do not confuse expected time with worst-case time.

Do not use Monte Carlo algorithms where exact correctness is required unless the failure probability is acceptable and stated.

Do not rely on weak randomness for adversarial settings such as hashing exposed to untrusted input.

Do not make random tests non-reproducible. Record the seed when a test fails.

Takeaway

Randomization is a tool for controlling uncertainty. In Las Vegas algorithms, the answer is always correct and the running time is random. In Monte Carlo algorithms, the running time is controlled and the answer may have bounded error. State the random choices, the guarantee, and the probability model explicitly.