19.4 Randomized Quicksort

Randomized quicksort is one of the most successful applications of randomization in algorithm design. It combines the elegance and practical efficiency of quicksort with probabilistic guarantees that eliminate dependence on particular input arrangements. The resulting algorithm is simple, fast, cache-friendly, and widely used in production systems.

Classical quicksort already performs extremely well on average. Its weakness lies in pivot selection. Poor pivot choices produce highly unbalanced partitions, causing the recursion tree to degenerate into a chain. Randomized quicksort solves this problem by choosing pivots uniformly at random. While bad pivots remain possible, long sequences of bad pivots become increasingly unlikely.

The algorithm remains a Las Vegas algorithm. Every execution produces a correctly sorted array. Randomness affects the amount of work performed, not correctness.

Problem

Given an array:

A = [38, 12, 55, 7, 19, 42, 3, 27]

sort the elements into ascending order.

Desired output:

[3, 7, 12, 19, 27, 38, 42, 55]

The challenge is achieving excellent average performance while avoiding deterministic worst-case inputs.

Classical Quicksort Review

Quicksort follows three steps.

Step 1

Choose a pivot.

Step 2

Partition the array into:

Less than pivot
Pivot
Greater than pivot

Step 3

Recursively sort both partitions.

Example:

[8, 4, 2, 9, 5, 7]

Pivot:

Partition:

[4, 2]
[5]
[8, 9, 7]

Recursively sort both sides.

The Pivot Problem

Suppose the first element is always selected.

Input:

[1,2,3,4,5,6,7,8]

Partitions become:

[]
[1]
[2,3,4,5,6,7,8]

Then:

[]
[2]
[3,4,5,6,7,8]

Then:

[]
[3]
[4,5,6,7,8]

and so on.

The recursion tree degenerates into:

n levels

Total work:

O(n²)

This behavior occurs because every pivot is the smallest remaining element.

Randomized Pivot Selection

Randomized quicksort chooses:

pivot = random element

rather than:

pivot = first element

The algorithm becomes:

def randomized_quicksort(A):
    if len(A) <= 1:
        return A

    pivot = random.choice(A)

    left = []
    middle = []
    right = []

    for x in A:
        if x < pivot:
            left.append(x)
        elif x > pivot:
            right.append(x)
        else:
            middle.append(x)

    return (
        randomized_quicksort(left)
        + middle
        + randomized_quicksort(right)
    )

Correctness is unchanged.

Only pivot selection differs.

Example Execution

Input:

[8,4,2,9,5,7]

Suppose random pivot:

Partition:

[4,2,5]
[7]
[8,9]

Recursive calls:

Sort [4,2,5]
Sort [8,9]

Suppose next pivot:

Partition:

[2]
[4]
[5]

Sorting finishes quickly.

A different execution may choose different pivots and build a different recursion tree.

Every execution still produces:

[2,4,5,7,8,9]

Why Randomization Works

Randomization destroys the relationship between:

Input arrangement

and

Pivot selection

In deterministic quicksort:

Bad input
→
Bad pivots
→
Bad runtime

In randomized quicksort:

Bad input
→
Random pivots

The input can no longer force worst-case behavior.

Only unlucky random choices can produce poor partitions.

Expected Running Time

The most important result is:

Expected runtime = O(n log n)

This expectation holds for:

Sorted input
Reverse-sorted input
Random input
Duplicate-heavy input
Adversarial input

The bound depends only on random pivot choices.

This property makes randomized quicksort highly robust.

Intuition Behind the Analysis

Consider a pivot.

A good pivot divides the array roughly in half.

Example:

40% left
60% right

45% left
55% right

Balanced partitions produce recursion depth:

O(log n)

Random selection frequently produces reasonably balanced pivots.

Extremely bad pivots are rare.

The recursion tree therefore remains shallow on average.

Pairwise Comparison Analysis

A classic proof analyzes comparisons directly.

Consider two elements:

Ai
Aj

These elements are compared only if one becomes the first pivot selected among all elements lying between them in sorted order.

The probability of this event is:

2 / (j - i + 1)

Summing across all pairs yields:

O(n log n)

expected comparisons.

This elegant argument avoids analyzing the recursion tree directly.

Expected Number of Comparisons

For:

n elements

expected comparisons satisfy:

E[C(n)]
=
O(n log n)

This result explains the practical success of randomized quicksort.

The algorithm performs close to the theoretical optimum for comparison-based sorting.

Worst Case Still Exists

Randomization reduces probability.

It does not eliminate possibility.

Suppose the smallest element is chosen repeatedly:

1
2
3
4
...

Partitions become maximally unbalanced.

Runtime:

O(n²)

This remains possible.

The probability becomes extremely small.

For large inputs the chance of consistently poor pivots is negligible.

Recursion Depth

Balanced partitions:

Depth ≈ log₂ n

Worst-case partitions:

Depth ≈ n

Randomization causes recursion depth to remain logarithmic with high probability.

This reduces stack usage and improves performance.

Three-Way Partitioning

Arrays often contain duplicates.

Example:

[5,5,5,5,5,5,5]

Standard partitioning performs unnecessary work.

Three-way partitioning separates:

Less than pivot
Equal to pivot
Greater than pivot

Example:

[4,5,2,5,8,5,1]

becomes:

[4,2,1]
[5,5,5]
[8]

This dramatically improves performance when duplicates are common.

Modern implementations frequently use this technique.

In-Place Randomized Quicksort

Production systems typically avoid auxiliary arrays.

Instead they:

Choose a random pivot.
Swap it into position.
Partition in place.
Recurse on subarrays.

Memory usage becomes:

O(log n)

from recursion alone.

No additional storage proportional to:

is required.

Comparison with Merge Sort

Property	Randomized Quicksort	Merge Sort
Expected runtime	O(n log n)	O(n log n)
Worst-case runtime	O(n²)	O(n log n)
Extra memory	O(log n)	O(n)
Cache efficiency	Excellent	Good
Stability	Usually unstable	Stable
Randomization	Required	No

This explains why quicksort remains extremely popular despite its theoretical worst case.

Practical Optimizations

Real implementations often include:

Small Subarray Cutoff

Switch to insertion sort for:

n < 16

or similar thresholds.

Tail Recursion Elimination

Recurse only on the smaller partition.

Convert the larger partition into iteration.

This guarantees:

O(log n)

stack usage.

Median-of-Three Randomization

Select several random candidates.

Choose their median.

This increases partition quality.

Duplicate Handling

Use three-way partitioning.

Significant improvements occur when repeated values are common.

Correctness Proof

The proof follows induction.

Base Case

Arrays of size:

0 or 1

are already sorted.

Inductive Hypothesis

Assume smaller arrays are sorted correctly.

Inductive Step

Partitioning guarantees:

Left < Pivot < Right

Recursive calls sort:

Left

and

Right

By induction, both become sorted.

Concatenating:

Sorted Left
Pivot
Sorted Right

produces a sorted array.

Therefore every execution is correct.

Randomness never affects correctness.

Common Mistakes

Deterministic Random Seed

Using the same predictable seed repeatedly may expose worst-case behavior.

Poor Random Generator

Low-quality generators can bias pivot selection.

Ignoring Duplicates

Repeated values can significantly affect performance.

Use three-way partitioning.

Analyzing Worst Case Only

The primary guarantee is:

Expected O(n log n)

not:

Worst-case O(n log n)

The distinction matters.

Discussion

Randomized quicksort demonstrates the central philosophy of randomized algorithm design: replace deterministic guarantees that are vulnerable to adversarial inputs with probabilistic guarantees that hold for every input. A single random choice at each recursive step transforms a sorting algorithm with fragile performance characteristics into one of the most effective general-purpose sorting methods ever developed.

The analysis introduces many of the techniques used throughout randomized algorithms, including expectation, probability distributions, recursion trees, and pairwise event analysis. These ideas reappear in skip lists, treaps, randomized geometry algorithms, probabilistic data structures, and streaming systems. The next section examines random sampling, another fundamental technique that uses a small amount of randomness to extract useful information from massive data sets.