19.4 Randomized Quicksort
Randomized quicksort is one of the most successful applications of randomization in algorithm design.
19.4 Randomized Quicksort
Randomized quicksort is one of the most successful applications of randomization in algorithm design. It combines the elegance and practical efficiency of quicksort with probabilistic guarantees that eliminate dependence on particular input arrangements. The resulting algorithm is simple, fast, cache-friendly, and widely used in production systems.
Classical quicksort already performs extremely well on average. Its weakness lies in pivot selection. Poor pivot choices produce highly unbalanced partitions, causing the recursion tree to degenerate into a chain. Randomized quicksort solves this problem by choosing pivots uniformly at random. While bad pivots remain possible, long sequences of bad pivots become increasingly unlikely.
The algorithm remains a Las Vegas algorithm. Every execution produces a correctly sorted array. Randomness affects the amount of work performed, not correctness.
Problem
Given an array:
A = [38, 12, 55, 7, 19, 42, 3, 27]
sort the elements into ascending order.
Desired output:
[3, 7, 12, 19, 27, 38, 42, 55]
The challenge is achieving excellent average performance while avoiding deterministic worst-case inputs.
Classical Quicksort Review
Quicksort follows three steps.
Step 1
Choose a pivot.
Step 2
Partition the array into:
Less than pivot
Pivot
Greater than pivot
Step 3
Recursively sort both partitions.
Example:
[8, 4, 2, 9, 5, 7]
Pivot:
5
Partition:
[4, 2]
[5]
[8, 9, 7]
Recursively sort both sides.
The Pivot Problem
Suppose the first element is always selected.
Input:
[1,2,3,4,5,6,7,8]
Partitions become:
[]
[1]
[2,3,4,5,6,7,8]
Then:
[]
[2]
[3,4,5,6,7,8]
Then:
[]
[3]
[4,5,6,7,8]
and so on.
The recursion tree degenerates into:
n levels
Total work:
O(n²)
This behavior occurs because every pivot is the smallest remaining element.
Randomized Pivot Selection
Randomized quicksort chooses:
pivot = random element
rather than:
pivot = first element
The algorithm becomes:
def randomized_quicksort(A):
if len(A) <= 1:
return A
pivot = random.choice(A)
left = []
middle = []
right = []
for x in A:
if x < pivot:
left.append(x)
elif x > pivot:
right.append(x)
else:
middle.append(x)
return (
randomized_quicksort(left)
+ middle
+ randomized_quicksort(right)
)
Correctness is unchanged.
Only pivot selection differs.
Example Execution
Input:
[8,4,2,9,5,7]
Suppose random pivot:
7
Partition:
[4,2,5]
[7]
[8,9]
Recursive calls:
Sort [4,2,5]
Sort [8,9]
Suppose next pivot:
4
Partition:
[2]
[4]
[5]
Sorting finishes quickly.
A different execution may choose different pivots and build a different recursion tree.
Every execution still produces:
[2,4,5,7,8,9]
Why Randomization Works
Randomization destroys the relationship between:
Input arrangement
and
Pivot selection
In deterministic quicksort:
Bad input
→
Bad pivots
→
Bad runtime
In randomized quicksort:
Bad input
→
Random pivots
The input can no longer force worst-case behavior.
Only unlucky random choices can produce poor partitions.
Expected Running Time
The most important result is:
Expected runtime = O(n log n)
This expectation holds for:
- Sorted input
- Reverse-sorted input
- Random input
- Duplicate-heavy input
- Adversarial input
The bound depends only on random pivot choices.
This property makes randomized quicksort highly robust.
Intuition Behind the Analysis
Consider a pivot.
A good pivot divides the array roughly in half.
Example:
40% left
60% right
or
45% left
55% right
Balanced partitions produce recursion depth:
O(log n)
Random selection frequently produces reasonably balanced pivots.
Extremely bad pivots are rare.
The recursion tree therefore remains shallow on average.
Pairwise Comparison Analysis
A classic proof analyzes comparisons directly.
Consider two elements:
Ai
Aj
These elements are compared only if one becomes the first pivot selected among all elements lying between them in sorted order.
The probability of this event is:
2 / (j - i + 1)
Summing across all pairs yields:
O(n log n)
expected comparisons.
This elegant argument avoids analyzing the recursion tree directly.
Expected Number of Comparisons
For:
n elements
expected comparisons satisfy:
E[C(n)]
=
O(n log n)
This result explains the practical success of randomized quicksort.
The algorithm performs close to the theoretical optimum for comparison-based sorting.
Worst Case Still Exists
Randomization reduces probability.
It does not eliminate possibility.
Suppose the smallest element is chosen repeatedly:
1
2
3
4
...
Partitions become maximally unbalanced.
Runtime:
O(n²)
This remains possible.
The probability becomes extremely small.
For large inputs the chance of consistently poor pivots is negligible.
Recursion Depth
Balanced partitions:
Depth ≈ log₂ n
Worst-case partitions:
Depth ≈ n
Randomization causes recursion depth to remain logarithmic with high probability.
This reduces stack usage and improves performance.
Three-Way Partitioning
Arrays often contain duplicates.
Example:
[5,5,5,5,5,5,5]
Standard partitioning performs unnecessary work.
Three-way partitioning separates:
Less than pivot
Equal to pivot
Greater than pivot
Example:
[4,5,2,5,8,5,1]
becomes:
[4,2,1]
[5,5,5]
[8]
This dramatically improves performance when duplicates are common.
Modern implementations frequently use this technique.
In-Place Randomized Quicksort
Production systems typically avoid auxiliary arrays.
Instead they:
- Choose a random pivot.
- Swap it into position.
- Partition in place.
- Recurse on subarrays.
Memory usage becomes:
O(log n)
from recursion alone.
No additional storage proportional to:
n
is required.
Comparison with Merge Sort
| Property | Randomized Quicksort | Merge Sort |
|---|---|---|
| Expected runtime | O(n log n) | O(n log n) |
| Worst-case runtime | O(n²) | O(n log n) |
| Extra memory | O(log n) | O(n) |
| Cache efficiency | Excellent | Good |
| Stability | Usually unstable | Stable |
| Randomization | Required | No |
This explains why quicksort remains extremely popular despite its theoretical worst case.
Practical Optimizations
Real implementations often include:
Small Subarray Cutoff
Switch to insertion sort for:
n < 16
or similar thresholds.
Tail Recursion Elimination
Recurse only on the smaller partition.
Convert the larger partition into iteration.
This guarantees:
O(log n)
stack usage.
Median-of-Three Randomization
Select several random candidates.
Choose their median.
This increases partition quality.
Duplicate Handling
Use three-way partitioning.
Significant improvements occur when repeated values are common.
Correctness Proof
The proof follows induction.
Base Case
Arrays of size:
0 or 1
are already sorted.
Inductive Hypothesis
Assume smaller arrays are sorted correctly.
Inductive Step
Partitioning guarantees:
Left < Pivot < Right
Recursive calls sort:
Left
and
Right
By induction, both become sorted.
Concatenating:
Sorted Left
Pivot
Sorted Right
produces a sorted array.
Therefore every execution is correct.
Randomness never affects correctness.
Common Mistakes
Deterministic Random Seed
Using the same predictable seed repeatedly may expose worst-case behavior.
Poor Random Generator
Low-quality generators can bias pivot selection.
Ignoring Duplicates
Repeated values can significantly affect performance.
Use three-way partitioning.
Analyzing Worst Case Only
The primary guarantee is:
Expected O(n log n)
not:
Worst-case O(n log n)
The distinction matters.
Discussion
Randomized quicksort demonstrates the central philosophy of randomized algorithm design: replace deterministic guarantees that are vulnerable to adversarial inputs with probabilistic guarantees that hold for every input. A single random choice at each recursive step transforms a sorting algorithm with fragile performance characteristics into one of the most effective general-purpose sorting methods ever developed.
The analysis introduces many of the techniques used throughout randomized algorithms, including expectation, probability distributions, recursion trees, and pairwise event analysis. These ideas reappear in skip lists, treaps, randomized geometry algorithms, probabilistic data structures, and streaming systems. The next section examines random sampling, another fundamental technique that uses a small amount of randomness to extract useful information from massive data sets.