15.12 Selection Algorithms
You need to find the \(k\)-th smallest element in an unsorted array.
15.12 Selection Algorithms
Problem
You need to find the (k)-th smallest element in an unsorted array.
Examples:
array = [9, 1, 6, 3, 8, 2, 7]
k = 3
The 3rd smallest element is:
3
A simple solution is to sort the array and index into the sorted result.
sort.Ints(nums)
answer := nums[k-1]
This works, but sorting does more work than necessary.
Sorting gives the complete order:
1, 2, 3, 6, 7, 8, 9
Selection only asks for one position.
Can we use divide and conquer to find the answer without fully sorting the input?
Solution
Use a selection algorithm.
The most common divide-and-conquer selection algorithm is Quickselect. It uses the same partitioning idea as quick sort, but recurses into only one side.
After partitioning around a pivot, the pivot lands in its final sorted position. If that position is (k), we are done. If (k) lies to the left, search the left partition. If (k) lies to the right, search the right partition.
Unlike quick sort, both sides are not recursively processed.
Partitioning Review
Suppose the array is:
[9, 1, 6, 3, 8, 2, 7]
Choose pivot:
7
Partition:
[1, 6, 3, 2] 7 [9, 8]
The pivot is now at index 4 in zero-based indexing.
This means:
7 is the 5th smallest element
If we are looking for the 3rd smallest element, there is no reason to inspect the right side:
[9, 8]
The answer must be in:
[1, 6, 3, 2]
This is the central saving.
Quickselect
Quickselect is quick sort with one recursive call removed.
func Quickselect(nums []int, k int) int {
target := k - 1
low, high := 0, len(nums)-1
for low <= high {
pivot := partition(nums, low, high)
if pivot == target {
return nums[pivot]
}
if target < pivot {
high = pivot - 1
} else {
low = pivot + 1
}
}
panic("invalid k")
}
This version expects:
1 <= k <= len(nums)
The helper partition is the same style used in quick sort:
func partition(nums []int, low, high int) int {
pivot := nums[high]
i := low
for j := low; j < high; j++ {
if nums[j] < pivot {
nums[i], nums[j] = nums[j], nums[i]
i++
}
}
nums[i], nums[high] = nums[high], nums[i]
return i
}
The algorithm modifies the input array. If the caller needs the original order, copy the array first.
Average-Case Analysis
If partitioning is reasonably balanced, each step removes a large portion of the array.
The recurrence is approximately:
$$ T(n)=T(n/2)+O(n) $$
The (O(n)) term comes from partitioning.
Expanding:
n + n/2 + n/4 + n/8 + ...
This geometric series sums to:
$$ O(n) $$
Thus Quickselect has expected linear time when pivots are chosen well.
Worst Case
If the pivot is always the smallest or largest element, each partition removes only one item.
The recurrence becomes:
$$ T(n)=T(n-1)+O(n) $$
Expanding:
n + (n-1) + (n-2) + ... + 1
gives:
$$ O(n^2) $$
This is the same worst-case shape as quick sort.
Randomized Quickselect
Randomizing the pivot protects against adversarial input.
func randomizedPartition(nums []int, low, high int) int {
pivotIndex := low + rand.Intn(high-low+1)
nums[pivotIndex], nums[high] = nums[high], nums[pivotIndex]
return partition(nums, low, high)
}
Then replace:
pivot := partition(nums, low, high)
with:
pivot := randomizedPartition(nums, low, high)
Expected running time:
$$ O(n) $$
Randomization does not remove the theoretical worst case, but it makes repeated bad pivots unlikely.
Selecting the Largest Elements
To find the (k)-th largest element, convert the rank.
For an array of length (n):
k-th largest = (n-k+1)-th smallest
In zero-based target indexing:
target := len(nums) - k
Example:
array = [9, 1, 6, 3, 8, 2, 7]
k = 2 largest
Sorted:
1, 2, 3, 6, 7, 8, 9
The 2nd largest is:
8
Its zero-based sorted index is:
len(nums) - 2 = 5
Three-Way Quickselect
Duplicate-heavy arrays can cause unnecessary work.
Example:
[5, 5, 5, 5, 5, 5]
Standard partitioning may repeatedly remove only one pivot.
Three-way partitioning divides the array into:
less than pivot
equal to pivot
greater than pivot
If the target rank lands inside the equal region, the answer is found immediately.
Pseudo-code:
partition into:
nums[low:lt] < pivot
nums[lt:gt+1] == pivot
nums[gt+1:high+1] > pivot
if target < lt
search left
else if target <= gt
return pivot
else
search right
This is often the best version for real input.
Finding the Median
The median is a special case of selection.
For odd (n):
median = element at rank n/2
For even (n), there are two common definitions:
lower median = element at rank n/2 - 1
upper median = element at rank n/2
average median = average of both
Quickselect can compute either median in linear expected time.
For the average median, run selection twice or adapt the algorithm to find both central elements.
Top K Elements
Selection can also find the top (k) elements.
If we partition around the (k)-th smallest element, then every element before that position is among the smallest (k) values, though not necessarily sorted.
Example:
array = [9, 1, 6, 3, 8, 2, 7]
k = 3
After selection, the first three positions might be:
[1, 2, 3]
or:
[2, 1, 3]
They contain the right set, but not necessarily in sorted order.
If sorted top (k) output is required, sort only that prefix:
O(n + k log k)
rather than sorting all (n) elements.
Comparison with Heaps
A heap-based method finds the top (k) values in:
$$ O(n\log k) $$
This is often preferable when:
- (k) is small
- the input is streaming
- the entire array cannot be modified
- incremental results are needed
Quickselect is often preferable when:
- the full array is available
- mutation is allowed
- one rank or unordered top (k) is enough
- expected linear time is desired
| Method | Time | Space | Mutates Input | Best Use |
|---|---|---|---|---|
| Sort | (O(n\log n)) | depends | usually yes | full ordering |
| Heap | (O(n\log k)) | (O(k)) | no | streaming top k |
| Quickselect | expected (O(n)) | (O(1)) | yes | one rank |
| Median of medians | worst-case (O(n)) | (O(1)) or more | yes | guaranteed linear |
Deterministic Linear Selection
Quickselect has expected linear time but quadratic worst-case time.
The median-of-medians algorithm provides worst-case linear time.
Its idea is:
- Split the array into groups of five.
- Find each group median.
- Recursively find the median of those medians.
- Use that value as pivot.
- Partition and recurse into one side.
This pivot is guaranteed to discard a fixed fraction of elements.
Therefore the worst-case recurrence becomes linear.
The algorithm is theoretically important, but in practice it has a larger constant factor than randomized Quickselect.
Common Mistakes
Confusing Rank and Index
Users often say:
k = 1
to mean the smallest element.
Arrays use zero-based indexes.
Convert carefully:
target := k - 1
Sorting Accidentally
If the algorithm sorts the entire array, it is no longer selection.
Selection only needs enough ordering to identify one rank.
Recursing into Both Sides
Quickselect recurses into one side.
If both sides are processed, the algorithm becomes quick sort.
Mishandling Duplicates
Duplicate-heavy input can degrade simple partitioning.
Use three-way partitioning when duplicates are common.
Forgetting Input Mutation
Partitioning reorders the array.
Document this behavior or copy the input.
Discussion
Selection algorithms show how divide and conquer can stop early. Quick sort partitions both sides because sorting requires complete order. Quickselect partitions once, observes where the pivot landed, and discards half of the remaining problem in expectation.
This is a useful mental model: not every recursive decomposition must solve every subproblem. Sometimes a subproblem can be ruled out by rank, geometry, monotonicity, or a boundary condition. Removing one recursive branch changes the recurrence from sorting-like behavior to linear expected time.
Selection appears throughout systems work. Databases use related ideas for order statistics. Analytics systems compute percentiles and quantiles. Search and recommendation systems extract top-k candidates. Even when production implementations use heaps or approximate sketches, Quickselect remains the basic tool for understanding how much work exact selection really requires.