15.12 Selection Algorithms

You need to find the \(k\)-th smallest element in an unsorted array.

15.12 Selection Algorithms

Problem

You need to find the (k)-th smallest element in an unsorted array.

Examples:

array = [9, 1, 6, 3, 8, 2, 7]
k = 3

The 3rd smallest element is:

3

A simple solution is to sort the array and index into the sorted result.

sort.Ints(nums)
answer := nums[k-1]

This works, but sorting does more work than necessary.

Sorting gives the complete order:

1, 2, 3, 6, 7, 8, 9

Selection only asks for one position.

Can we use divide and conquer to find the answer without fully sorting the input?

Solution

Use a selection algorithm.

The most common divide-and-conquer selection algorithm is Quickselect. It uses the same partitioning idea as quick sort, but recurses into only one side.

After partitioning around a pivot, the pivot lands in its final sorted position. If that position is (k), we are done. If (k) lies to the left, search the left partition. If (k) lies to the right, search the right partition.

Unlike quick sort, both sides are not recursively processed.

Partitioning Review

Suppose the array is:

[9, 1, 6, 3, 8, 2, 7]

Choose pivot:

7

Partition:

[1, 6, 3, 2] 7 [9, 8]

The pivot is now at index 4 in zero-based indexing.

This means:

7 is the 5th smallest element

If we are looking for the 3rd smallest element, there is no reason to inspect the right side:

[9, 8]

The answer must be in:

[1, 6, 3, 2]

This is the central saving.

Quickselect

Quickselect is quick sort with one recursive call removed.

func Quickselect(nums []int, k int) int {
    target := k - 1
    low, high := 0, len(nums)-1

    for low <= high {
        pivot := partition(nums, low, high)

        if pivot == target {
            return nums[pivot]
        }

        if target < pivot {
            high = pivot - 1
        } else {
            low = pivot + 1
        }
    }

    panic("invalid k")
}

This version expects:

1 <= k <= len(nums)

The helper partition is the same style used in quick sort:

func partition(nums []int, low, high int) int {
    pivot := nums[high]
    i := low

    for j := low; j < high; j++ {
        if nums[j] < pivot {
            nums[i], nums[j] = nums[j], nums[i]
            i++
        }
    }

    nums[i], nums[high] = nums[high], nums[i]
    return i
}

The algorithm modifies the input array. If the caller needs the original order, copy the array first.

Average-Case Analysis

If partitioning is reasonably balanced, each step removes a large portion of the array.

The recurrence is approximately:

$$ T(n)=T(n/2)+O(n) $$

The (O(n)) term comes from partitioning.

Expanding:

n + n/2 + n/4 + n/8 + ...

This geometric series sums to:

$$ O(n) $$

Thus Quickselect has expected linear time when pivots are chosen well.

Worst Case

If the pivot is always the smallest or largest element, each partition removes only one item.

The recurrence becomes:

$$ T(n)=T(n-1)+O(n) $$

Expanding:

n + (n-1) + (n-2) + ... + 1

gives:

$$ O(n^2) $$

This is the same worst-case shape as quick sort.

Randomized Quickselect

Randomizing the pivot protects against adversarial input.

func randomizedPartition(nums []int, low, high int) int {
    pivotIndex := low + rand.Intn(high-low+1)
    nums[pivotIndex], nums[high] = nums[high], nums[pivotIndex]
    return partition(nums, low, high)
}

Then replace:

pivot := partition(nums, low, high)

with:

pivot := randomizedPartition(nums, low, high)

Expected running time:

$$ O(n) $$

Randomization does not remove the theoretical worst case, but it makes repeated bad pivots unlikely.

Selecting the Largest Elements

To find the (k)-th largest element, convert the rank.

For an array of length (n):

k-th largest = (n-k+1)-th smallest

In zero-based target indexing:

target := len(nums) - k

Example:

array = [9, 1, 6, 3, 8, 2, 7]
k = 2 largest

Sorted:

1, 2, 3, 6, 7, 8, 9

The 2nd largest is:

8

Its zero-based sorted index is:

len(nums) - 2 = 5

Three-Way Quickselect

Duplicate-heavy arrays can cause unnecessary work.

Example:

[5, 5, 5, 5, 5, 5]

Standard partitioning may repeatedly remove only one pivot.

Three-way partitioning divides the array into:

less than pivot
equal to pivot
greater than pivot

If the target rank lands inside the equal region, the answer is found immediately.

Pseudo-code:

partition into:
    nums[low:lt]      < pivot
    nums[lt:gt+1]     == pivot
    nums[gt+1:high+1] > pivot

if target < lt
    search left
else if target <= gt
    return pivot
else
    search right

This is often the best version for real input.

Finding the Median

The median is a special case of selection.

For odd (n):

median = element at rank n/2

For even (n), there are two common definitions:

lower median = element at rank n/2 - 1
upper median = element at rank n/2
average median = average of both

Quickselect can compute either median in linear expected time.

For the average median, run selection twice or adapt the algorithm to find both central elements.

Top K Elements

Selection can also find the top (k) elements.

If we partition around the (k)-th smallest element, then every element before that position is among the smallest (k) values, though not necessarily sorted.

Example:

array = [9, 1, 6, 3, 8, 2, 7]
k = 3

After selection, the first three positions might be:

[1, 2, 3]

or:

[2, 1, 3]

They contain the right set, but not necessarily in sorted order.

If sorted top (k) output is required, sort only that prefix:

O(n + k log k)

rather than sorting all (n) elements.

Comparison with Heaps

A heap-based method finds the top (k) values in:

$$ O(n\log k) $$

This is often preferable when:

  • (k) is small
  • the input is streaming
  • the entire array cannot be modified
  • incremental results are needed

Quickselect is often preferable when:

  • the full array is available
  • mutation is allowed
  • one rank or unordered top (k) is enough
  • expected linear time is desired
Method Time Space Mutates Input Best Use
Sort (O(n\log n)) depends usually yes full ordering
Heap (O(n\log k)) (O(k)) no streaming top k
Quickselect expected (O(n)) (O(1)) yes one rank
Median of medians worst-case (O(n)) (O(1)) or more yes guaranteed linear

Deterministic Linear Selection

Quickselect has expected linear time but quadratic worst-case time.

The median-of-medians algorithm provides worst-case linear time.

Its idea is:

  1. Split the array into groups of five.
  2. Find each group median.
  3. Recursively find the median of those medians.
  4. Use that value as pivot.
  5. Partition and recurse into one side.

This pivot is guaranteed to discard a fixed fraction of elements.

Therefore the worst-case recurrence becomes linear.

The algorithm is theoretically important, but in practice it has a larger constant factor than randomized Quickselect.

Common Mistakes

Confusing Rank and Index

Users often say:

k = 1

to mean the smallest element.

Arrays use zero-based indexes.

Convert carefully:

target := k - 1

Sorting Accidentally

If the algorithm sorts the entire array, it is no longer selection.

Selection only needs enough ordering to identify one rank.

Recursing into Both Sides

Quickselect recurses into one side.

If both sides are processed, the algorithm becomes quick sort.

Mishandling Duplicates

Duplicate-heavy input can degrade simple partitioning.

Use three-way partitioning when duplicates are common.

Forgetting Input Mutation

Partitioning reorders the array.

Document this behavior or copy the input.

Discussion

Selection algorithms show how divide and conquer can stop early. Quick sort partitions both sides because sorting requires complete order. Quickselect partitions once, observes where the pivot landed, and discards half of the remaining problem in expectation.

This is a useful mental model: not every recursive decomposition must solve every subproblem. Sometimes a subproblem can be ruled out by rank, geometry, monotonicity, or a boundary condition. Removing one recursive branch changes the recurrence from sorting-like behavior to linear expected time.

Selection appears throughout systems work. Databases use related ideas for order statistics. Analytics systems compute percentiles and quantiles. Search and recommendation systems extract top-k candidates. Even when production implementations use heaps or approximate sketches, Quickselect remains the basic tool for understanding how much work exact selection really requires.