Find the middle value of a collection in linear time using selection algorithms, without the overhead of a full sort.
Median selection finds the middle value of a collection without fully sorting it. A full sort gives the median, but it also computes much more order information than the median requires. Selection algorithms avoid that extra work.
Problem
You have an array A[0..n-1]. You want the median value, or one of the median values, using less work than sorting the whole array.
Median Definitions
For an odd number of elements, the median is the middle element in sorted order.
n = 2m + 1
median index = mFor an even number of elements, there are three common conventions.
lower median index = n / 2 - 1
upper median index = n / 2
average median = average of lower and upper mediansFor example:
A = [7, 1, 5, 9, 2, 8]
sorted(A) = [1, 2, 5, 7, 8, 9]The lower median is 5, the upper median is 7, and the average median is 6.
The correct convention depends on the application. In integer algorithms, the lower or upper median often preserves the input type. In statistics, the average median is common for numeric data.
Solution
Use quickselect to find the needed rank.
median(A):
n = length(A)
if n is odd:
k = floor(n / 2)
return quickselect(A, 0, n, k)
else:
k1 = n / 2 - 1
k2 = n / 2
x = quickselect(A, 0, n, k1)
y = quickselect(A, 0, n, k2)
return average(x, y)This simple version may run quickselect twice for even n. A more careful implementation can select one median, then find the neighboring median in the appropriate partition. The simple form is easier to verify.
Example
Let:
A = [7, 1, 5, 9, 2, 8, 3]Here n = 7, so the median index is:
floor(7 / 2) = 3The sorted array would be:
[1, 2, 3, 5, 7, 8, 9]The median is:
5Quickselect finds the element of rank 3 without fully sorting the array.
Selection Invariant
Median selection is a rank selection problem. The invariant is the same as quickselect:
The desired rank lies inside the active range.At each partition step, the pivot lands at its final sorted rank p.
If p is the median rank, the algorithm returns the pivot. If p is smaller than the median rank, the median must lie to the right. If p is larger, the median must lie to the left.
This logic discards the part of the array that cannot contain the median.
Correctness
For odd n, the median is exactly the element at rank floor(n / 2). Since quickselect returns the element at the requested rank, it returns the median.
For even n, the lower and upper median ranks are adjacent. Selecting both gives the two central values of the sorted array. Averaging them gives the average median when the data type supports averaging.
The permutation condition matters only inside the selection procedure. Quickselect rearranges elements but preserves the multiset of input values, so rank meanings remain valid.
Complexity
With randomized pivot selection, quickselect has expected linear time:
O(n)Using it once gives expected O(n) median selection.
Using it twice for the average median still gives expected linear time:
O(n) + O(n) = O(n)The worst case for naive pivot selection is:
O(n^2)This occurs when partitioning repeatedly chooses the smallest or largest remaining element as pivot.
Deterministic Linear-Time Median
There is a deterministic worst-case linear-time method known as median of medians. It chooses a pivot that is guaranteed to discard a constant fraction of the array.
The method is:
select(A, k):
divide A into groups of 5
sort each group
collect the median of each group
pivot = select(medians, middle rank)
partition A around pivot
recurse into the side containing rank kThe important property is that the pivot is good enough. It may not split the array exactly in half, but it prevents extremely unbalanced partitions.
The result is worst-case:
O(n)In practice, median of medians has larger constants than randomized quickselect. It is most useful when worst-case guarantees matter more than ordinary speed.
Median of Medians Argument
Groups of five are used because they give a simple constant-fraction discard bound.
After sorting each group of five, the group median has two elements less than or equal to it and two elements greater than or equal to it. When the pivot is chosen as the median of these group medians, at least half of the group medians are less than or equal to the pivot, and at least half are greater than or equal to it.
Therefore, a constant fraction of all elements can be proved to be on each side of the pivot. This gives a recurrence of the form:
T(n) <= T(n/5) + T(7n/10) + O(n)which solves to:
O(n)The exact constants are less important than the structure: one recursive call chooses the pivot, one recursive call continues selection, and the partition step is linear.
Stability
Median selection is not stable because stability is a property of a full ordering, while selection only returns a value or partitions an array.
If records have equal keys, median selection should specify whether it returns:
any record with the median key
the first such record by input order
the last such record by input order
all records tied at the median keyThe usual quickselect implementation gives an arbitrary representative among equal keys unless additional tie-breaking is added.
Implementation Notes
Do not sort the whole array unless the sorted order is also needed. Sorting costs:
O(n log n)while median selection can run in expected:
O(n)For small arrays, full sorting may still be simpler and fast enough. Many practical implementations use sorting below a small threshold and quickselect above it.
For floating-point data, define how to handle NaN, infinities, and signed zero. Median selection depends on a consistent ordering relation.
For integer average medians, avoid overflow when computing:
(x + y) / 2Use a safer form when needed:
x + (y - x) / 2or promote to a wider type.
Common Bugs
A common bug is using n / 2 for all even-size median definitions without stating whether this means lower median, upper median, or average median.
Another bug is computing the average median with integer division when a fractional result is expected.
A third bug is assuming quickselect leaves the array sorted. It only places the requested rank correctly and partitions around it.
A fourth bug is ignoring duplicate median values. When many elements equal the median, there may be several valid records with the same key.
Takeaway
Median selection is rank selection specialized to the middle rank. Quickselect gives a simple expected-linear solution. Median of medians gives a deterministic worst-case-linear solution when stronger guarantees are required.