Heap Select finds an order statistic by using a heap to keep only the part of the input that can still contain the answer.
For the k-th smallest element, it is common to keep a max heap of size . The heap stores the smallest values seen so far. Whenever a new value is smaller than the heap maximum, the maximum is removed and the new value is inserted. After the scan finishes, the heap maximum is the k-th smallest value.
Problem
Given an array of length and an integer with , return the k-th smallest element.
Algorithm
Maintain a max heap of size at most .
heap_select(A, k):
H = empty max heap
for x in A:
push x into H
if size(H) > k + 1:
pop maximum from H
return maximum(H)This keeps the smallest elements seen so far. The largest among them has rank .
Example
Let
and
The three smallest values are:
The largest of these is , so the answer is:
Correctness
After processing each prefix of the array, the heap contains the smallest elements from that prefix, or all elements if fewer than have been seen. This invariant holds because every time the heap grows too large, the largest candidate is removed.
At the end, the heap contains the smallest elements of the whole array. The maximum of these elements is exactly the k-th smallest element.
Complexity
| operation | cost |
|---|---|
| heap size | |
| each insertion | |
| total time | |
| space |
For selecting the k-th largest, use a min heap of size instead.
When to Use
Use Heap Select when:
- is much smaller than
- the input is streamed
- you need top-k values, not just one element
- predictable memory usage matters
Quickselect is usually faster for in-memory selection when only one rank is needed.
Implementation
Python has a min heap, so this implementation stores negative values to simulate a max heap.
import heapq
def heap_select(a, k):
heap = []
for x in a:
heapq.heappush(heap, -x)
if len(heap) > k + 1:
heapq.heappop(heap)
return -heap[0]package main
import "container/heap"
type MaxHeap []int
func (h MaxHeap) Len() int {
return len(h)
}
func (h MaxHeap) Less(i, j int) bool {
return h[i] > h[j]
}
func (h MaxHeap) Swap(i, j int) {
h[i], h[j] = h[j], h[i]
}
func (h *MaxHeap) Push(x any) {
*h = append(*h, x.(int))
}
func (h *MaxHeap) Pop() any {
old := *h
n := len(old)
x := old[n-1]
*h = old[:n-1]
return x
}
func HeapSelect(a []int, k int) int {
h := &MaxHeap{}
heap.Init(h)
for _, x := range a {
heap.Push(h, x)
if h.Len() > k+1 {
heap.Pop(h)
}
}
return (*h)[0]
}