Skip to content

Heap Select

Select the k-th smallest or largest element by maintaining a heap of candidate values.

Heap Select finds an order statistic by using a heap to keep only the part of the input that can still contain the answer.

For the k-th smallest element, it is common to keep a max heap of size k+1k + 1. The heap stores the smallest values seen so far. Whenever a new value is smaller than the heap maximum, the maximum is removed and the new value is inserted. After the scan finishes, the heap maximum is the k-th smallest value.

Problem

Given an array AA of length nn and an integer kk with 0k<n0 \le k < n, return the k-th smallest element.

Algorithm

Maintain a max heap of size at most k+1k + 1.

heap_select(A, k):
    H = empty max heap

    for x in A:
        push x into H

        if size(H) > k + 1:
            pop maximum from H

    return maximum(H)

This keeps the smallest k+1k + 1 elements seen so far. The largest among them has rank kk.

Example

Let

A=[7,2,9,4,3] A = [7, 2, 9, 4, 3]

and

k=2 k = 2

The three smallest values are:

[2,3,4] [2, 3, 4]

The largest of these is 44, so the answer is:

4 4

Correctness

After processing each prefix of the array, the heap contains the smallest k+1k + 1 elements from that prefix, or all elements if fewer than k+1k + 1 have been seen. This invariant holds because every time the heap grows too large, the largest candidate is removed.

At the end, the heap contains the smallest k+1k + 1 elements of the whole array. The maximum of these elements is exactly the k-th smallest element.

Complexity

operationcost
heap sizek+1k + 1
each insertionO(logk)O(\log k)
total timeO(nlogk)O(n \log k)
spaceO(k)O(k)

For selecting the k-th largest, use a min heap of size k+1k + 1 instead.

When to Use

Use Heap Select when:

  • kk is much smaller than nn
  • the input is streamed
  • you need top-k values, not just one element
  • predictable memory usage matters

Quickselect is usually faster for in-memory selection when only one rank is needed.

Implementation

Python has a min heap, so this implementation stores negative values to simulate a max heap.

import heapq

def heap_select(a, k):
    heap = []

    for x in a:
        heapq.heappush(heap, -x)

        if len(heap) > k + 1:
            heapq.heappop(heap)

    return -heap[0]
package main

import "container/heap"

type MaxHeap []int

func (h MaxHeap) Len() int {
	return len(h)
}

func (h MaxHeap) Less(i, j int) bool {
	return h[i] > h[j]
}

func (h MaxHeap) Swap(i, j int) {
	h[i], h[j] = h[j], h[i]
}

func (h *MaxHeap) Push(x any) {
	*h = append(*h, x.(int))
}

func (h *MaxHeap) Pop() any {
	old := *h
	n := len(old)
	x := old[n-1]
	*h = old[:n-1]
	return x
}

func HeapSelect(a []int, k int) int {
	h := &MaxHeap{}
	heap.Init(h)

	for _, x := range a {
		heap.Push(h, x)

		if h.Len() > k+1 {
			heap.Pop(h)
		}
	}

	return (*h)[0]
}