# Heap Select

# Heap Select

Heap Select finds an order statistic by using a heap to keep only the part of the input that can still contain the answer.

For the k-th smallest element, it is common to keep a max heap of size $k + 1$. The heap stores the smallest values seen so far. Whenever a new value is smaller than the heap maximum, the maximum is removed and the new value is inserted. After the scan finishes, the heap maximum is the k-th smallest value.

## Problem

Given an array $A$ of length $n$ and an integer $k$ with $0 \le k < n$, return the k-th smallest element.

## Algorithm

Maintain a max heap of size at most $k + 1$.

```id="hs1"
heap_select(A, k):
    H = empty max heap

    for x in A:
        push x into H

        if size(H) > k + 1:
            pop maximum from H

    return maximum(H)
```

This keeps the smallest $k + 1$ elements seen so far. The largest among them has rank $k$.

## Example

Let

$$
A = [7, 2, 9, 4, 3]
$$

and

$$
k = 2
$$

The three smallest values are:

$$
[2, 3, 4]
$$

The largest of these is $4$, so the answer is:

$$
4
$$

## Correctness

After processing each prefix of the array, the heap contains the smallest $k + 1$ elements from that prefix, or all elements if fewer than $k + 1$ have been seen. This invariant holds because every time the heap grows too large, the largest candidate is removed.

At the end, the heap contains the smallest $k + 1$ elements of the whole array. The maximum of these elements is exactly the k-th smallest element.

## Complexity

| operation      |          cost |
| -------------- | ------------: |
| heap size      |       $k + 1$ |
| each insertion |   $O(\log k)$ |
| total time     | $O(n \log k)$ |
| space          |        $O(k)$ |

For selecting the k-th largest, use a min heap of size $k + 1$ instead.

## When to Use

Use Heap Select when:

* $k$ is much smaller than $n$
* the input is streamed
* you need top-k values, not just one element
* predictable memory usage matters

Quickselect is usually faster for in-memory selection when only one rank is needed.

## Implementation

Python has a min heap, so this implementation stores negative values to simulate a max heap.

```id="hs2"
import heapq

def heap_select(a, k):
    heap = []

    for x in a:
        heapq.heappush(heap, -x)

        if len(heap) > k + 1:
            heapq.heappop(heap)

    return -heap[0]
```

```id="hs3"
package main

import "container/heap"

type MaxHeap []int

func (h MaxHeap) Len() int {
	return len(h)
}

func (h MaxHeap) Less(i, j int) bool {
	return h[i] > h[j]
}

func (h MaxHeap) Swap(i, j int) {
	h[i], h[j] = h[j], h[i]
}

func (h *MaxHeap) Push(x any) {
	*h = append(*h, x.(int))
}

func (h *MaxHeap) Pop() any {
	old := *h
	n := len(old)
	x := old[n-1]
	*h = old[:n-1]
	return x
}

func HeapSelect(a []int, k int) int {
	h := &MaxHeap{}
	heap.Init(h)

	for _, x := range a {
		heap.Push(h, x)

		if h.Len() > k+1 {
			heap.Pop(h)
		}
	}

	return (*h)[0]
}
```

