# 6.11 Partial Sorting

# 6.11 Partial Sorting

Partial sorting orders only the part of the array that the caller needs. Instead of fully sorting all `n` elements, the algorithm produces the smallest `k` elements in sorted order, or arranges the array so that the first `k` elements are the desired subset.

This is useful when a full order is wasteful. Search results, leaderboards, nearest-neighbor candidates, and reporting queries often need only the top few entries.

## Problem

You have an array `A[0..n-1]` and an integer `k`. You want the smallest `k` elements, preferably faster than sorting the whole array.

## Solution

There are two common forms.

The first form returns the smallest `k` elements sorted:

```text id="rl41f2"
partial_sort(A, k):
  find the smallest k elements
  sort those k elements
  return them
```

The second form partitions the array so that the first `k` elements are the smallest `k`, but not necessarily sorted:

```text id="fl0psx"
nth_partition(A, k):
  rearrange A so that:
    A[0..k-1] contains the k smallest elements
    A[k..n-1] contains the remaining elements
```

The first form is useful for presentation. The second form is useful as an internal step before later processing.

## Heap-Based Partial Sort

A practical method uses a max-heap of size `k`.

```text id="ggq8f7"
top_k_smallest(A, k):
  heap = empty max-heap

  for each x in A:
    if size(heap) < k:
      push x into heap
    else if x < max(heap):
      pop max(heap)
      push x into heap

  return elements of heap sorted increasingly
```

The heap stores the best `k` candidates seen so far. The largest element in the heap is the current cutoff. If a new value is greater than or equal to that cutoff, it cannot belong to the smallest `k` elements.

## Invariant

After processing the first `i` elements, the heap contains the smallest `min(i, k)` elements among those `i` elements.

Initially the heap is empty, so the invariant holds.

When fewer than `k` elements have been processed, the next element must be inserted because all seen elements belong to the current candidate set.

When the heap already has `k` elements, compare the new element `x` with the heap maximum. If `x` is not smaller, the heap remains the smallest `k` elements. If `x` is smaller, the current maximum cannot remain in the smallest `k`, so it is removed and replaced by `x`.

At the end, the heap contains exactly the smallest `k` elements of the full array.

## Complexity

The heap has size at most `k`. Each insertion or replacement costs:

```text id="xxhq8i"
O(log k)
```

Processing `n` elements costs:

```text id="cepbhg"
O(n log k)
```

Sorting the final heap costs:

```text id="zj8y0z"
O(k log k)
```

Total time:

```text id="ocrm5e"
O(n log k + k log k)
```

When `k` is much smaller than `n`, this is better than full sorting:

```text id="4pkcl3"
O(n log n)
```

The extra space usage is:

```text id="ham30g"
O(k)
```

## Partition-Based Partial Sort

Another method uses quickselect-style partitioning to place the `k`th smallest element into its final position.

```text id="9k0yjq"
partial_sort_by_partition(A, k):
  quickselect(A, 0, length(A), k)
  sort A[0..k-1]
  return A[0..k-1]
```

After `quickselect`, every element in `A[0..k-1]` is less than or equal to every element in `A[k..n-1]`, but the first part may not be sorted. Sorting only that prefix gives the smallest `k` elements in order.

Expected time with randomized partitioning:

```text id="3oy9o9"
O(n + k log k)
```

Worst-case time without safeguards:

```text id="vrxxud"
O(n^2)
```

## Correctness

For the heap method, correctness follows from the candidate invariant: after every step, the heap stores the best `k` elements seen so far. Once all elements have been scanned, "seen so far" means the entire array.

For the partition method, quickselect guarantees that no element after index `k - 1` is smaller than an element before index `k`. Sorting the first `k` elements therefore returns exactly the smallest `k` elements in sorted order.

Both methods preserve the permutation condition if implemented by moves or swaps. The returned subset contains elements from the input, not newly created values.

## Choosing a Method

Use the heap method when data arrives as a stream or when you cannot mutate the input. It keeps only `k` elements and can emit a result after one scan.

Use the partition method when the input is stored in an array and mutation is allowed. It is usually faster when `k` is not extremely small.

Use full sorting when `k` is close to `n`, because the overhead of special handling may no longer pay for itself.

## Common Bugs

A common bug is using a min-heap for the smallest `k` elements. A min-heap exposes the smallest candidate, but the algorithm needs to compare new elements against the largest accepted candidate. For smallest `k`, use a max-heap of size `k`.

Another bug is forgetting that partitioning does not sort the prefix. After quickselect, `A[0..k-1]` contains the right elements but may appear in arbitrary order.

A third bug is mishandling `k = 0` or `k > n`. A robust implementation should define these boundary cases explicitly.

## Takeaway

Partial sorting avoids unnecessary work when only a prefix of the sorted order is needed. Heap-based methods favor streaming and immutability. Partition-based methods favor in-place array processing and lower expected runtime.

