6.11 Partial Sorting

Partial sorting orders only the part of the array that the caller needs. Instead of fully sorting all n elements, the algorithm produces the smallest k elements in sorted order, or arranges the array so that the first k elements are the desired subset.

This is useful when a full order is wasteful. Search results, leaderboards, nearest-neighbor candidates, and reporting queries often need only the top few entries.

Problem

You have an array A[0..n-1] and an integer k. You want the smallest k elements, preferably faster than sorting the whole array.

Solution

There are two common forms.

The first form returns the smallest k elements sorted:

partial_sort(A, k):
  find the smallest k elements
  sort those k elements
  return them

The second form partitions the array so that the first k elements are the smallest k, but not necessarily sorted:

nth_partition(A, k):
  rearrange A so that:
    A[0..k-1] contains the k smallest elements
    A[k..n-1] contains the remaining elements

The first form is useful for presentation. The second form is useful as an internal step before later processing.

Heap-Based Partial Sort

A practical method uses a max-heap of size k.

top_k_smallest(A, k):
  heap = empty max-heap

  for each x in A:
    if size(heap) < k:
      push x into heap
    else if x < max(heap):
      pop max(heap)
      push x into heap

  return elements of heap sorted increasingly

The heap stores the best k candidates seen so far. The largest element in the heap is the current cutoff. If a new value is greater than or equal to that cutoff, it cannot belong to the smallest k elements.

Invariant

After processing the first i elements, the heap contains the smallest min(i, k) elements among those i elements.

Initially the heap is empty, so the invariant holds.

When fewer than k elements have been processed, the next element must be inserted because all seen elements belong to the current candidate set.

When the heap already has k elements, compare the new element x with the heap maximum. If x is not smaller, the heap remains the smallest k elements. If x is smaller, the current maximum cannot remain in the smallest k, so it is removed and replaced by x.

At the end, the heap contains exactly the smallest k elements of the full array.

Complexity

The heap has size at most k. Each insertion or replacement costs:

O(log k)

Processing n elements costs:

O(n log k)

Sorting the final heap costs:

O(k log k)

Total time:

O(n log k + k log k)

When k is much smaller than n, this is better than full sorting:

O(n log n)

The extra space usage is:

O(k)

Partition-Based Partial Sort

Another method uses quickselect-style partitioning to place the kth smallest element into its final position.

partial_sort_by_partition(A, k):
  quickselect(A, 0, length(A), k)
  sort A[0..k-1]
  return A[0..k-1]

After quickselect, every element in A[0..k-1] is less than or equal to every element in A[k..n-1], but the first part may not be sorted. Sorting only that prefix gives the smallest k elements in order.

Expected time with randomized partitioning:

O(n + k log k)

Worst-case time without safeguards:

O(n^2)

Correctness

For the heap method, correctness follows from the candidate invariant: after every step, the heap stores the best k elements seen so far. Once all elements have been scanned, “seen so far” means the entire array.

For the partition method, quickselect guarantees that no element after index k - 1 is smaller than an element before index k. Sorting the first k elements therefore returns exactly the smallest k elements in sorted order.

Both methods preserve the permutation condition if implemented by moves or swaps. The returned subset contains elements from the input, not newly created values.

Choosing a Method

Use the heap method when data arrives as a stream or when you cannot mutate the input. It keeps only k elements and can emit a result after one scan.

Use the partition method when the input is stored in an array and mutation is allowed. It is usually faster when k is not extremely small.

Use full sorting when k is close to n, because the overhead of special handling may no longer pay for itself.

Common Bugs

A common bug is using a min-heap for the smallest k elements. A min-heap exposes the smallest candidate, but the algorithm needs to compare new elements against the largest accepted candidate. For smallest k, use a max-heap of size k.

Another bug is forgetting that partitioning does not sort the prefix. After quickselect, A[0..k-1] contains the right elements but may appear in arbitrary order.

A third bug is mishandling k = 0 or k > n. A robust implementation should define these boundary cases explicitly.

Takeaway

Partial sorting avoids unnecessary work when only a prefix of the sorted order is needed. Heap-based methods favor streaming and immutability. Partition-based methods favor in-place array processing and lower expected runtime.