Skip to content

6.11 Partial Sorting

Produce only the smallest k elements in sorted order rather than sorting the entire array, reducing unnecessary work when the full order is not needed.

Partial sorting orders only the part of the array that the caller needs. Instead of fully sorting all n elements, the algorithm produces the smallest k elements in sorted order, or arranges the array so that the first k elements are the desired subset.

This is useful when a full order is wasteful. Search results, leaderboards, nearest-neighbor candidates, and reporting queries often need only the top few entries.

Problem

You have an array A[0..n-1] and an integer k. You want the smallest k elements, preferably faster than sorting the whole array.

Solution

There are two common forms.

The first form returns the smallest k elements sorted:

partial_sort(A, k):
  find the smallest k elements
  sort those k elements
  return them

The second form partitions the array so that the first k elements are the smallest k, but not necessarily sorted:

nth_partition(A, k):
  rearrange A so that:
    A[0..k-1] contains the k smallest elements
    A[k..n-1] contains the remaining elements

The first form is useful for presentation. The second form is useful as an internal step before later processing.

Heap-Based Partial Sort

A practical method uses a max-heap of size k.

top_k_smallest(A, k):
  heap = empty max-heap

  for each x in A:
    if size(heap) < k:
      push x into heap
    else if x < max(heap):
      pop max(heap)
      push x into heap

  return elements of heap sorted increasingly

The heap stores the best k candidates seen so far. The largest element in the heap is the current cutoff. If a new value is greater than or equal to that cutoff, it cannot belong to the smallest k elements.

Invariant

After processing the first i elements, the heap contains the smallest min(i, k) elements among those i elements.

Initially the heap is empty, so the invariant holds.

When fewer than k elements have been processed, the next element must be inserted because all seen elements belong to the current candidate set.

When the heap already has k elements, compare the new element x with the heap maximum. If x is not smaller, the heap remains the smallest k elements. If x is smaller, the current maximum cannot remain in the smallest k, so it is removed and replaced by x.

At the end, the heap contains exactly the smallest k elements of the full array.

Complexity

The heap has size at most k. Each insertion or replacement costs:

O(log k)

Processing n elements costs:

O(n log k)

Sorting the final heap costs:

O(k log k)

Total time:

O(n log k + k log k)

When k is much smaller than n, this is better than full sorting:

O(n log n)

The extra space usage is:

O(k)

Partition-Based Partial Sort

Another method uses quickselect-style partitioning to place the kth smallest element into its final position.

partial_sort_by_partition(A, k):
  quickselect(A, 0, length(A), k)
  sort A[0..k-1]
  return A[0..k-1]

After quickselect, every element in A[0..k-1] is less than or equal to every element in A[k..n-1], but the first part may not be sorted. Sorting only that prefix gives the smallest k elements in order.

Expected time with randomized partitioning:

O(n + k log k)

Worst-case time without safeguards:

O(n^2)

Correctness

For the heap method, correctness follows from the candidate invariant: after every step, the heap stores the best k elements seen so far. Once all elements have been scanned, “seen so far” means the entire array.

For the partition method, quickselect guarantees that no element after index k - 1 is smaller than an element before index k. Sorting the first k elements therefore returns exactly the smallest k elements in sorted order.

Both methods preserve the permutation condition if implemented by moves or swaps. The returned subset contains elements from the input, not newly created values.

Choosing a Method

Use the heap method when data arrives as a stream or when you cannot mutate the input. It keeps only k elements and can emit a result after one scan.

Use the partition method when the input is stored in an array and mutation is allowed. It is usually faster when k is not extremely small.

Use full sorting when k is close to n, because the overhead of special handling may no longer pay for itself.

Common Bugs

A common bug is using a min-heap for the smallest k elements. A min-heap exposes the smallest candidate, but the algorithm needs to compare new elements against the largest accepted candidate. For smallest k, use a max-heap of size k.

Another bug is forgetting that partitioning does not sort the prefix. After quickselect, A[0..k-1] contains the right elements but may appear in arbitrary order.

A third bug is mishandling k = 0 or k > n. A robust implementation should define these boundary cases explicitly.

Takeaway

Partial sorting avoids unnecessary work when only a prefix of the sorted order is needed. Heap-based methods favor streaming and immutability. Partition-based methods favor in-place array processing and lower expected runtime.