Skip to content

Sample Sort Selection

Use sampling to approximate order statistics and refine selection efficiently.

Sample Sort Selection uses random sampling to approximate the position of the k-th element, then narrows the search to a smaller subarray.

The idea is to avoid full partitioning of the entire array by first estimating where the target lies.

Problem

Given an array AA of length nn and an index kk, find the k-th smallest element.

Algorithm

Take a random sample, sort it, and use it to estimate bounds around the desired rank.

sample_sort_selection(A, k):
    s = choose sample size
    S = random sample of size s from A
    sort S

    estimate position of k in S:
        i = floor(k * s / n)

    choose bounds:
        low = S[i - delta]
        high = S[i + delta]

    B = [x in A where low <= x <= high]

    return quickselect(B, k - count(x < low))

The parameter δ\delta controls how wide the interval is.

Key Idea

The sample approximates the distribution of the full array. The rank of the target in the sample gives an estimate of its value in the full array.

By selecting a range around this estimate, the algorithm reduces the problem size from nn to a much smaller subset BB.

Example

Let:

A=[7,2,9,4,3] A = [7, 2, 9, 4, 3]

Suppose the sample is:

[2,4,9] [2, 4, 9]

and we want k=2k = 2. The estimate suggests the target lies near value 44, so the algorithm narrows the search around that region and applies Quickselect only on that subset.

Correctness

If the sample is representative, the estimated rank position is close to the true rank. With high probability, the true k-th element lies within the selected bounds.

Even if the bounds are slightly off, the refinement step still includes the correct element with high probability when the sampling parameters are chosen properly.

Complexity

phasecost
samplingO(s)O(s)
sorting sampleO(slogs)O(s \log s)
filteringO(n)O(n)
final selection$O(B)$

With appropriate sample size, expected complexity is:

O(n) O(n)

but with smaller constant factors compared to full Quickselect in some cases.

When to Use

Use Sample Sort Selection when:

  • input is very large
  • approximate narrowing is effective
  • you want to reduce partition cost
  • randomized algorithms are acceptable

It is often used in parallel and external-memory settings.

Implementation

import random

def sample_sort_selection(a, k, sample_size=50):
    n = len(a)
    s = min(sample_size, n)

    sample = random.sample(a, s)
    sample.sort()

    i = int(k * s / n)

    delta = max(1, s // 10)

    low = sample[max(0, i - delta)]
    high = sample[min(s - 1, i + delta)]

    below = [x for x in a if x < low]
    middle = [x for x in a if low <= x <= high]

    if k < len(below):
        return sample_sort_selection(below, k, sample_size)

    if k < len(below) + len(middle):
        return quickselect(middle, k - len(below))

    above = [x for x in a if x > high]
    return sample_sort_selection(above, k - len(below) - len(middle), sample_size)
func SampleSortSelection(a []int, k int) int {
	// simplified version using random pivot fallback
	return Quickselect(a, k)
}