Skip to content

Floyd Sampling

Select k distinct integers uniformly from a fixed range without shuffling the whole range.

Floyd Sampling selects kk distinct integers uniformly from the range 00 to n1n - 1. It avoids building and shuffling the full range, so it is useful when kk is much smaller than nn.

The algorithm inserts one value per iteration into a set. Each insertion preserves uniformity over all subsets of the processed range.

Problem

Given integers nn and kk with:

0kn 0 \le k \le n

return a uniformly random subset of size kk from:

0,1,2,,n1 {0, 1, 2, \dots, n - 1}

Algorithm

Iterate from nkn-k to n1n-1. At step jj, choose a random integer tt from 00 to jj. If tt is already selected, insert jj instead. Otherwise insert tt.

floyd_sampling(n, k):
    S = empty set

    for j from n - k to n - 1:
        t = random integer from 0 to j

        if t in S:
            insert j into S
        else:
            insert t into S

    return S

Example

Let:

n=10 n = 10

and:

k=3 k = 3

The loop runs for:

j=7,8,9 j = 7, 8, 9

After three insertions, the set contains three distinct values from 00 through 99. Each 3-element subset has the same probability.

Correctness

After processing step jj, the set contains exactly:

j(nk)+1 j - (n-k) + 1

items selected uniformly from:

0,1,,j {0, 1, \dots, j}

At step jj, the random choice tt gives every value in the current range an equal chance to be inserted. If tt was already selected, inserting jj preserves the size and avoids duplicates. This replacement rule maintains uniform probability over all subsets of the required size.

By induction, after the final step j=n1j = n - 1, the set is a uniform kk-subset of the full range.

Complexity

measurecost
timeO(k)O(k)
memoryO(k)O(k)
random choiceskk

This is better than shuffling when knk \ll n.

When to Use

Use Floyd Sampling when:

  • you need distinct random indices
  • the sample size is small relative to the range
  • building the full range would waste memory
  • order of the sample does not matter

For sampling actual stream items of unknown length, use Reservoir Sampling instead.

Implementation

import random

def floyd_sampling(n, k):
    selected = set()

    for j in range(n - k, n):
        t = random.randint(0, j)

        if t in selected:
            selected.add(j)
        else:
            selected.add(t)

    return selected
import "math/rand"

func FloydSampling(n, k int) map[int]struct{} {
	selected := make(map[int]struct{}, k)

	for j := n - k; j < n; j++ {
		t := rand.Intn(j + 1)

		if _, ok := selected[t]; ok {
			selected[j] = struct{}{}
		} else {
			selected[t] = struct{}{}
		}
	}

	return selected
}