Select k distinct integers uniformly from a fixed range without shuffling the whole range.
Floyd Sampling selects distinct integers uniformly from the range to . It avoids building and shuffling the full range, so it is useful when is much smaller than .
The algorithm inserts one value per iteration into a set. Each insertion preserves uniformity over all subsets of the processed range.
Problem
Given integers and with:
return a uniformly random subset of size from:
Algorithm
Iterate from to . At step , choose a random integer from to . If is already selected, insert instead. Otherwise insert .
floyd_sampling(n, k):
S = empty set
for j from n - k to n - 1:
t = random integer from 0 to j
if t in S:
insert j into S
else:
insert t into S
return SExample
Let:
and:
The loop runs for:
After three insertions, the set contains three distinct values from through . Each 3-element subset has the same probability.
Correctness
After processing step , the set contains exactly:
items selected uniformly from:
At step , the random choice gives every value in the current range an equal chance to be inserted. If was already selected, inserting preserves the size and avoids duplicates. This replacement rule maintains uniform probability over all subsets of the required size.
By induction, after the final step , the set is a uniform -subset of the full range.
Complexity
| measure | cost |
|---|---|
| time | |
| memory | |
| random choices |
This is better than shuffling when .
When to Use
Use Floyd Sampling when:
- you need distinct random indices
- the sample size is small relative to the range
- building the full range would waste memory
- order of the sample does not matter
For sampling actual stream items of unknown length, use Reservoir Sampling instead.
Implementation
import random
def floyd_sampling(n, k):
selected = set()
for j in range(n - k, n):
t = random.randint(0, j)
if t in selected:
selected.add(j)
else:
selected.add(t)
return selectedimport "math/rand"
func FloydSampling(n, k int) map[int]struct{} {
selected := make(map[int]struct{}, k)
for j := n - k; j < n; j++ {
t := rand.Intn(j + 1)
if _, ok := selected[t]; ok {
selected[j] = struct{}{}
} else {
selected[t] = struct{}{}
}
}
return selected
}