# Floyd Sampling

# Floyd Sampling

Floyd Sampling selects $k$ distinct integers uniformly from the range $0$ to $n - 1$. It avoids building and shuffling the full range, so it is useful when $k$ is much smaller than $n$.

The algorithm inserts one value per iteration into a set. Each insertion preserves uniformity over all subsets of the processed range.

## Problem

Given integers $n$ and $k$ with:

$$
0 \le k \le n
$$

return a uniformly random subset of size $k$ from:

$$
{0, 1, 2, \dots, n - 1}
$$

## Algorithm

Iterate from $n-k$ to $n-1$. At step $j$, choose a random integer $t$ from $0$ to $j$. If $t$ is already selected, insert $j$ instead. Otherwise insert $t$.

```id="fs1"
floyd_sampling(n, k):
    S = empty set

    for j from n - k to n - 1:
        t = random integer from 0 to j

        if t in S:
            insert j into S
        else:
            insert t into S

    return S
```

## Example

Let:

$$
n = 10
$$

and:

$$
k = 3
$$

The loop runs for:

$$
j = 7, 8, 9
$$

After three insertions, the set contains three distinct values from $0$ through $9$. Each 3-element subset has the same probability.

## Correctness

After processing step $j$, the set contains exactly:

$$
j - (n-k) + 1
$$

items selected uniformly from:

$$
{0, 1, \dots, j}
$$

At step $j$, the random choice $t$ gives every value in the current range an equal chance to be inserted. If $t$ was already selected, inserting $j$ preserves the size and avoids duplicates. This replacement rule maintains uniform probability over all subsets of the required size.

By induction, after the final step $j = n - 1$, the set is a uniform $k$-subset of the full range.

## Complexity

| measure        |   cost |
| -------------- | -----: |
| time           | $O(k)$ |
| memory         | $O(k)$ |
| random choices |    $k$ |

This is better than shuffling when $k \ll n$.

## When to Use

Use Floyd Sampling when:

* you need distinct random indices
* the sample size is small relative to the range
* building the full range would waste memory
* order of the sample does not matter

For sampling actual stream items of unknown length, use Reservoir Sampling instead.

## Implementation

```id="fs2"
import random

def floyd_sampling(n, k):
    selected = set()

    for j in range(n - k, n):
        t = random.randint(0, j)

        if t in selected:
            selected.add(j)
        else:
            selected.add(t)

    return selected
```

```id="fs3"
import "math/rand"

func FloydSampling(n, k int) map[int]struct{} {
	selected := make(map[int]struct{}, k)

	for j := n - k; j < n; j++ {
		t := rand.Intn(j + 1)

		if _, ok := selected[t]; ok {
			selected[j] = struct{}{}
		} else {
			selected[t] = struct{}{}
		}
	}

	return selected
}
```

