19.6 Shuffling

Shuffling converts an ordered collection into a uniformly random permutation.

19.6 Shuffling

Shuffling converts an ordered collection into a uniformly random permutation. This is a small operation with large consequences. A correct shuffle removes input order bias, supports randomized experiments, prepares data for training, randomizes work distribution, and provides a foundation for many randomized algorithms.

The central requirement is uniformity. For an array with n distinct elements, there are n! possible permutations. A correct shuffle must make every permutation equally likely. Many simple-looking shuffle algorithms fail this condition and introduce subtle bias.

Problem

You have an array:

[1, 2, 3, 4]

You want to reorder it randomly so that every possible ordering has the same probability.

For four distinct elements, there are:

4! = 24

possible permutations.

A correct shuffle gives each permutation probability:

1 / 24

The task is not merely to make the output “look random.” The task is to generate a uniform random permutation.

Solution: Fisher-Yates Shuffle

The standard solution is the Fisher-Yates shuffle, also known as the Knuth shuffle.

The algorithm works from the end of the array toward the beginning. At position i, choose a random index j from 0 through i, then swap A[i] and A[j].

import random
from typing import MutableSequence, TypeVar

T = TypeVar("T")

def shuffle(array: MutableSequence[T]) -> None:
    for i in range(len(array) - 1, 0, -1):
        j = random.randint(0, i)
        array[i], array[j] = array[j], array[i]

Example:

items = [1, 2, 3, 4]
shuffle(items)
print(items)

Possible output:

[3, 1, 4, 2]

The function mutates the array in place.

How It Works

At each step, the algorithm permanently fixes one position.

For an array:

A[0..n-1]

the first iteration chooses which element belongs in position:

n - 1

The next iteration chooses which remaining element belongs in position:

n - 2

The process continues until every position has been fixed.

For n = 4, the choices are:

Step Position Fixed Number of Choices
1 3 4
2 2 3
3 1 2
4 0 1

Total possible choice sequences:

4 × 3 × 2 × 1 = 24

This matches the number of possible permutations.

Because each choice is uniform among the remaining elements, every permutation receives exactly one choice sequence and therefore has equal probability.

Why the Range Matters

The most common bug is choosing j from the full array every time:

import random
from typing import MutableSequence, TypeVar

T = TypeVar("T")

def biased_shuffle(array: MutableSequence[T]) -> None:
    n = len(array)

    for i in range(n):
        j = random.randint(0, n - 1)
        array[i], array[j] = array[j], array[i]

This looks plausible, but it is biased.

For n = 3, the algorithm performs three swaps. Each swap has three possible choices, so there are:

3 × 3 × 3 = 27

possible choice sequences.

But there are only:

3! = 6

permutations.

Since 27 cannot be divided evenly by 6, some permutations must occur more often than others.

That is the entire failure in one line: the number of random choice sequences does not distribute evenly across the output permutations.

Testing Shuffle Bias

Uniformity is probabilistic, so a single run proves nothing. A practical test runs the shuffle many times and counts the output permutations.

import random
from collections import Counter
from itertools import permutations
from typing import Callable

def fisher_yates(array: list[int]) -> None:
    for i in range(len(array) - 1, 0, -1):
        j = random.randint(0, i)
        array[i], array[j] = array[j], array[i]

def bad_shuffle(array: list[int]) -> None:
    n = len(array)

    for i in range(n):
        j = random.randint(0, n - 1)
        array[i], array[j] = array[j], array[i]

def run_trials(
    shuffle_function: Callable[[list[int]], None],
    trials: int = 100_000,
) -> Counter[tuple[int, ...]]:
    counts: Counter[tuple[int, ...]] = Counter()

    for _ in range(trials):
        values = [1, 2, 3]
        shuffle_function(values)
        counts[tuple(values)] += 1

    return counts

def print_counts(counts: Counter[tuple[int, ...]]) -> None:
    for permutation in sorted(permutations([1, 2, 3])):
        print(permutation, counts[permutation])

print("Fisher-Yates")
print_counts(run_trials(fisher_yates))

print()
print("Biased")
print_counts(run_trials(bad_shuffle))

Representative output:

Fisher-Yates
(1, 2, 3) 16671
(1, 3, 2) 16608
(2, 1, 3) 16645
(2, 3, 1) 16640
(3, 1, 2) 16693
(3, 2, 1) 16743

Biased
(1, 2, 3) 14821
(1, 3, 2) 18519
(2, 1, 3) 18476
(2, 3, 1) 18591
(3, 1, 2) 14923
(3, 2, 1) 14670

The exact numbers change from run to run. The pattern remains: Fisher-Yates stays close to uniform, while the flawed algorithm favors some permutations.

Recipe: Inside-Out Shuffle

Sometimes the input arrives as a stream or iterator, and you want to build a shuffled array without first copying the input and then shuffling it.

The inside-out version of Fisher-Yates does that.

import random
from collections.abc import Iterable
from typing import TypeVar

T = TypeVar("T")

def inside_out_shuffle(items: Iterable[T]) -> list[T]:
    result: list[T] = []

    for i, item in enumerate(items):
        j = random.randint(0, i)

        if j == i:
            result.append(item)
        else:
            result.append(result[j])
            result[j] = item

    return result

Example:

print(inside_out_shuffle(["a", "b", "c", "d", "e"]))

Possible output:

['c', 'e', 'a', 'b', 'd']

This algorithm also produces a uniform random permutation. At step i, the new item is placed uniformly among the i + 1 possible positions, while the displaced item moves to the new end.

Application: Randomizing Training Data

Machine learning pipelines often shuffle training examples before creating mini-batches.

Without shuffling, adjacent examples may be correlated by time, source, label, or collection process.

Example:

1000 cat images
1000 dog images
1000 bird images

If mini-batches are built directly from this order, early batches contain only cats, middle batches only dogs, and late batches only birds. Gradient updates become biased by input order.

Shuffling first produces batches that better represent the full dataset.

def make_batches[T](items: list[T], batch_size: int) -> list[list[T]]:
    copied = items[:]
    shuffle(copied)

    return [
        copied[start:start + batch_size]
        for start in range(0, len(copied), batch_size)
    ]

For older Python versions that do not support generic function syntax, write the signature this way:

from typing import TypeVar

T = TypeVar("T")

def make_batches(items: list[T], batch_size: int) -> list[list[T]]:
    copied = items[:]
    shuffle(copied)

    return [
        copied[start:start + batch_size]
        for start in range(0, len(copied), batch_size)
    ]

The copy protects the original order. That matters when the same dataset is reused for evaluation, debugging, or reproducibility.

Application: Randomized Work Distribution

A scheduler may receive tasks in an order that correlates with cost.

Example:

[heavy, heavy, heavy, heavy, light, light, light, light]

Assigning tasks in order can overload early workers.

Shuffling before assignment reduces this risk:

from collections.abc import Sequence
from typing import TypeVar

T = TypeVar("T")

def distribute_round_robin(tasks: Sequence[T], worker_count: int) -> list[list[T]]:
    shuffled = list(tasks)
    shuffle(shuffled)

    workers: list[list[T]] = [[] for _ in range(worker_count)]

    for index, task in enumerate(shuffled):
        workers[index % worker_count].append(task)

    return workers

This does not guarantee perfect load balancing. It removes systematic bias caused by the original order.

Complexity

Fisher-Yates has optimal complexity.

Operation Cost
Time O(n)
Extra memory O(1)
Swaps n - 1
Random choices n - 1

The inside-out version also uses O(n) time, but it allocates a new array of size n.

Correctness Argument

We prove by induction that after fixing positions from the end, each suffix is uniformly random.

After the first iteration, every element has probability:

1 / n

of occupying position n - 1.

Assume positions:

i + 1, i + 2, ..., n - 1

already contain a uniformly random selection of elements. At step i, the algorithm chooses uniformly among the remaining i + 1 positions and swaps the chosen element into position i.

Therefore each remaining element has probability:

1 / (i + 1)

of being placed at position i.

The invariant continues until all positions are fixed. Each full permutation is produced with probability:

1 / n!

Common Mistakes

Do not sort by random keys as a default shuffle:

array.sort(key=lambda _: random.random())

This is slower, depends on sorting behavior, can collide if the key space is limited, and obscures the uniformity argument.

Do not swap each position with a random element from the whole array. That introduces bias.

Do not use modulo arithmetic carelessly when converting random integers into indices. If the random generator range is not divisible by the index range, modulo reduction introduces bias.

Do not use a predictable pseudorandom generator where adversaries can exploit the order. For security-sensitive shuffling, use a cryptographically secure generator.

Python Note

Python already provides a correct in-place shuffle:

import random

items = [1, 2, 3, 4]
random.shuffle(items)

For cryptographic contexts, use secrets to choose indices, or use a library designed for secure randomization.

import secrets
from typing import MutableSequence, TypeVar

T = TypeVar("T")

def secure_shuffle(array: MutableSequence[T]) -> None:
    for i in range(len(array) - 1, 0, -1):
        j = secrets.randbelow(i + 1)
        array[i], array[j] = array[j], array[i]

This follows the same Fisher-Yates structure but obtains random indices from a cryptographically stronger source.

Takeaway

Use Fisher-Yates when you need a uniform random permutation. Its range choice is the essential detail: at step i, choose from 0 through i, then swap. The algorithm is linear, in-place, simple to implement, and easy to prove correct. Avoid full-range repeated swapping and random-key sorting unless you have a specific reason and understand the bias and performance trade-offs.