18.8 Sieve of Eratosthenes

The Sieve of Eratosthenes is the standard algorithm for generating all primes up to a limit.

18.8 Sieve of Eratosthenes

The Sieve of Eratosthenes is the standard algorithm for generating all primes up to a limit. Instead of testing each number independently, it eliminates composites in bulk. This makes it much faster than repeated trial division when many primes are needed.

The sieve is useful whenever an algorithm needs a table of primes, fast primality lookup for bounded values, or precomputed arithmetic information. It appears in factorization, combinatorics, prime counting, Euler phi computation, Möbius computation, and many programming-contest problems.

Problem

Given an integer:

n

generate all prime numbers less than or equal to n.

Example:

n = 30

Output:

2 3 5 7 11 13 17 19 23 29

A naive approach would test each number using trial division. That works, but it repeats work. The sieve avoids repeated factor checks by marking known composites.

Solution

Create a Boolean table where is_prime[x] means that x is still believed to be prime. Initially, mark every number from 2 to n as prime. Then scan from left to right. Whenever you find a prime p, mark all multiples of p as composite.

def sieve(n):
    if n < 2:
        return []

    is_prime = [True] * (n + 1)
    is_prime[0] = False
    is_prime[1] = False

    p = 2

    while p * p <= n:
        if is_prime[p]:
            for multiple in range(p * p, n + 1, p):
                is_prime[multiple] = False

        p += 1

    primes = []

    for x in range(2, n + 1):
        if is_prime[x]:
            primes.append(x)

    return primes

Example:

print(sieve(30))

Output:

[2, 3, 5, 7, 11, 13, 17, 19, 23, 29]

Why Start at p * p?

When processing a prime p, every smaller multiple of p has already been marked by a smaller factor.

For example, when p = 5, the multiples below 25 are:

5 × 2 = 10
5 × 3 = 15
5 × 4 = 20

Each has already been marked:

10 by 2
15 by 3
20 by 2

The first new multiple that 5 needs to mark is:

5 × 5 = 25

So the marking loop begins at:

p * p

This small detail is important for both correctness clarity and performance.

Step-by-Step Example

For n = 30, start with all numbers from 2 to 30 marked as possible primes.

Process 2. Mark:

4 6 8 10 12 14 16 18 20 22 24 26 28 30

Process 3. Mark:

9 12 15 18 21 24 27 30

Some are already marked, which is fine.

Process 5. Start at 25:

25 30

Now the next candidate is 7, but:

7 * 7 = 49 > 30

So no further marking is needed.

The unmarked numbers are prime:

2 3 5 7 11 13 17 19 23 29

Correctness

The sieve relies on the same divisor-pair observation used in primality testing: every composite number x has a prime factor less than or equal to sqrt(x).

When the sieve processes that prime factor p, it marks x as a multiple of p. Therefore every composite number up to n is eventually marked.

Conversely, the sieve only marks multiples of primes. A prime number has no divisor other than 1 and itself, so it is never marked by a smaller prime. It remains unmarked and is returned as prime.

Complexity

The sieve is much faster than checking each number independently.

Time complexity:

O(n log log n)

Space complexity:

O(n)

The log log n factor grows extremely slowly, so the sieve is effectively close to linear for many practical input sizes.

Repeated trial division up to sqrt(x) for every x <= n is much more expensive:

O(n sqrt(n))

The sieve is the right default when you need many primes in a bounded range.

Returning a Primality Table

Sometimes you do not need the list of primes. You need constant-time primality queries.

def prime_table(n):
    is_prime = [True] * (n + 1)

    if n >= 0:
        is_prime[0] = False

    if n >= 1:
        is_prime[1] = False

    p = 2

    while p * p <= n:
        if is_prime[p]:
            for multiple in range(p * p, n + 1, p):
                is_prime[multiple] = False

        p += 1

    return is_prime

Example:

is_prime = prime_table(100)

print(is_prime[97])
print(is_prime[91])

Output:

True
False

This is useful when many later queries ask whether a number is prime.

Storing the Smallest Prime Factor

A common extension stores the smallest prime factor of each integer. This supports fast factorization after preprocessing.

def smallest_prime_factor(n):
    spf = list(range(n + 1))

    if n >= 0:
        spf[0] = 0

    if n >= 1:
        spf[1] = 1

    p = 2

    while p * p <= n:
        if spf[p] == p:
            for multiple in range(p * p, n + 1, p):
                if spf[multiple] == multiple:
                    spf[multiple] = p

        p += 1

    return spf

Using the table:

def factorize(x, spf):
    factors = []

    while x > 1:
        p = spf[x]
        count = 0

        while x % p == 0:
            x //= p
            count += 1

        factors.append((p, count))

    return factors

Example:

spf = smallest_prime_factor(100)
print(factorize(84, spf))

Output:

[(2, 2), (3, 1), (7, 1)]

This represents:

84 = 2^2 × 3 × 7

The preprocessing costs O(n log log n), but each factorization is then fast.

Linear Sieve

The classic sieve may mark the same composite number multiple times. A linear sieve marks each composite once by its smallest prime factor.

def linear_sieve(n):
    primes = []
    spf = [0] * (n + 1)

    for x in range(2, n + 1):
        if spf[x] == 0:
            spf[x] = x
            primes.append(x)

        for p in primes:
            if p > spf[x] or x * p > n:
                break

            spf[x * p] = p

    return primes, spf

Complexity:

O(n)

The linear sieve is especially useful when you need smallest prime factors, phi values, Möbius values, or other multiplicative-function tables.

For simple prime listing, the classic sieve is usually easier and fast enough.

Segmented Sieve

A normal sieve requires O(n) memory. When n is too large, this becomes a problem. A segmented sieve processes the range in blocks.

The idea is:

  1. Generate all base primes up to sqrt(n).
  2. Split [2, n] into smaller intervals.
  3. Use the base primes to mark composites inside each interval.

Sketch:

from math import isqrt

def segmented_sieve(n, block_size=32768):
    if n < 2:
        return []

    base_primes = sieve(isqrt(n))
    primes = []

    low = 2

    while low <= n:
        high = min(low + block_size - 1, n)
        block = [True] * (high - low + 1)

        for p in base_primes:
            start = max(p * p, ((low + p - 1) // p) * p)

            for multiple in range(start, high + 1, p):
                block[multiple - low] = False

        for i, value in enumerate(block):
            if value:
                primes.append(low + i)

        low = high + 1

    return primes

The segmented sieve is useful when the range is large but memory must stay bounded.

Sieve for an Interval

Sometimes the task is to find primes in [L, R] where R is large but the interval length is moderate.

from math import isqrt

def primes_in_range(left, right):
    if right < 2 or left > right:
        return []

    base_primes = sieve(isqrt(right))
    block = [True] * (right - left + 1)

    for p in base_primes:
        start = max(p * p, ((left + p - 1) // p) * p)

        for multiple in range(start, right + 1, p):
            block[multiple - left] = False

    result = []

    for i, value in enumerate(block):
        x = left + i

        if x >= 2 and value:
            result.append(x)

    return result

Example:

print(primes_in_range(100, 130))

Output:

[101, 103, 107, 109, 113, 127]

This pattern is common when R is too large for a full table but R - L is manageable.

Common Mistakes

Starting at 2p Instead of p * p

Starting at 2 * p is correct but slower. Multiples below p * p were already marked by smaller primes.

Forgetting 0 and 1

Neither 0 nor 1 is prime. Mark them explicitly.

Using Floating-Point Square Roots

Use integer arithmetic for bounds:

while p * p <= n:
    ...

or use isqrt.

Allocating Too Much Memory

A list of n + 1 booleans can be too large when n is very large. Use a segmented sieve for large ranges.

Confusing SPF with Primality

In an SPF table, spf[x] == x usually means x is prime. In a Boolean sieve, is_prime[x] directly stores primality. Keep the table semantics explicit.

Testing

Basic tests:

def test_sieve():
    assert sieve(1) == []
    assert sieve(2) == [2]
    assert sieve(10) == [2, 3, 5, 7]
    assert sieve(30) == [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]

Cross-check with trial division:

def test_sieve_against_trial_division():
    primes = sieve(500)
    table = set(primes)

    for x in range(1, 501):
        assert (x in table) == is_prime(x)

For segmented sieves, compare against the ordinary sieve for small ranges before using large inputs.

Takeaway

The Sieve of Eratosthenes generates primes by marking composite multiples instead of testing each number independently. Starting each marking loop at p * p avoids redundant work, and the resulting algorithm runs in O(n log log n) time with O(n) memory. Variants such as smallest-prime-factor sieves, linear sieves, and segmented sieves extend the same idea to fast factorization, multiplicative-function tables, and large interval queries.