18.24 Testing Number-Theoretic Code

Number-theoretic code is compact, but its edge cases are dense.

18.24 Testing Number-Theoretic Code

Number-theoretic code is compact, but its edge cases are dense. A function may work for most random inputs and still fail on zero, one, negative values, non-coprime moduli, repeated prime factors, overflow boundaries, or composite numbers that imitate primes.

Testing these algorithms requires more than checking a few examples. Good tests combine known values, algebraic identities, brute-force oracles for small inputs, randomized comparison, and reconstruction checks.

Problem

Given implementations of number-theoretic algorithms, verify correctness across ordinary cases and boundary cases.

Examples include:

gcd(a,b)
mod_inverse(a,m)
is_prime(n)
factorize(n)
crt(remainders, moduli)
phi(n)
mobius(n)

The goal is to catch both mathematical mistakes and implementation mistakes.

Use Known Values First

Start with hand-checkable examples.

from math import gcd

def test_gcd_known_values():
    assert gcd(48, 18) == 6
    assert gcd(17, 13) == 1
    assert gcd(0, 7) == 7
    assert gcd(7, 0) == 7

Known values catch obvious errors and document expected behavior.

For Euler phi:

def test_phi_known_values():
    assert phi(1) == 1
    assert phi(2) == 1
    assert phi(5) == 4
    assert phi(8) == 4
    assert phi(9) == 6
    assert phi(36) == 12

For Möbius:

def test_mobius_known_values():
    assert mobius(1) == 1
    assert mobius(2) == -1
    assert mobius(4) == 0
    assert mobius(6) == 1
    assert mobius(12) == 0
    assert mobius(30) == -1

These tests are small but valuable. They anchor the semantics.

Test Boundary Values

Most bugs live near boundary values.

Use inputs such as:

0
1
2
m - 1
m
m + 1
negative values
prime values
prime powers
perfect squares
large composites

Example for primality:

def test_primality_boundaries():
    assert not is_prime_64(-7)
    assert not is_prime_64(0)
    assert not is_prime_64(1)

    assert is_prime_64(2)
    assert is_prime_64(3)

    assert not is_prime_64(4)
    assert not is_prime_64(9)
    assert not is_prime_64(25)

For modular inverse:

def test_inverse_boundaries():
    assert mod_inverse(3, 11) == 4
    assert mod_inverse(10, 17) == 12

    try:
        mod_inverse(2, 8)
        assert False
    except ValueError:
        pass

Boundary tests force you to define the behavior instead of leaving it accidental.

Use Brute Force as an Oracle

For small inputs, simple brute force is often the best correctness oracle.

A production algorithm may be sophisticated, but a tiny brute-force implementation is easier to trust.

Example: primality.

def is_prime_bruteforce(n):
    if n < 2:
        return False

    for d in range(2, n):
        if n % d == 0:
            return False

    return True

Compare:

def test_prime_against_bruteforce():
    for n in range(1, 1000):
        assert is_prime_64(n) == is_prime_bruteforce(n)

Example: linear congruences.

def brute_linear_congruence(a, b, m):
    return [x for x in range(m) if (a * x - b) % m == 0]

Compare:

def test_linear_congruence_against_brute_force():
    for m in range(1, 50):
        for a in range(-50, 51):
            for b in range(-50, 51):
                assert solve_linear_congruence(a, b, m) == brute_linear_congruence(a, b, m)

This style catches sign errors, normalization mistakes, and missing multiple-solution cases.

Test Algebraic Identities

Number theory gives you many invariants for free.

For GCD:

gcd(a,b) divides a and b
def test_gcd_divides_inputs():
    for a in range(1, 200):
        for b in range(1, 200):
            g = gcd(a, b)

            assert a % g == 0
            assert b % g == 0

For Extended Euclid:

ax + by = gcd(a,b)
def test_extended_gcd_identity():
    for a in range(-100, 101):
        for b in range(-100, 101):
            if a == 0 and b == 0:
                continue

            g, x, y = extended_gcd(a, b)

            assert a * x + b * y == g
            assert g == gcd(a, b)

For modular inverse:

a × a⁻¹ ≡ 1 (mod m)
def test_mod_inverse_identity():
    for m in range(2, 100):
        for a in range(1, m):
            if gcd(a, m) == 1:
                inv = mod_inverse(a, m)
                assert (a * inv) % m == 1

Algebraic identities are excellent because they test behavior instead of a single expected answer.

Reconstruction Tests

For factorization, the best invariant is reconstruction.

def reconstruct(factors):
    result = 1

    for p in factors:
        result *= p

    return result

Test:

def test_factorization_reconstructs_input():
    for n in range(2, 10000):
        factors = factorize(n)

        assert reconstruct(factors) == n

        for p in factors:
            assert is_prime_64(p)

For exponent-count factorization:

def reconstruct_from_powers(factors):
    result = 1

    for p, e in factors:
        result *= p ** e

    return result

Test:

def test_power_factorization_reconstructs_input():
    for n in range(2, 10000):
        factors = factorize_with_exponents(n)

        assert reconstruct_from_powers(factors) == n

        for p, e in factors:
            assert is_prime_64(p)
            assert e > 0

This catches missing final factors, wrong multiplicities, and accidental composite factors.

Test CRT by Substitution

A CRT solution must satisfy every original congruence.

def test_crt_substitution():
    remainders = [2, 3, 2]
    moduli = [3, 5, 7]

    x, m = crt(remainders, moduli)

    assert m == 105

    for a, mod in zip(remainders, moduli):
        assert x % mod == a

For randomized small cases:

def test_crt_random_small():
    cases = [
        ([0, 1], [2, 3]),
        ([1, 2, 3], [5, 7, 11]),
        ([4, 5], [9, 10]),
    ]

    for remainders, moduli in cases:
        x, m = crt(remainders, moduli)

        for a, mod in zip(remainders, moduli):
            assert x % mod == a

For general CRT, also test no-solution cases:

def test_crt_general_no_solution():
    assert crt_general([1, 4], [6, 10]) is None

Test Sieve Output

A prime sieve should agree with brute force on small ranges.

def test_sieve_against_bruteforce():
    primes = set(sieve(1000))

    for n in range(1, 1001):
        assert (n in primes) == is_prime_bruteforce(n)

For SPF tables:

def test_spf_table():
    spf = smallest_prime_factor(1000)

    for n in range(2, 1001):
        p = spf[n]

        assert p >= 2
        assert n % p == 0
        assert is_prime_bruteforce(p)

This verifies both primality and divisibility of the stored smallest factor.

Test Modular Exponentiation Against Built-In pow

In Python, pow(base, exponent, modulus) is a reliable oracle.

def test_mod_pow_against_builtin():
    for base in range(-50, 51):
        for exponent in range(0, 100):
            for modulus in range(1, 50):
                assert mod_pow(base, exponent, modulus) == pow(base, exponent, modulus)

This catches negative-base handling, exponent-zero behavior, and modulus-one behavior.

Test Combinatorics with Identities

Binomial coefficients satisfy strong identities.

Symmetry:

C(n,k) = C(n,n-k)

Pascal identity:

C(n,k) = C(n-1,k-1) + C(n-1,k)

Implementation:

def test_comb_identities():
    mod = 1_000_000_007
    fact, inv_fact = factorial_and_inverse_tables(200, mod)

    for n in range(0, 200):
        for k in range(0, n + 1):
            assert comb(n, k, fact, inv_fact, mod) == comb(n, n - k, fact, inv_fact, mod)

    for n in range(2, 200):
        for k in range(1, n):
            left = comb(n, k, fact, inv_fact, mod)
            right = (
                comb(n - 1, k - 1, fact, inv_fact, mod)
                + comb(n - 1, k, fact, inv_fact, mod)
            ) % mod

            assert left == right

These tests catch inverse-factorial errors and off-by-one mistakes.

Include Adversarial Values

Some numbers are designed to fool weak algorithms.

For primality testing, include Carmichael numbers:

def test_carmichael_numbers():
    for n in [561, 1105, 1729, 2465, 2821, 6601]:
        assert not is_prime_64(n)

For factorization, include:

prime powers
semiprimes
large primes
even numbers
perfect squares

Example:

def test_factorization_adversarial():
    values = [
        2**20,
        3**12,
        101 * 103,
        1000003 * 1000033,
        99991,
    ]

    for n in values:
        factors = factorize(n)
        assert reconstruct(factors) == n
        assert all(is_prime_64(p) for p in factors)

Adversarial tests prevent false confidence.

Randomized Tests

Randomized tests are useful when paired with invariants.

import random

def test_random_factorization():
    for _ in range(1000):
        n = random.randint(2, 10**9)

        factors = factorize(n)

        assert reconstruct(factors) == n
        assert all(is_prime_64(p) for p in factors)

For modular inverses:

def test_random_inverses():
    for _ in range(1000):
        m = random.randint(2, 10**6)
        a = random.randint(1, m - 1)

        if gcd(a, m) == 1:
            inv = mod_inverse(a, m)
            assert (a * inv) % m == 1

Randomized tests expand coverage, but they should not replace known edge cases.

Reproducibility

Randomized algorithms need reproducible tests.

Set a seed:

random.seed(0)

For algorithms such as Pollard's Rho, avoid tests that require a specific factor. Accept any nontrivial factor.

def test_pollards_rho_factor():
    n = 8051
    d = pollards_rho_retry(n)

    assert 1 < d < n
    assert n % d == 0

This makes the test robust despite randomness.

Separate Mathematical Tests from Performance Tests

Correctness tests should be deterministic, small, and frequent.

Performance tests should be separate.

Correctness:

Run on every commit

Performance:

Run on demand or in benchmark suite

Example benchmark helper:

from time import perf_counter

def benchmark(fn, *args):
    start = perf_counter()
    result = fn(*args)
    elapsed = perf_counter() - start
    return result, elapsed

Do not put strict timing assertions in ordinary unit tests unless the environment is controlled.

Common Mistakes

Testing Only Happy Paths

Include no-solution cases, non-coprime cases, and boundary values.

Expecting a Specific Pollard's Rho Factor

Either factor is valid.

Ignoring Negative Values

Extended GCD, linear congruences, and modular normalization often fail on negatives.

Testing Against the Same Algorithm

A test oracle should be independent when possible.

Skipping Reconstruction

For factorization, reconstruction is the simplest and strongest check.

Minimal Test Suite Checklist

A practical test suite should include:

known values
boundary cases
brute-force comparison for small inputs
algebraic identities
randomized invariant checks
adversarial numbers
reconstruction checks
no-solution cases

For number theory, these categories matter more than the raw number of tests.

Takeaway

Testing number-theoretic code requires exploiting the structure of the mathematics. Known values catch obvious mistakes, brute-force oracles validate small inputs, algebraic identities check behavior, reconstruction verifies factorization, and adversarial examples expose weak primality tests or edge-case failures. Randomized tests add breadth, but reproducibility and invariant-based assertions are essential. A reliable implementation is not just one that works on sample inputs. It is one that survives boundary values, invalid cases, and mathematically hostile examples.