15.16 Fast Fourier Transform

Problem

You need to multiply two polynomials.

Example:

$$ A(x)=3+2x+5x^2 $$

$$ B(x)=1+4x $$

The product is:

$$ C(x)=A(x)B(x) $$

A direct coefficient-by-coefficient multiplication compares every term of (A) with every term of (B).

For two polynomials with (n) coefficients, this costs:

$$ O(n^2) $$

That is acceptable for small inputs. It is too slow for large signals, big integer arithmetic, convolution, image processing, and many numerical algorithms.

Can divide and conquer multiply polynomials faster?

Solution

Use the Fast Fourier Transform, usually abbreviated FFT.

FFT is a divide-and-conquer algorithm for evaluating a polynomial at many carefully chosen points. It converts polynomial multiplication into three steps:

Evaluate both polynomials at selected points.
Multiply the values pointwise.
Interpolate the resulting polynomial.

The expensive part is evaluation and interpolation. FFT performs both in:

$$ O(n\log n) $$

time.

That reduces polynomial multiplication from:

$$ O(n^2) $$

to:

$$ O(n\log n) $$

up to constant factors and numerical concerns.

Coefficient Form vs Value Form

A polynomial can be represented in coefficient form:

A(x) = a0 + a1*x + a2*x^2 + ... + an*x^n

Example:

[3, 2, 5]

means:

3 + 2x + 5x²

A polynomial can also be represented by values at enough points.

For a degree (d) polynomial, (d+1) distinct points determine the polynomial uniquely.

So instead of storing coefficients, we can store:

A(x0), A(x1), A(x2), ...

This is called value representation.

Why Value Form Helps

Suppose we know:

A(xi)
B(xi)

at the same point (x_i).

Then:

C(xi) = A(xi) * B(xi)

where:

C(x) = A(x)B(x)

Pointwise multiplication is cheap.

If we evaluate both input polynomials at enough points, we can multiply their values pairwise and then reconstruct the product polynomial.

The full plan is:

coefficients of A
coefficients of B
        ↓
evaluate A and B
        ↓
multiply values
        ↓
interpolate C
        ↓
coefficients of C

The Naive Evaluation Problem

Evaluating a degree-(n) polynomial at one point can be done in:

$$ O(n) $$

using Horner's rule.

Evaluating it at (n) points costs:

$$ O(n^2) $$

That would erase the benefit.

FFT accelerates multipoint evaluation by choosing points with special symmetry: roots of unity.

Roots of Unity

The (n)-th roots of unity are complex numbers satisfying:

$$ \omega^n = 1 $$

They are evenly spaced around the unit circle.

For example, the fourth roots of unity are:

1
i
-1
-i

The symmetry of these points allows the evaluation problem to split into even and odd coefficient parts.

This is the divide-and-conquer step.

Splitting the Polynomial

Given:

$$ A(x)=a_0+a_1x+a_2x^2+a_3x^3+\cdots $$

split into even and odd powers:

$$ A_{\text{even}}(x)=a_0+a_2x+a_4x^2+\cdots $$

$$ A_{\text{odd}}(x)=a_1+a_3x+a_5x^2+\cdots $$

Then:

$$ A(x)=A_{\text{even}}(x^2)+xA_{\text{odd}}(x^2) $$

This identity is the core of FFT.

It lets us evaluate (A(x)) by recursively evaluating two half-size polynomials.

Divide-and-Conquer Evaluation

To evaluate (A) at all (n)-th roots of unity:

Split coefficients into even and odd parts.
Recursively evaluate both parts at the (n/2)-th roots of unity.
Combine the results using symmetry.

The recurrence is:

$$ T(n)=2T(n/2)+O(n) $$

Thus:

$$ T(n)=O(n\log n) $$

This is the same recurrence as merge sort, but applied to polynomial evaluation.

Combining Results

Let:

even[k] = value of A_even at root k
odd[k]  = value of A_odd at root k

Then the two corresponding values of (A) are:

A[k]       = even[k] + w * odd[k]
A[k+n/2]   = even[k] - w * odd[k]

where (w) is the appropriate root of unity.

This pairwise combination is often called a butterfly operation.

Recursive FFT Sketch

type Complex = complex128

func FFT(a []Complex) []Complex {
    n := len(a)

    if n == 1 {
        return []Complex{a[0]}
    }

    even := make([]Complex, n/2)
    odd := make([]Complex, n/2)

    for i := 0; i < n/2; i++ {
        even[i] = a[2*i]
        odd[i] = a[2*i+1]
    }

    yEven := FFT(even)
    yOdd := FFT(odd)

    y := make([]Complex, n)

    for k := 0; k < n/2; k++ {
        angle := -2 * math.Pi * float64(k) / float64(n)
        w := cmplx.Exp(complex(0, angle))

        t := w * yOdd[k]

        y[k] = yEven[k] + t
        y[k+n/2] = yEven[k] - t
    }

    return y
}

This is the clearest version, not the fastest one.

Production FFT implementations avoid repeated allocation, use iterative bit-reversal layouts, and exploit CPU vectorization.

Inverse FFT

FFT converts coefficients to values.

To convert values back to coefficients, use the inverse FFT.

The inverse transform is almost identical:

use the opposite sign in the exponent
divide every result by (n)

Sketch:

func IFFT(a []Complex) []Complex {
    n := len(a)

    for i := range a {
        a[i] = cmplx.Conj(a[i])
    }

    y := FFT(a)

    for i := range y {
        y[i] = cmplx.Conj(y[i]) / Complex(n)
    }

    return y
}

This uses the common conjugation trick.

Polynomial Multiplication

To multiply two coefficient arrays:

Pad both arrays to a power of two large enough to hold the result.
Compute FFT of both.
Multiply pointwise.
Compute inverse FFT.
Round coefficients if the expected output is integral.

func MultiplyPoly(a, b []int) []int {
    size := 1

    for size < len(a)+len(b)-1 {
        size *= 2
    }

    fa := make([]Complex, size)
    fb := make([]Complex, size)

    for i := range a {
        fa[i] = Complex(a[i])
    }

    for i := range b {
        fb[i] = Complex(b[i])
    }

    fa = FFT(fa)
    fb = FFT(fb)

    for i := 0; i < size; i++ {
        fa[i] *= fb[i]
    }

    fa = IFFT(fa)

    result := make([]int, len(a)+len(b)-1)

    for i := range result {
        result[i] = int(math.Round(real(fa[i])))
    }

    return result
}

Example:

fmt.Println(MultiplyPoly(
    []int{3, 2, 5},
    []int{1, 4},
))

Output:

[3 14 13 20]

which represents:

3 + 14x + 13x² + 20x³

Convolution

Polynomial multiplication and convolution are the same operation under different names.

Given:

a = [a0, a1, a2]
b = [b0, b1]

The convolution is:

c[k] = sum a[i] * b[k-i]

This is exactly the coefficient formula for polynomial multiplication.

FFT-based convolution is used in:

signal processing
audio filters
image processing
probability distributions
large integer multiplication
pattern matching

Padding

If:

len(a) = m
len(b) = n

then the result has:

m + n - 1

coefficients.

FFT usually works most simply when the input length is a power of two.

So choose:

size = nextPowerOfTwo(m + n - 1)

Failing to pad correctly causes circular convolution, where values wrap around and corrupt the result.

Numerical Error

The complex FFT uses floating-point arithmetic.

Small errors are expected.

Example:

12.9999999998

should be rounded to:

For integer polynomial multiplication, round the real part of each coefficient.

For very large coefficients, floating-point error can become too large.

In those cases, use:

coefficient splitting
multiple transforms
Number Theoretic Transform
arbitrary-precision arithmetic

Number Theoretic Transform

The Number Theoretic Transform, or NTT, is an FFT-like algorithm over modular integers instead of complex numbers.

It avoids floating-point error.

It is common in competitive programming and exact integer convolution.

The tradeoff is that the modulus must support suitable roots of unity.

Conceptually, the algorithm is the same:

evaluate
multiply pointwise
interpolate

but all arithmetic happens modulo a prime.

Complexity Analysis

FFT satisfies:

$$ T(n)=2T(n/2)+O(n) $$

So:

$$ T(n)=O(n\log n) $$

Polynomial multiplication performs:

one FFT for (A)
one FFT for (B)
one inverse FFT
one pointwise multiplication pass

Therefore total complexity remains:

$$ O(n\log n) $$

Space complexity is:

$$ O(n) $$

for the transform arrays.

Iterative FFT

Recursive FFT is easy to read.

Iterative FFT is usually faster.

It first reorders input by bit-reversed indexes, then performs butterfly operations with increasing block sizes:

length = 2
length = 4
length = 8
...

This avoids recursive allocation and improves locality.

The mathematical structure is identical to the recursive version.

Use the recursive version to learn the algorithm.

Use an optimized iterative version in performance-critical code.

Common Mistakes

Forgetting to Pad

Without enough zero-padding, convolution wraps around.

Always pad to at least:

len(a) + len(b) - 1

Using the Wrong Sign

Forward and inverse FFT differ by the sign of the exponent.

Using the same sign twice will not recover the coefficients.

Forgetting to Divide by n

The inverse transform requires normalization.

If you omit it, every coefficient is multiplied by (n).

Trusting Floating-Point Values Directly

Integer results should be rounded.

Do not cast directly if small error is possible.

Recursing on Non-Power-of-Two Lengths

The simple Cooley-Tukey implementation assumes even splitting down to length one.

Pad to a power of two.

Discussion

FFT is one of the deepest divide-and-conquer algorithms in everyday use. It begins with a simple algebraic identity: separate even and odd powers. By choosing evaluation points with the right symmetry, that identity becomes a recursive algorithm with the same recurrence as merge sort.

The payoff is enormous. Polynomial multiplication, convolution, and many signal-processing tasks move from quadratic time to near-linear time. The algorithm also shows a recurring theme in this chapter: the right representation can make the problem easier. Coefficient form makes multiplication expensive. Value form makes multiplication trivial. FFT is the bridge between the two.

For implementation work, the main concerns are padding, numerical precision, memory allocation, and choosing between complex FFT and modular NTT. For algorithm design, the main lesson is broader: when a problem has algebraic symmetry, divide and conquer can often expose it and turn a dense computation into a structured one.