15.7 Strassen Multiplication
Karatsuba's algorithm demonstrates that multiplication can be accelerated by reducing the number of recursive subproblems.
15.7 Strassen Multiplication
Problem
Karatsuba's algorithm demonstrates that multiplication can be accelerated by reducing the number of recursive subproblems. The same question naturally arises for matrix multiplication.
Given two (n \times n) matrices:
$$ A $$
and
$$ B $$
the classical algorithm computes:
$$ C = AB $$
using three nested loops.
For each element:
$$ C_{ij} $$
we compute:
$$ C_{ij}=\sum_{k=1}^{n}A_{ik}B_{kj} $$
The running time is:
$$ O(n^3) $$
For decades this was believed to be unavoidable.
Can divide and conquer reduce the exponent in matrix multiplication?
Solution
Use Strassen multiplication.
Introduced by entity["people","Volker Strassen","German mathematician"] in 1969, Strassen's algorithm was the first to prove that matrix multiplication could be performed asymptotically faster than (O(n^3)).
The key observation mirrors Karatsuba's insight. A straightforward divide-and-conquer approach requires eight recursive matrix multiplications. Strassen reorganizes the computation so that only seven recursive multiplications are necessary.
Reducing eight recursive calls to seven changes the recurrence dramatically.
Classical Divide and Conquer
Consider two matrices:
$$ A= \begin{bmatrix} A_{11} & A_{12}\\nA_{21} & A_{22} \end{bmatrix} $$
$$ B= \begin{bmatrix} B_{11} & B_{12}\\nB_{21} & B_{22} \end{bmatrix} $$
where each block is an ((n/2)\times(n/2)) matrix.
The product is:
$$ C= \begin{bmatrix} C_{11} & C_{12}\\nC_{21} & C_{22} \end{bmatrix} $$
with:
$$ C_{11}=A_{11}B_{11}+A_{12}B_{21} $$
$$ C_{12}=A_{11}B_{12}+A_{12}B_{22} $$
$$ C_{21}=A_{21}B_{11}+A_{22}B_{21} $$
$$ C_{22}=A_{21}B_{12}+A_{22}B_{22} $$
Computing these blocks requires:
A11B11
A12B21
A11B12
A12B22
A21B11
A22B21
A21B12
A22B22
Eight recursive matrix multiplications.
The recurrence becomes:
$$ T(n)=8T(n/2)+O(n^2) $$
Applying the Master Theorem:
$$ T(n)=O(n^3) $$
The divide-and-conquer formulation changes the structure but not the asymptotic complexity.
Strassen's Insight
The expensive operation is matrix multiplication.
Matrix addition and subtraction cost only:
$$ O(n^2) $$
If one multiplication can be eliminated and replaced by several additions, the overall algorithm may improve.
Strassen constructs seven intermediate products:
$$ M_1=(A_{11}+A_{22})(B_{11}+B_{22}) $$
$$ M_2=(A_{21}+A_{22})B_{11} $$
$$ M_3=A_{11}(B_{12}-B_{22}) $$
$$ M_4=A_{22}(B_{21}-B_{11}) $$
$$ M_5=(A_{11}+A_{12})B_{22} $$
$$ M_6=(A_{21}-A_{11})(B_{11}+B_{12}) $$
$$ M_7=(A_{12}-A_{22})(B_{21}+B_{22}) $$
Only seven recursive multiplications are performed.
Reconstructing the Result
The final matrix blocks are assembled as:
$$ C_{11}=M_1+M_4-M_5+M_7 $$
$$ C_{12}=M_3+M_5 $$
$$ C_{21}=M_2+M_4 $$
$$ C_{22}=M_1-M_2+M_3+M_6 $$
Although the formulas appear mysterious at first, expanding the algebra confirms that the resulting matrix product is identical to the classical formulation.
The achievement is remarkable because one multiplication disappears entirely.
A Small Example
Consider:
$$ A= \begin{bmatrix} 1 & 2\\n3 & 4 \end{bmatrix} $$
$$ B= \begin{bmatrix} 5 & 6\\n7 & 8 \end{bmatrix} $$
Here every block is a single number.
Compute:
M1 = (1+4)(5+8) = 65
M2 = (3+4)(5) = 35
M3 = (1)(6-8) = -2
M4 = (4)(7-5) = 8
M5 = (1+2)(8) = 24
M6 = (3-1)(5+6) = 22
M7 = (2-4)(7+8) = -30
Reconstruct:
C11 = 65 + 8 - 24 - 30 = 19
C12 = -2 + 24 = 22
C21 = 35 + 8 = 43
C22 = 65 - 35 - 2 + 22 = 50
Result:
$$ C= \begin{bmatrix} 19 & 22\\n43 & 50 \end{bmatrix} $$
which matches classical matrix multiplication.
Recurrence Analysis
The algorithm performs:
- Seven recursive multiplications of size (n/2)
- A collection of matrix additions and subtractions costing (O(n^2))
The recurrence is:
$$ T(n)=7T(n/2)+O(n^2) $$
The critical term becomes:
$$ n^{\log_2 7} $$
Using:
$$ \log_2 7 \approx 2.807 $$
we obtain:
$$ T(n)=O(n^{2.807}) $$
This was the first known algorithm faster than cubic matrix multiplication.
Why One Multiplication Matters
At the top level, the savings seem insignificant:
Classical: 8 multiplications
Strassen: 7 multiplications
The recursion tree amplifies the difference.
At depth one:
8² = 64
7² = 49
At depth two:
8³ = 512
7³ = 343
At depth ten:
8¹⁰ ≈ 1.07 billion
7¹⁰ ≈ 282 million
The reduction compounds at every level of recursion.
This is the same phenomenon that made Karatsuba successful.
Memory Considerations
Strassen trades arithmetic work for temporary storage.
Additional matrices must be created for:
A11 + A22
B11 + B22
A21 + A22
...
A naive implementation can allocate large numbers of temporary arrays.
Practical implementations often:
- Reuse buffers
- Employ memory pools
- Switch to classical multiplication for small blocks
Memory management often determines real-world performance more than asymptotic complexity.
Numerical Stability
Strassen introduces many extra additions and subtractions.
For integer arithmetic this is generally harmless.
For floating-point arithmetic:
(a + b) - (c + d)
can amplify rounding errors.
As recursion deepens, numerical inaccuracies may accumulate.
Consequently, scientific computing libraries frequently continue using highly optimized classical algorithms unless matrices are extremely large.
Numerical stability is sometimes more valuable than asymptotic speed.
Hybrid Algorithms
Modern implementations rarely use pure Strassen recursion.
A typical approach:
Large matrix
↓
Use Strassen
↓
Submatrix size < threshold
↓
Switch to classical multiplication
The threshold may range from a few dozen to several hundred rows, depending on hardware and implementation quality.
This hybrid strategy captures most of the asymptotic benefit while avoiding excessive overhead.
Practical Applications
Strassen's algorithm influenced many areas:
- Scientific computing
- Symbolic algebra systems
- Computational geometry
- Graph algorithms
- Complexity theory
Many advanced graph algorithms reduce their running times through fast matrix multiplication techniques derived from Strassen's work.
The algorithm also opened an entirely new field of research devoted to reducing the matrix multiplication exponent.
Beyond Strassen
Subsequent algorithms pushed the exponent even lower.
Examples include:
- entity["people","Don Coppersmith","computer scientist"] and entity["people","Shmuel Winograd","mathematician"]
- entity["people","Virginia Vassilevska Williams","computer scientist"] and collaborators
Modern theoretical results place the exponent below:
$$ 2.373 $$
However, these algorithms are extraordinarily complex and are rarely used in practical software.
Strassen remains the most important fast matrix multiplication algorithm from an engineering perspective.
Common Mistakes
Incorrect Block Formulas
A single sign error in one of the seven products can corrupt the entire result.
Most implementation bugs occur during reconstruction of (C_{11}), (C_{12}), (C_{21}), and (C_{22}).
Recursing Too Deeply
For small matrices, Strassen often performs worse than the classical algorithm.
Always introduce a cutoff threshold.
Excessive Allocation
Temporary matrix creation can dominate execution time.
Buffer reuse is essential for high-performance implementations.
Ignoring Matrix Padding
Matrices whose dimensions are not powers of two require padding or specialized handling.
Failure to manage dimensions correctly leads to incorrect indexing.
Discussion
Strassen multiplication represents one of the most important moments in algorithm design. Before its discovery, matrix multiplication was widely assumed to require cubic time. Strassen demonstrated that algebraic structure could be exploited to reduce the number of recursive multiplications, changing the exponent of the algorithm itself.
The broader lesson extends far beyond matrices. Many divide-and-conquer algorithms begin with a natural recursive decomposition that seems optimal. Occasionally, a deeper algebraic relationship allows several subproblems to be combined or eliminated entirely. When this happens, the recursion tree changes shape, and a seemingly minor reduction at one level propagates throughout the entire computation.
Karatsuba reduced four recursive multiplications to three. Strassen reduced eight to seven. Both algorithms show that the most powerful divide-and-conquer optimizations often come from questioning whether every recursive subproblem is truly necessary. The next section applies this mindset to a simpler but equally important operation: exponentiation, where divide and conquer transforms a linear process into a logarithmic one.