
Chapter 97. Randomized Linear Algebra

Randomized linear algebra uses random sampling and random projection to approximate matrix computations. Its goal is to reduce the cost of standard linear algebra tasks while preserving enough accuracy for the problem at hand.

The main idea is to replace a large matrix problem by a smaller random sketch. The sketch should keep the important geometry of the original matrix: lengths, angles, column space, row space, singular values, or residual norms. Once the sketch is formed, one solves a smaller problem and lifts the result back to the original scale.

Randomized methods are widely used for low-rank approximation, least squares, matrix multiplication, preconditioning, principal component analysis, and large-scale data analysis. Surveys of randomized numerical linear algebra emphasize sketching as a way to reduce computational cost for matrix multiplication, linear systems, least squares problems, and low-rank compression.

97.1 Motivation

Classical dense matrix algorithms can be too expensive for very large data.

A dense matrix

A \in \mathbb{R}^{m\times n}

may have millions of rows and thousands of columns. Computing a full singular value decomposition or QR factorization may require too much time, memory, or communication.

Randomized linear algebra addresses this by asking for an approximation rather than an exact factorization.

For example, instead of computing the full SVD

A = U\Sigma V^T,

we may seek a rank-k approximation

A \approx U_k \Sigma_k V_k^T

where

k \ll \min(m,n).

If the singular values decay quickly, such an approximation may preserve most of the useful information in the matrix.
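As a small illustration, here is a minimal NumPy sketch (the low-rank-plus-noise matrix is illustrative) showing that a rank-k truncation captures nearly all of a matrix whose spectrum decays:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative matrix: rank-10 signal plus small noise, so the spectrum decays fast.
m, n, k = 2000, 500, 10
A = rng.standard_normal((m, k)) @ rng.standard_normal((k, n)) \
    + 1e-3 * rng.standard_normal((m, n))

U, s, Vt = np.linalg.svd(A, full_matrices=False)
A_k = (U[:, :k] * s[:k]) @ Vt[:k, :]   # truncated rank-k approximation

rel_err = np.linalg.norm(A - A_k) / np.linalg.norm(A)
print(f"relative Frobenius error of the rank-{k} truncation: {rel_err:.2e}")
```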

97.2 Deterministic Versus Randomized Algorithms

A deterministic algorithm follows a fixed sequence of operations.

A randomized algorithm uses random choices during computation.

| Type | Description |
|---|---|
| Deterministic | same input gives same path |
| Randomized | uses random samples or random projections |
| Monte Carlo | usually fast, small probability of failure |
| Las Vegas | always correct, random running time |

Most randomized linear algebra algorithms are Monte Carlo algorithms. They return an approximate answer with high probability.

This failure probability is not a weakness in itself. It is part of the specification: the algorithm states both an error tolerance and a probability of success.

97.3 Sketching

A sketch is a compressed representation of a matrix or vector.

Given

A \in \mathbb{R}^{m\times n},

we choose a sketching matrix

S \in \mathbb{R}^{s\times m}

with

s \ll m.

Then we form

SA \in \mathbb{R}^{s\times n}.

The matrix SA has fewer rows than A. It is smaller and cheaper to process.

For least squares, one may also sketch the right-hand side:

Sb.

The sketched problem is

\min_x \|SAx - Sb\|_2.

This problem is cheaper than the original problem

\min_x \|Ax - b\|_2.

The purpose of the sketch is to preserve enough geometry so that the sketched solution is useful for the original problem. RandNLA descriptions often present least squares algorithms as first computing the sketches SA and Sb, then solving the smaller problem or using it to build a preconditioner.

97.4 Random Sampling

Random sampling chooses rows, columns, or entries from a matrix.

For example, to approximate the column space of A, one may sample columns of A.

If column j is sampled, it is usually rescaled by a factor depending on its sampling probability.

Sampling is simple and preserves interpretability because the selected columns or rows are actual columns or rows from the original matrix.

However, uniform sampling may fail when important information is concentrated in a few rows or columns.
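A minimal sketch of norm-proportional column sampling with the standard rescaling (the matrix and sizes are illustrative; the factor 1/sqrt(s p_j) makes C C^T an unbiased estimator of A A^T):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((1000, 300))

s = 100                                    # number of columns to sample
p = np.sum(A**2, axis=0)                   # squared column norms
p = p / p.sum()                            # sampling probabilities
idx = rng.choice(A.shape[1], size=s, p=p)

# Rescale each sampled column by 1/sqrt(s * p_j); with this scaling,
# E[C @ C.T] = A @ A.T.
C = A[:, idx] / np.sqrt(s * p[idx])
print(C.shape)                             # (1000, 100)
```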

97.5 Leverage Scores

Leverage scores measure how important rows or columns are to a subspace.

For a matrix

A \in \mathbb{R}^{m\times n}

with orthonormal basis U for its column space, the row leverage score of row i is

\ell_i = \|U_{i,:}\|_2^2.

Rows with large leverage scores have strong influence on the column space.

Sampling proportional to leverage scores often gives better sketches than uniform sampling.

The difficulty is that exact leverage scores may be expensive to compute. Randomized algorithms often approximate them.
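For reference, exact row leverage scores can be computed from a thin QR factorization; the O(mn^2) cost of this step is precisely what the approximation schemes try to avoid (a minimal sketch with an illustrative matrix):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((1000, 20))

# Q is an orthonormal basis for col(A); the leverage score of row i
# is the squared norm of the i-th row of Q.
Q, _ = np.linalg.qr(A, mode="reduced")
leverage = np.sum(Q**2, axis=1)

print(leverage.sum())   # sums to the rank of A, here 20
```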

97.6 Random Projection

Random projection mixes the data using a random matrix.

Instead of selecting rows, one forms random linear combinations of rows.

A common choice is a Gaussian sketch:

S_{ij} \sim N(0, 1/s).

Then

SA

compresses A to s rows.

Random projections tend to spread information evenly. This makes them less sensitive to high-leverage rows than uniform sampling.

The cost is that the sketched rows are no longer interpretable as original rows.
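A minimal Gaussian sketch in NumPy (sizes are illustrative); note that variance 1/s corresponds to standard deviation 1/sqrt(s):

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((10000, 50))

s = 400
S = rng.normal(0.0, 1.0 / np.sqrt(s), size=(s, A.shape[0]))  # S_ij ~ N(0, 1/s)
SA = S @ A                                                   # 400 x 50 sketch

# Lengths in the column space are approximately preserved.
x = rng.standard_normal(50)
print(np.linalg.norm(S @ (A @ x)) / np.linalg.norm(A @ x))   # close to 1
```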

97.7 Johnson-Lindenstrauss Principle

The Johnson-Lindenstrauss principle says that a set of points in high-dimensional Euclidean space can be embedded into a much lower-dimensional space while approximately preserving pairwise distances.

In linear algebra, this motivates random projections.

If the sketching matrix S approximately preserves lengths in a subspace, then for vectors x in that subspace,

\|Sx\|_2 \approx \|x\|_2.

This property is called a subspace embedding.

Subspace embeddings are the geometric foundation of many randomized algorithms.

97.8 Subspace Embedding

A matrix S is an \varepsilon-subspace embedding for a subspace \mathcal{U} if

(1-\varepsilon)\|x\|_2^2 \le \|Sx\|_2^2 \le (1+\varepsilon)\|x\|_2^2

for every

x \in \mathcal{U}.

For a matrix A, the relevant subspace is often the column space of A.

If S is a subspace embedding for \operatorname{col}(A), then the sketch preserves the geometry of all vectors of the form

Ax.

This is exactly what is needed for least squares and many approximation problems.
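The embedding quality can be measured numerically: for an orthonormal basis Q of col(A), the extreme singular values of SQ bound the distortion of S on that subspace (a minimal check, assuming a Gaussian S):

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((5000, 40))
Q, _ = np.linalg.qr(A, mode="reduced")   # orthonormal basis for col(A)

s = 400
S = rng.normal(0.0, 1.0 / np.sqrt(s), size=(s, A.shape[0]))

# For any x = Qy in col(A):
#   sigma_min(SQ)^2 <= ||Sx||^2 / ||x||^2 <= sigma_max(SQ)^2,
# so epsilon is roughly max(sigma_max^2 - 1, 1 - sigma_min^2).
sv = np.linalg.svd(S @ Q, compute_uv=False)
print(sv.min()**2, sv.max()**2)          # both close to 1
```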

97.9 Randomized Least Squares

Consider the least squares problem

\min_x \|Ax - b\|_2,

where

A \in \mathbb{R}^{m\times n}, \qquad m \gg n.

A randomized sketching method chooses

S \in \mathbb{R}^{s\times m}

with

n \le s \ll m.

Then it solves

\min_x \|SAx - Sb\|_2.

If S preserves the geometry of the column space of [A b], then the solution of the sketched problem gives a good approximation to the original least squares solution.

Randomized least squares is useful when m is very large and reading or factoring all of A repeatedly is expensive.

97.10 Sketch-and-Solve

The simplest randomized least squares method is sketch-and-solve.

  1. Draw a sketching matrix S.
  2. Form SA and Sb.
  3. Solve the smaller least squares problem
     \min_x \|SAx - Sb\|_2.
  4. Return the solution as an approximate solution to the original problem.

This method is simple and fast.

Its accuracy depends on the sketch size s, the sketch type, the conditioning of A, and the required tolerance.

Sketch-and-solve is often used when moderate accuracy is enough.
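A minimal sketch-and-solve implementation of the four steps above (Gaussian sketch, illustrative sizes), compared against the exact least squares solution:

```python
import numpy as np

rng = np.random.default_rng(5)
m, n = 10000, 50
A = rng.standard_normal((m, n))
b = A @ rng.standard_normal(n) + 0.01 * rng.standard_normal(m)

# Steps 1-3: draw S, form SA and Sb, solve the small problem.
s = 500
S = rng.normal(0.0, 1.0 / np.sqrt(s), size=(s, m))
x_sketch, *_ = np.linalg.lstsq(S @ A, S @ b, rcond=None)

# Step 4: return x_sketch; here we compare it against the exact solution.
x_exact, *_ = np.linalg.lstsq(A, b, rcond=None)
print(np.linalg.norm(x_sketch - x_exact) / np.linalg.norm(x_exact))
```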

97.11 Sketching for Preconditioning

For high accuracy, a sketch may be used to build a preconditioner.

Instead of trusting the sketched solution directly, we use the sketch to construct a better-conditioned system. Then an iterative method solves the original problem.

For example, compute a QR factorization

SA = QR.

The matrix R can be used as a right preconditioner:

AR^{-1}y = b, \qquad x = R^{-1}y.

If the sketch captures the column space well, then

AR^{-1}

is better conditioned than A.

This approach combines randomized preprocessing with deterministic iterative refinement.
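A minimal version of this idea, using SciPy's LSQR as the iterative solver (the matrix is illustrative, and the explicit inverse of R is for clarity only; triangular solves would be used in practice):

```python
import numpy as np
from scipy.sparse.linalg import lsqr

rng = np.random.default_rng(6)
m, n = 10000, 100
A = rng.standard_normal((m, n)) * np.logspace(0, -6, n)  # ill-conditioned columns
b = rng.standard_normal(m)

# Build the right preconditioner from a sketch: QR of SA.
s = 500
S = rng.normal(0.0, 1.0 / np.sqrt(s), size=(s, m))
_, R = np.linalg.qr(S @ A, mode="reduced")

# Solve min ||(A R^{-1}) y - b||_2 with LSQR, then recover x = R^{-1} y.
AR = A @ np.linalg.inv(R)
y = lsqr(AR, b, atol=1e-12, btol=1e-12)[0]
x = np.linalg.solve(R, y)

print(np.linalg.cond(A), np.linalg.cond(AR))  # cond(AR) should be near 1
```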

97.12 Randomized Low-Rank Approximation

Low-rank approximation is one of the main uses of randomized linear algebra.

Given

A \in \mathbb{R}^{m\times n},

we want

A \approx QB,

where

Q \in \mathbb{R}^{m\times k}

has orthonormal columns and

B \in \mathbb{R}^{k\times n}.

The columns of Q should approximate the dominant column space of A.

A randomized method finds this space by multiplying A by a random test matrix.

97.13 Randomized Range Finder

Choose a random matrix

\Omega \in \mathbb{R}^{n\times \ell},

where

\ell = k + p.

Here k is the target rank and p is an oversampling parameter.

Compute

Y = A\Omega.

The columns of Y are random linear combinations of the columns of A.

Then compute an orthonormal basis

Q = \operatorname{orth}(Y).

The matrix

QQ^TA

is the projection of A onto the sampled subspace.

If the range of Q captures the dominant column space of A, then

A \approx QQ^TA.

The randomized range finder is the core primitive behind many randomized SVD algorithms. Martinsson and Tropp describe randomized algorithms for matrix factorizations and linear systems, with emphasis on practical algorithms and theoretical foundations.
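A minimal NumPy implementation of the range finder (a Gaussian test matrix; the function name and the illustrative test problem are my own):

```python
import numpy as np

def range_finder(A, k, p=10, rng=None):
    """Return Q with orthonormal columns whose range approximates
    the dominant column space of A (basic version, no power iteration)."""
    rng = np.random.default_rng(rng)
    Omega = rng.standard_normal((A.shape[1], k + p))  # random test matrix
    Y = A @ Omega                                     # sample the range of A
    Q, _ = np.linalg.qr(Y, mode="reduced")            # orthonormalize
    return Q

# Illustrative check on a nearly rank-15 matrix.
rng = np.random.default_rng(7)
A = rng.standard_normal((2000, 15)) @ rng.standard_normal((15, 300)) \
    + 1e-6 * rng.standard_normal((2000, 300))
Q = range_finder(A, k=15)
print(np.linalg.norm(A - Q @ (Q.T @ A)) / np.linalg.norm(A))  # tiny
```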

97.14 Randomized SVD

To compute an approximate rank-k SVD:

  1. Draw
     \Omega \in \mathbb{R}^{n\times (k+p)}.
  2. Form
     Y = A\Omega.
  3. Compute
     Q = \operatorname{orth}(Y).
  4. Form the smaller matrix
     B = Q^TA.
  5. Compute the SVD of B:
     B = \widetilde{U}\Sigma V^T.
  6. Set
     U = Q\widetilde{U}.

Then

A \approx U\Sigma V^T.

The expensive SVD is performed on the smaller matrix B, not on the original matrix A.

This is the standard randomized SVD pattern.
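A direct transcription of the six steps into NumPy (a minimal sketch: no power iteration, with orth realized as a reduced QR factorization):

```python
import numpy as np

def randomized_svd(A, k, p=10, rng=None):
    """Approximate rank-k SVD of A following the six steps above."""
    rng = np.random.default_rng(rng)
    Omega = rng.standard_normal((A.shape[1], k + p))     # 1. draw Omega
    Y = A @ Omega                                        # 2. form Y = A Omega
    Q, _ = np.linalg.qr(Y, mode="reduced")               # 3. Q = orth(Y)
    B = Q.T @ A                                          # 4. small matrix B
    Ub, sig, Vt = np.linalg.svd(B, full_matrices=False)  # 5. SVD of B
    U = Q @ Ub                                           # 6. lift back
    return U[:, :k], sig[:k], Vt[:k, :]
```

The only dense SVD runs on the (k+p) × n matrix B, so the dominant cost is the two products with A.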

97.15 Oversampling

Oversampling improves reliability.

Instead of choosing exactly k random vectors, choose

\ell = k + p

where p is a small integer such as 5, 10, or 20.

The extra vectors provide a safety margin. They reduce the probability that the random sketch misses an important direction.

Oversampling is cheap compared with the cost of processing the full matrix.

97.16 Power Iterations for Accuracy

Randomized SVD may lose accuracy when the singular values decay slowly.

Power iteration improves separation.

Instead of forming

Y = A\Omega,

form

Y = (AA^T)^q A\Omega.

Here q is a small nonnegative integer.

The matrix (AA^T)^q A has the same singular vectors as A, but its singular values are

\sigma_i^{2q+1}.

Large singular values become more dominant relative to small ones.

Thus the sampled range better captures the leading singular directions.

The cost is additional passes over the matrix.
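A minimal power-iteration variant of the range finder; re-orthogonalizing after each application of A and A^T is the numerically stable way to form orth((AA^T)^q A Omega):

```python
import numpy as np

def range_finder_power(A, k, p=10, q=2, rng=None):
    """Range finder for orth((A A^T)^q A Omega), re-orthogonalizing
    between multiplications to avoid loss of numerical rank."""
    rng = np.random.default_rng(rng)
    Y = A @ rng.standard_normal((A.shape[1], k + p))
    Q, _ = np.linalg.qr(Y, mode="reduced")
    for _ in range(q):
        Z, _ = np.linalg.qr(A.T @ Q, mode="reduced")   # apply A^T, re-orthogonalize
        Q, _ = np.linalg.qr(A @ Z, mode="reduced")     # apply A, re-orthogonalize
    return Q
```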

97.17 Pass Efficiency

For very large matrices, the number of passes over the data may dominate arithmetic cost.

A pass means reading the full matrix once.

Randomized algorithms are attractive because they can often be organized to use one or a few passes.

| Algorithm | Typical passes |
|---|---|
| basic randomized range finder | 1 |
| randomized SVD without power iteration | 1 to 2 |
| randomized SVD with q power iterations | 2q+1 or more |
| streaming sketch | 1 |

Pass efficiency matters when the matrix is stored on disk, distributed across machines, or streamed from data sources.

97.18 Randomized Matrix Multiplication

Randomized methods can approximate matrix products.

Given

C = AB,

where

A \in \mathbb{R}^{m\times n}, \qquad B \in \mathbb{R}^{n\times p},

one can sample columns of A and corresponding rows of B.

The approximation has the form

AB \approx \sum_{t=1}^{s} \frac{1}{s\, p_{i_t}} A_{:,i_t} B_{i_t,:}.

Here i_t are the sampled indices and p_i are the sampling probabilities.

Sampling probabilities based on column and row norms can reduce variance.
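A minimal implementation of this sampled product (illustrative matrices; probabilities proportional to the product of column and row norms are the classical variance-reducing choice):

```python
import numpy as np

rng = np.random.default_rng(8)
A = rng.standard_normal((200, 5000))
B = rng.standard_normal((5000, 100))

s = 500
p = np.linalg.norm(A, axis=0) * np.linalg.norm(B, axis=1)
p = p / p.sum()
idx = rng.choice(A.shape[1], size=s, p=p)

# Unbiased estimator: sum_t A[:, i_t] B[i_t, :] / (s * p_{i_t}).
C_approx = (A[:, idx] / (s * p[idx])) @ B[idx, :]
C_exact = A @ B
print(np.linalg.norm(C_approx - C_exact) / np.linalg.norm(C_exact))
```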

97.19 Frequent Directions

Frequent Directions is a deterministic sketching method, but it is often discussed alongside randomized sketching.

It maintains a small matrix

B

that approximates the covariance

A^TA.

The goal is to ensure that

\|A^TA - B^TB\|

is small.

This is useful for streaming data, where rows arrive one at a time and cannot all be stored.

Although not randomized in its basic form, it shares the same sketching viewpoint.
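A minimal Frequent Directions sketch (a simple variant: when the buffer fills, shrink all squared singular values by the smallest one, which zeroes at least one row):

```python
import numpy as np

def frequent_directions(A, ell):
    """Maintain an ell x n sketch B of the rows of A such that
    ||A.T @ A - B.T @ B||_2 <= ||A||_F**2 / ell."""
    B = np.zeros((ell, A.shape[1]))
    filled = 0
    for a in A:                                # rows arrive one at a time
        if filled == ell:                      # buffer full: shrink
            _, s, Vt = np.linalg.svd(B, full_matrices=False)
            s = np.sqrt(np.maximum(s**2 - s[-1]**2, 0.0))
            B = s[:, None] * Vt                # last row becomes zero
            filled = ell - 1
        B[filled] = a
        filled += 1
    return B

rng = np.random.default_rng(9)
A = rng.standard_normal((5000, 30))
B = frequent_directions(A, ell=10)
# Observed error versus the worst-case bound ||A||_F^2 / ell.
print(np.linalg.norm(A.T @ A - B.T @ B, 2), np.linalg.norm(A) ** 2 / 10)
```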

97.20 Randomized Preconditioners

Randomized preconditioners use sketches to improve conditioning.

For example, in least squares, a sketch can estimate the geometry of the column space and produce a preconditioner R^{-1}.

Then an iterative solver such as LSQR, conjugate gradient on normal equations, or another Krylov method solves the original problem more rapidly.

The sketch does not need to solve the problem exactly. It only needs to produce a useful transformation.

This is often more accurate than sketch-and-solve.

97.21 Error Guarantees

Randomized algorithms usually provide probabilistic error bounds.

A typical low-rank approximation guarantee has the form

\|A - QQ^TA\| \le (1+\varepsilon)\|A - A_k\|

with high probability.

Here A_k is the best rank-k approximation in the chosen norm.

The phrase “with high probability” means that the bound may fail, but the failure probability is controlled by algorithm parameters such as sketch size and oversampling.

97.22 Best Rank-k Approximation

The best rank-k approximation to A in the spectral norm or the Frobenius norm is given by the truncated SVD.

If

A = U\Sigma V^T,

then

A_k = U_k\Sigma_k V_k^T.

The Eckart-Young theorem states that A_k minimizes

\|A - B\|

among all matrices B with rank at most k, for the spectral and Frobenius norms.

Randomized SVD seeks an approximation close to this optimal deterministic benchmark.

97.23 Dense Versus Sparse Random Sketches

Different sketching matrices have different costs.

| Sketch | Description | Main feature |
|---|---|---|
| Gaussian | dense normal random entries | simple theory |
| Rademacher | entries ±1 | simple and cheaper |
| SRHT | structured Hadamard transform | fast multiplication |
| CountSketch | one nonzero per column | very sparse |
| Sampling | selected rows or columns | interpretable |

Gaussian sketches are easy to analyze but may be expensive to multiply.

Sparse and structured sketches reduce computation.

The right choice depends on matrix size, sparsity, hardware, and desired accuracy.

97.24 CountSketch

CountSketch is a sparse random projection.

Each input coordinate is assigned randomly to one output bucket with a random sign.

Thus each column of the sketching matrix has one nonzero entry.

This makes multiplication

SA

very fast for sparse matrices.

CountSketch is useful in streaming and large-scale least squares.

Its sparsity reduces arithmetic and memory movement.
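CountSketch never needs S explicitly: hashing each row of A into a signed bucket computes SA directly (a minimal NumPy sketch with illustrative sizes):

```python
import numpy as np

rng = np.random.default_rng(10)
m, n, s = 100000, 50, 2000
A = rng.standard_normal((m, n))

bucket = rng.integers(0, s, size=m)            # each row goes to one bucket
sign = rng.choice([-1.0, 1.0], size=m)         # with a random sign

# Accumulate signed rows into buckets; this equals S @ A where each
# column of S has exactly one +/-1 entry.
SA = np.zeros((s, n))
np.add.at(SA, bucket, sign[:, None] * A)
print(SA.shape)                                # (2000, 50)
```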

97.25 Subsampled Randomized Hadamard Transform

The subsampled randomized Hadamard transform, or SRHT, has the form

S = RHD.

Here:

| Symbol | Meaning |
|---|---|
| D | random diagonal sign matrix |
| H | Hadamard transform |
| R | row sampling matrix |

The Hadamard transform can be applied in O(n \log n) time.

The random signs spread information, and the sampling step reduces dimension.

SRHT is useful when fast structured multiplication is available.
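A minimal SRHT sketch: random signs, a fast Walsh-Hadamard transform (a simple O(m log m) implementation below; m must be a power of two), then row subsampling with the usual sqrt(m/s) rescaling:

```python
import numpy as np

def fwht(x):
    """Fast Walsh-Hadamard transform along axis 0 (length a power of two)."""
    x = x.copy()
    n, h = x.shape[0], 1
    while h < n:
        for i in range(0, n, 2 * h):
            a, b = x[i:i + h].copy(), x[i + h:i + 2 * h].copy()
            x[i:i + h], x[i + h:i + 2 * h] = a + b, a - b
        h *= 2
    return x

rng = np.random.default_rng(11)
m, n, s = 4096, 50, 512
A = rng.standard_normal((m, n))

D = rng.choice([-1.0, 1.0], size=m)            # random diagonal signs
HDA = fwht(D[:, None] * A) / np.sqrt(m)        # normalized Hadamard transform
rows = rng.choice(m, size=s, replace=False)    # uniform row sampling
SA = np.sqrt(m / s) * HDA[rows]                # SA with S = sqrt(m/s) R H D
print(SA.shape)                                # (512, 50)
```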

97.26 Communication Cost

On modern hardware, communication often costs more than arithmetic.

Randomized algorithms can reduce communication by compressing data early.

For example, forming

Y = A\Omega

may reduce a large matrix to a much smaller sample matrix.

This is important in distributed systems, out-of-core computation, and GPU computation.

Randomized methods are often valuable not only because they reduce floating point operations, but because they reduce data movement.

97.27 Streaming Algorithms

In streaming settings, rows of AA arrive one at a time.

The full matrix may never be stored.

A sketch is updated incrementally.

For example, when a row a_i^T arrives, the algorithm updates a small summary matrix.

At the end, the summary is used to approximate singular vectors, covariance structure, or least squares solutions.

Streaming randomized linear algebra is useful when data is too large to store or changes continuously.

97.28 Randomized Algorithms for PCA

Principal component analysis seeks leading singular vectors of a centered data matrix.

Randomized SVD provides a scalable PCA method.

Instead of computing the full covariance matrix

A^TA,

one sketches the data and computes an approximate low-rank SVD.

This is effective when only the leading components are needed.

For data analysis, the approximation is often sufficient because the original data contains noise.

97.29 Robustness and Failure Probability

Randomized methods can fail if the sketch misses important directions.

The probability of failure is reduced by:

| Technique | Effect |
|---|---|
| oversampling | captures more directions |
| power iteration | improves spectral separation |
| leverage-score sampling | samples influential rows or columns |
| repeated sketches | reduces variance |
| residual checking | validates the result |

A practical algorithm should check the residual or approximation error when possible.

Randomization should not replace diagnostics.

97.30 Numerical Stability

Randomized algorithms still use floating point arithmetic.

They must account for rounding error, conditioning, and orthogonality.

For randomized SVD, the basis matrix QQ should be computed by a stable orthogonalization method.

For power iterations, reorthogonalization may be needed to prevent loss of numerical rank.

The random sketch reduces size, but the smaller deterministic problem must still be solved stably.

97.31 When Randomization Helps

Randomization is most useful when:

| Condition | Reason |
|---|---|
| matrix is very large | exact factorization is too expensive |
| only low-rank structure is needed | full information is unnecessary |
| data is noisy | exact computation may be wasteful |
| matrix-vector products are cheap | sketching is efficient |
| data is distributed | sketch reduces communication |
| streaming access is required | sketch can be maintained online |

Randomization is less useful when the matrix is small, exact answers are required, or the spectrum has no exploitable decay.

97.32 When Randomization Is Risky

Randomization should be used carefully when:

| Situation | Risk |
|---|---|
| singular values decay slowly | low-rank approximation needs many dimensions |
| high accuracy is required | sketch size may become large |
| matrix has high-leverage rows | uniform sampling may fail |
| data is adversarial | simple random sampling may miss structure |
| condition number is large | sketched problem may still be unstable |
| reproducibility is strict | random seeds must be controlled |

In these cases, one may need stronger sketches, larger oversampling, power iteration, or deterministic verification.

97.33 Practical Randomized SVD Checklist

A practical randomized SVD implementation should choose:

| Parameter | Typical choice |
|---|---|
| target rank | desired k |
| oversampling | p = 5 to 20 |
| sample size | \ell = k + p |
| power iterations | q = 0, 1, or 2, depending on the spectrum |
| sketch type | Gaussian, SRHT, CountSketch, or sampling |
| orthogonalization | QR or a stable equivalent |
| error check | residual or held-out estimate |

For many applications, a small oversampling parameter and one or two power iterations provide a good balance.

97.34 Summary

Randomized linear algebra uses random sketches to reduce large matrix problems to smaller ones.

The main tools are:

| Tool | Purpose |
|---|---|
| random sampling | select rows, columns, or entries |
| random projection | compress by random linear combinations |
| subspace embedding | preserve geometry of a subspace |
| leverage scores | identify influential rows or columns |
| randomized range finder | approximate column space |
| randomized SVD | compute low-rank approximation |
| sketch-and-solve | approximate least squares |
| randomized preconditioning | accelerate iterative solvers |

The central pattern is:

\text{large problem} \quad\longrightarrow\quad \text{random sketch} \quad\longrightarrow\quad \text{small problem} \quad\longrightarrow\quad \text{approximate solution}.

Randomized methods trade exactness for speed, memory efficiency, and communication reduction. They are most effective when the matrix has low-rank structure, when moderate accuracy is sufficient, or when the cost of exact deterministic computation is too high.