
Chapter 97. Randomized Linear Algebra

Randomized linear algebra uses random sampling and random projection to approximate matrix computations. Its goal is to reduce the cost of standard linear algebra tasks while preserving enough accuracy for the problem at hand.

The main idea is to replace a large matrix problem by a smaller random sketch. The sketch should keep the important geometry of the original matrix: lengths, angles, column space, row space, singular values, or residual norms. Once the sketch is formed, one solves a smaller problem and lifts the result back to the original scale.

Randomized methods are widely used for low-rank approximation, least squares, matrix multiplication, preconditioning, principal component analysis, and large-scale data analysis. Surveys of randomized numerical linear algebra emphasize sketching as a way to reduce computational cost for matrix multiplication, linear systems, least squares problems, and low-rank compression.

97.1 Motivation

Classical dense matrix algorithms can be too expensive for very large data.

A dense matrix

A \in \mathbb{R}^{m\times n}

may have millions of rows and thousands of columns. Computing a full singular value decomposition or QR factorization may require too much time, memory, or communication.

Randomized linear algebra addresses this by asking for an approximation rather than an exact factorization.

For example, instead of computing the full SVD

A = U\Sigma V^T,

we may seek a rank-k approximation

A \approx U_k \Sigma_k V_k^T

where

k \ll \min(m,n).

If the singular values decay quickly, such an approximation may preserve most of the useful information in the matrix.
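As a small illustration, here is a minimal NumPy sketch (the low-rank-plus-noise matrix is illustrative) showing that a rank-k truncation captures nearly all of a matrix whose spectrum decays:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative matrix: rank-10 signal plus small noise, so the spectrum decays fast.
m, n, k = 2000, 500, 10
A = rng.standard_normal((m, k)) @ rng.standard_normal((k, n)) \
    + 1e-3 * rng.standard_normal((m, n))

U, s, Vt = np.linalg.svd(A, full_matrices=False)
A_k = (U[:, :k] * s[:k]) @ Vt[:k, :]   # truncated rank-k approximation

rel_err = np.linalg.norm(A - A_k) / np.linalg.norm(A)
print(f"relative Frobenius error of the rank-{k} truncation: {rel_err:.2e}")
```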

97.2 Deterministic Versus Randomized Algorithms

A deterministic algorithm follows a fixed sequence of operations.

A randomized algorithm uses random choices during computation.

| Type | Description |
|---|---|
| Deterministic | same input gives same path |
| Randomized | uses random samples or random projections |
| Monte Carlo | usually fast, small probability of failure |
| Las Vegas | always correct, random running time |

Most randomized linear algebra algorithms are Monte Carlo algorithms. They return an approximate answer with high probability.

This failure probability is not a weakness in itself. It is part of the specification: the algorithm states both an error tolerance and a probability of success.

97.3 Sketching

A sketch is a compressed representation of a matrix or vector.

Given

A \in \mathbb{R}^{m\times n},

we choose a sketching matrix

S \in \mathbb{R}^{s\times m}

with

s \ll m.

Then we form

SA \in \mathbb{R}^{s\times n}.

The matrix SA has fewer rows than A. It is smaller and cheaper to process.

For least squares, one may also sketch the right-hand side:

Sb.

The sketched problem is

\min_x \|SAx - Sb\|_2.

This problem is cheaper than the original problem

\min_x \|Ax - b\|_2.

The purpose of the sketch is to preserve enough geometry so that the sketched solution is useful for the original problem. RandNLA descriptions often present least squares algorithms as first computing the sketches SA and Sb, then solving the smaller problem or using it to build a preconditioner.

97.4 Random Sampling

Random sampling chooses rows, columns, or entries from a matrix.

For example, to approximate the column space of A, one may sample columns of A.

If column j is sampled, it is usually rescaled by a factor depending on its sampling probability.

Sampling is simple and preserves interpretability because the selected columns or rows are actual columns or rows from the original matrix.

However, uniform sampling may fail when important information is concentrated in a few rows or columns.
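A minimal sketch of norm-proportional column sampling with the standard rescaling (the matrix and sizes are illustrative; the factor 1/sqrt(s p_j) makes C C^T an unbiased estimator of A A^T):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((1000, 300))

s = 100                                    # number of columns to sample
p = np.sum(A**2, axis=0)                   # squared column norms
p = p / p.sum()                            # sampling probabilities
idx = rng.choice(A.shape[1], size=s, p=p)

# Rescale each sampled column by 1/sqrt(s * p_j); with this scaling,
# E[C @ C.T] = A @ A.T.
C = A[:, idx] / np.sqrt(s * p[idx])
print(C.shape)                             # (1000, 100)
```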

97.5 Leverage Scores

Leverage scores measure how important rows or columns are to a subspace.

For a matrix

A \in \mathbb{R}^{m\times n}

with orthonormal basis U for its column space, the row leverage score of row i is

\ell_i = \|U_{i,:}\|_2^2.

Rows with large leverage scores have strong influence on the column space.

Sampling proportional to leverage scores often gives better sketches than uniform sampling.

The difficulty is that exact leverage scores may be expensive to compute. Randomized algorithms often approximate them.
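For reference, exact row leverage scores can be computed from a thin QR factorization; the O(mn^2) cost of this step is precisely what the approximation schemes try to avoid (a minimal sketch with an illustrative matrix):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((1000, 20))

# Q is an orthonormal basis for col(A); the leverage score of row i
# is the squared norm of the i-th row of Q.
Q, _ = np.linalg.qr(A, mode="reduced")
leverage = np.sum(Q**2, axis=1)

print(leverage.sum())   # sums to the rank of A, here 20
```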

97.6 Random Projection

Random projection mixes the data using a random matrix.

Instead of selecting rows, one forms random linear combinations of rows.

A common choice is a Gaussian sketch:

S_{ij} \sim N(0, 1/s).

Then

SA

compresses A to s rows.

Random projections tend to spread information evenly. This makes them less sensitive to high-leverage rows than uniform sampling.

The cost is that the sketched rows are no longer interpretable as original rows.
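A minimal Gaussian sketch in NumPy (sizes are illustrative); note that variance 1/s corresponds to standard deviation 1/sqrt(s):

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((10000, 50))

s = 400
S = rng.normal(0.0, 1.0 / np.sqrt(s), size=(s, A.shape[0]))  # S_ij ~ N(0, 1/s)
SA = S @ A                                                   # 400 x 50 sketch

# Lengths in the column space are approximately preserved.
x = rng.standard_normal(50)
print(np.linalg.norm(S @ (A @ x)) / np.linalg.norm(A @ x))   # close to 1
```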

97.7 Johnson-Lindenstrauss Principle

The Johnson-Lindenstrauss principle says that a set of points in high-dimensional Euclidean space can be embedded into a much lower-dimensional space while approximately preserving pairwise distances.

In linear algebra, this motivates random projections.

If the sketching matrix S approximately preserves lengths in a subspace, then for vectors x in that subspace,

\|Sx\|_2 \approx \|x\|_2.

This property is called a subspace embedding.

Subspace embeddings are the geometric foundation of many randomized algorithms.

97.8 Subspace Embedding

A matrix S is an \varepsilon-subspace embedding for a subspace \mathcal{U} if

(1-\varepsilon)\|x\|_2^2 \le \|Sx\|_2^2 \le (1+\varepsilon)\|x\|_2^2

for every

x \in \mathcal{U}.

For a matrix A, the relevant subspace is often the column space of A.

If S is a subspace embedding for \operatorname{col}(A), then the sketch preserves the geometry of all vectors of the form

Ax.

This is exactly what is needed for least squares and many approximation problems.
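The embedding quality can be measured numerically: for an orthonormal basis Q of col(A), the extreme singular values of SQ bound the distortion of S on that subspace (a minimal check, assuming a Gaussian S):

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((5000, 40))
Q, _ = np.linalg.qr(A, mode="reduced")   # orthonormal basis for col(A)

s = 400
S = rng.normal(0.0, 1.0 / np.sqrt(s), size=(s, A.shape[0]))

# For any x = Qy in col(A):
#   sigma_min(SQ)^2 <= ||Sx||^2 / ||x||^2 <= sigma_max(SQ)^2,
# so epsilon is roughly max(sigma_max^2 - 1, 1 - sigma_min^2).
sv = np.linalg.svd(S @ Q, compute_uv=False)
print(sv.min()**2, sv.max()**2)          # both close to 1
```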

97.9 Randomized Least Squares

Consider the least squares problem

\min_x \|Ax - b\|_2,

where

A \in \mathbb{R}^{m\times n}, \qquad m \gg n.

A randomized sketching method chooses

S \in \mathbb{R}^{s\times m}

with

n \le s \ll m.

Then it solves

\min_x \|SAx - Sb\|_2.

If S preserves the geometry of the column space of [A b], then the solution of the sketched problem gives a good approximation to the original least squares solution.

Randomized least squares is useful when m is very large and reading or factoring all of A repeatedly is expensive.

97.10 Sketch-and-Solve

The simplest randomized least squares method is sketch-and-solve.

  1. Draw a sketching matrix S.
  2. Form SA and Sb.
  3. Solve the smaller least squares problem
     \min_x \|SAx - Sb\|_2.
  4. Return the solution as an approximate solution to the original problem.

This method is simple and fast.

Its accuracy depends on the sketch size s, the sketch type, the conditioning of A, and the required tolerance.

Sketch-and-solve is often used when moderate accuracy is enough.
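A minimal sketch-and-solve implementation of the four steps above (Gaussian sketch, illustrative sizes), compared against the exact least squares solution:

```python
import numpy as np

rng = np.random.default_rng(5)
m, n = 10000, 50
A = rng.standard_normal((m, n))
b = A @ rng.standard_normal(n) + 0.01 * rng.standard_normal(m)

# Steps 1-3: draw S, form SA and Sb, solve the small problem.
s = 500
S = rng.normal(0.0, 1.0 / np.sqrt(s), size=(s, m))
x_sketch, *_ = np.linalg.lstsq(S @ A, S @ b, rcond=None)

# Step 4: return x_sketch; here we compare it against the exact solution.
x_exact, *_ = np.linalg.lstsq(A, b, rcond=None)
print(np.linalg.norm(x_sketch - x_exact) / np.linalg.norm(x_exact))
```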

97.11 Sketching for Preconditioning

For high accuracy, a sketch may be used to build a preconditioner.

Instead of trusting the sketched solution directly, we use the sketch to construct a better-conditioned system. Then an iterative method solves the original problem.

For example, compute a QR factorization

SA = QR.

The matrix R can be used as a right preconditioner:

AR^{-1}y = b, \qquad x = R^{-1}y.

If the sketch captures the column space well, then

AR^{-1}

is better conditioned than A.

This approach combines randomized preprocessing with deterministic iterative refinement.
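A minimal version of this idea, using SciPy's LSQR as the iterative solver (the matrix is illustrative, and the explicit inverse of R is for clarity only; triangular solves would be used in practice):

```python
import numpy as np
from scipy.sparse.linalg import lsqr

rng = np.random.default_rng(6)
m, n = 10000, 100
A = rng.standard_normal((m, n)) * np.logspace(0, -6, n)  # ill-conditioned columns
b = rng.standard_normal(m)

# Build the right preconditioner from a sketch: QR of SA.
s = 500
S = rng.normal(0.0, 1.0 / np.sqrt(s), size=(s, m))
_, R = np.linalg.qr(S @ A, mode="reduced")

# Solve min ||(A R^{-1}) y - b||_2 with LSQR, then recover x = R^{-1} y.
AR = A @ np.linalg.inv(R)
y = lsqr(AR, b, atol=1e-12, btol=1e-12)[0]
x = np.linalg.solve(R, y)

print(np.linalg.cond(A), np.linalg.cond(AR))  # cond(AR) should be near 1
```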

97.12 Randomized Low-Rank Approximation

Low-rank approximation is one of the main uses of randomized linear algebra.

Given

A \in \mathbb{R}^{m\times n},

we want

A \approx QB,

where

Q \in \mathbb{R}^{m\times k}

has orthonormal columns and

B \in \mathbb{R}^{k\times n}.

The columns of Q should approximate the dominant column space of A.

A randomized method finds this space by multiplying A by a random test matrix.

97.13 Randomized Range Finder

Choose a random matrix

\Omega \in \mathbb{R}^{n\times \ell},

where

\ell = k + p.

Here k is the target rank and p is an oversampling parameter.

Compute

Y = A\Omega.

The columns of Y are random linear combinations of the columns of A.

Then compute an orthonormal basis

Q = \operatorname{orth}(Y).

The matrix

QQ^TA

is the projection of A onto the sampled subspace.

If the range of Q captures the dominant column space of A, then

A \approx QQ^TA.

The randomized range finder is the core primitive behind many randomized SVD algorithms. Martinsson and Tropp describe randomized algorithms for matrix factorizations and linear systems, with emphasis on practical algorithms and theoretical foundations.
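A minimal NumPy implementation of the range finder (a Gaussian test matrix; the function name and the illustrative test problem are my own):

```python
import numpy as np

def range_finder(A, k, p=10, rng=None):
    """Return Q with orthonormal columns whose range approximates
    the dominant column space of A (basic version, no power iteration)."""
    rng = np.random.default_rng(rng)
    Omega = rng.standard_normal((A.shape[1], k + p))  # random test matrix
    Y = A @ Omega                                     # sample the range of A
    Q, _ = np.linalg.qr(Y, mode="reduced")            # orthonormalize
    return Q

# Illustrative check on a nearly rank-15 matrix.
rng = np.random.default_rng(7)
A = rng.standard_normal((2000, 15)) @ rng.standard_normal((15, 300)) \
    + 1e-6 * rng.standard_normal((2000, 300))
Q = range_finder(A, k=15)
print(np.linalg.norm(A - Q @ (Q.T @ A)) / np.linalg.norm(A))  # tiny
```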

97.14 Randomized SVD

To compute an approximate rank-k SVD:

  1. Draw
     \Omega \in \mathbb{R}^{n\times (k+p)}.
  2. Form
     Y = A\Omega.
  3. Compute
     Q = \operatorname{orth}(Y).
  4. Form the smaller matrix
     B = Q^TA.
  5. Compute the SVD of B:
     B = \widetilde{U}\Sigma V^T.
  6. Set
     U = Q\widetilde{U}.

Then

A \approx U\Sigma V^T.

The expensive SVD is performed on the smaller matrix B, not on the original matrix A.

This is the standard randomized SVD pattern.
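A direct transcription of the six steps into NumPy (a minimal sketch: no power iteration, with orth realized as a reduced QR factorization):

```python
import numpy as np

def randomized_svd(A, k, p=10, rng=None):
    """Approximate rank-k SVD of A following the six steps above."""
    rng = np.random.default_rng(rng)
    Omega = rng.standard_normal((A.shape[1], k + p))     # 1. draw Omega
    Y = A @ Omega                                        # 2. form Y = A Omega
    Q, _ = np.linalg.qr(Y, mode="reduced")               # 3. Q = orth(Y)
    B = Q.T @ A                                          # 4. small matrix B
    Ub, sig, Vt = np.linalg.svd(B, full_matrices=False)  # 5. SVD of B
    U = Q @ Ub                                           # 6. lift back
    return U[:, :k], sig[:k], Vt[:k, :]
```

The only dense SVD runs on the (k+p) × n matrix B, so the dominant cost is the two products with A.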

97.15 Oversampling

Oversampling improves reliability.

Instead of choosing exactly k random vectors, choose

\ell = k + p

where p is a small integer such as 5, 10, or 20.

The extra vectors provide a safety margin. They reduce the probability that the random sketch misses an important direction.

Oversampling is cheap compared with the cost of processing the full matrix.

97.16 Power Iterations for Accuracy

Randomized SVD may lose accuracy when the singular values decay slowly.

Power iteration improves separation.

Instead of forming

Y = A\Omega,

form

Y = (AA^T)^q A\Omega.

Here q is a small nonnegative integer.

The matrix (AA^T)^q A has the same singular vectors as A, but its singular values are

\sigma_i^{2q+1}.

Large singular values become more dominant relative to small ones.

Thus the sampled range better captures the leading singular directions.

The cost is additional passes over the matrix.
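A minimal power-iteration variant of the range finder; re-orthogonalizing after each application of A and A^T is the numerically stable way to form orth((AA^T)^q A Omega):

```python
import numpy as np

def range_finder_power(A, k, p=10, q=2, rng=None):
    """Range finder for orth((A A^T)^q A Omega), re-orthogonalizing
    between multiplications to avoid loss of numerical rank."""
    rng = np.random.default_rng(rng)
    Y = A @ rng.standard_normal((A.shape[1], k + p))
    Q, _ = np.linalg.qr(Y, mode="reduced")
    for _ in range(q):
        Z, _ = np.linalg.qr(A.T @ Q, mode="reduced")   # apply A^T, re-orthogonalize
        Q, _ = np.linalg.qr(A @ Z, mode="reduced")     # apply A, re-orthogonalize
    return Q
```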

97.17 Pass Efficiency

For very large matrices, the number of passes over the data may dominate arithmetic cost.

A pass means reading the full matrix once.

Randomized algorithms are attractive because they can often be organized to use one or a few passes.

| Algorithm | Typical passes |
|---|---|
| basic randomized range finder | 1 |
| randomized SVD without power iteration | 1 to 2 |
| randomized SVD with q power iterations | 2q+1 or more |
| streaming sketch | 1 |

Pass efficiency matters when the matrix is stored on disk, distributed across machines, or streamed from data sources.

97.18 Randomized Matrix Multiplication

Randomized methods can approximate matrix products.

Given

C = AB,

where

A \in \mathbb{R}^{m\times n}, \qquad B \in \mathbb{R}^{n\times p},

one can sample columns of A and corresponding rows of B.

The approximation has the form

AB \approx \sum_{t=1}^{s} \frac{1}{s\, p_{i_t}} A_{:,i_t} B_{i_t,:}.

Here i_t are the sampled indices and p_i are the sampling probabilities.

Sampling probabilities based on column and row norms can reduce variance.
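A minimal implementation of this sampled product (illustrative matrices; probabilities proportional to the product of column and row norms are the classical variance-reducing choice):

```python
import numpy as np

rng = np.random.default_rng(8)
A = rng.standard_normal((200, 5000))
B = rng.standard_normal((5000, 100))

s = 500
p = np.linalg.norm(A, axis=0) * np.linalg.norm(B, axis=1)
p = p / p.sum()
idx = rng.choice(A.shape[1], size=s, p=p)

# Unbiased estimator: sum_t A[:, i_t] B[i_t, :] / (s * p_{i_t}).
C_approx = (A[:, idx] / (s * p[idx])) @ B[idx, :]
C_exact = A @ B
print(np.linalg.norm(C_approx - C_exact) / np.linalg.norm(C_exact))
```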

97.19 Frequent Directions

Frequent Directions is a deterministic sketching method, but it is often discussed alongside randomized sketching.

It maintains a small matrix

B

that approximates the covariance

A^TA.

The goal is to ensure that

\|A^TA - B^TB\|

is small.

This is useful for streaming data, where rows arrive one at a time and cannot all be stored.

Although not randomized in its basic form, it shares the same sketching viewpoint.
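A minimal Frequent Directions sketch (a simple variant: when the buffer fills, shrink all squared singular values by the smallest one, which zeroes at least one row):

```python
import numpy as np

def frequent_directions(A, ell):
    """Maintain an ell x n sketch B of the rows of A such that
    ||A.T @ A - B.T @ B||_2 <= ||A||_F**2 / ell."""
    B = np.zeros((ell, A.shape[1]))
    filled = 0
    for a in A:                                # rows arrive one at a time
        if filled == ell:                      # buffer full: shrink
            _, s, Vt = np.linalg.svd(B, full_matrices=False)
            s = np.sqrt(np.maximum(s**2 - s[-1]**2, 0.0))
            B = s[:, None] * Vt                # last row becomes zero
            filled = ell - 1
        B[filled] = a
        filled += 1
    return B

rng = np.random.default_rng(9)
A = rng.standard_normal((5000, 30))
B = frequent_directions(A, ell=10)
# Observed error versus the worst-case bound ||A||_F^2 / ell.
print(np.linalg.norm(A.T @ A - B.T @ B, 2), np.linalg.norm(A) ** 2 / 10)
```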

97.20 Randomized Preconditioners

Randomized preconditioners use sketches to improve conditioning.

For example, in least squares, a sketch can estimate the geometry of the column space and produce a preconditioner R^{-1}.

Then an iterative solver such as LSQR, conjugate gradient on normal equations, or another Krylov method solves the original problem more rapidly.

The sketch does not need to solve the problem exactly. It only needs to produce a useful transformation.

This is often more accurate than sketch-and-solve.

97.21 Error Guarantees

Randomized algorithms usually provide probabilistic error bounds.

A typical low-rank approximation guarantee has the form

\|A - QQ^TA\| \le (1+\varepsilon)\|A - A_k\|

with high probability.

Here A_k is the best rank-k approximation in the chosen norm.

The phrase “with high probability” means that the bound may fail, but the failure probability is controlled by algorithm parameters such as sketch size and oversampling.

97.22 Best Rank-k Approximation

The best rank-k approximation to A in the spectral norm or the Frobenius norm is given by the truncated SVD.

If

A = U\Sigma V^T,

then

A_k = U_k\Sigma_k V_k^T.

The Eckart-Young theorem states that A_k minimizes

\|A - B\|

among all matrices B with rank at most k, for the spectral and Frobenius norms.

Randomized SVD seeks an approximation close to this optimal deterministic benchmark.

97.23 Dense Versus Sparse Random Sketches

Different sketching matrices have different costs.

| Sketch | Description | Main feature |
|---|---|---|
| Gaussian | dense normal random entries | simple theory |
| Rademacher | entries ±1 | simple and cheaper |
| SRHT | structured Hadamard transform | fast multiplication |
| CountSketch | one nonzero per column | very sparse |
| Sampling | selected rows or columns | interpretable |

Gaussian sketches are easy to analyze but may be expensive to multiply.

Sparse and structured sketches reduce computation.

The right choice depends on matrix size, sparsity, hardware, and desired accuracy.

97.24 CountSketch

CountSketch is a sparse random projection.

Each input coordinate is assigned randomly to one output bucket with a random sign.

Thus each column of the sketching matrix has one nonzero entry.

This makes multiplication

SA

very fast for sparse matrices.

CountSketch is useful in streaming and large-scale least squares.

Its sparsity reduces arithmetic and memory movement.
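CountSketch never needs S explicitly: hashing each row of A into a signed bucket computes SA directly (a minimal NumPy sketch with illustrative sizes):

```python
import numpy as np

rng = np.random.default_rng(10)
m, n, s = 100000, 50, 2000
A = rng.standard_normal((m, n))

bucket = rng.integers(0, s, size=m)            # each row goes to one bucket
sign = rng.choice([-1.0, 1.0], size=m)         # with a random sign

# Accumulate signed rows into buckets; this equals S @ A where each
# column of S has exactly one +/-1 entry.
SA = np.zeros((s, n))
np.add.at(SA, bucket, sign[:, None] * A)
print(SA.shape)                                # (2000, 50)
```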

97.25 Subsampled Randomized Hadamard Transform

The subsampled randomized Hadamard transform, or SRHT, has the form

S = RHD.

Here:

| Symbol | Meaning |
|---|---|
| D | random diagonal sign matrix |
| H | Hadamard transform |
| R | row sampling matrix |

The Hadamard transform can be applied in O(n \log n) time.

The random signs spread information, and the sampling step reduces dimension.

SRHT is useful when fast structured multiplication is available.
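A minimal SRHT sketch: random signs, a fast Walsh-Hadamard transform (a simple O(m log m) implementation below; m must be a power of two), then row subsampling with the usual sqrt(m/s) rescaling:

```python
import numpy as np

def fwht(x):
    """Fast Walsh-Hadamard transform along axis 0 (length a power of two)."""
    x = x.copy()
    n, h = x.shape[0], 1
    while h < n:
        for i in range(0, n, 2 * h):
            a, b = x[i:i + h].copy(), x[i + h:i + 2 * h].copy()
            x[i:i + h], x[i + h:i + 2 * h] = a + b, a - b
        h *= 2
    return x

rng = np.random.default_rng(11)
m, n, s = 4096, 50, 512
A = rng.standard_normal((m, n))

D = rng.choice([-1.0, 1.0], size=m)            # random diagonal signs
HDA = fwht(D[:, None] * A) / np.sqrt(m)        # normalized Hadamard transform
rows = rng.choice(m, size=s, replace=False)    # uniform row sampling
SA = np.sqrt(m / s) * HDA[rows]                # SA with S = sqrt(m/s) R H D
print(SA.shape)                                # (512, 50)
```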

97.26 Communication Cost

On modern hardware, communication often costs more than arithmetic.

Randomized algorithms can reduce communication by compressing data early.

For example, forming

Y = A\Omega

may reduce a large matrix to a much smaller sample matrix.

This is important in distributed systems, out-of-core computation, and GPU computation.

Randomized methods are often valuable not only because they reduce floating point operations, but because they reduce data movement.

97.27 Streaming Algorithms

In streaming settings, rows of AA arrive one at a time.

The full matrix may never be stored.

A sketch is updated incrementally.

For example, when a row a_i^T arrives, the algorithm updates a small summary matrix.

At the end, the summary is used to approximate singular vectors, covariance structure, or least squares solutions.

Streaming randomized linear algebra is useful when data is too large to store or changes continuously.

97.28 Randomized Algorithms for PCA

Principal component analysis seeks leading singular vectors of a centered data matrix.

Randomized SVD provides a scalable PCA method.

Instead of computing the full covariance matrix

A^TA,

one sketches the data and computes an approximate low-rank SVD.

This is effective when only the leading components are needed.

For data analysis, the approximation is often sufficient because the original data contains noise.

97.29 Robustness and Failure Probability

Randomized methods can fail if the sketch misses important directions.

The probability of failure is reduced by:

| Technique | Effect |
|---|---|
| oversampling | captures more directions |
| power iteration | improves spectral separation |
| leverage-score sampling | samples influential rows or columns |
| repeated sketches | reduces variance |
| residual checking | validates the result |

A practical algorithm should check the residual or approximation error when possible.

Randomization should not replace diagnostics.

97.30 Numerical Stability

Randomized algorithms still use floating point arithmetic.

They must account for rounding error, conditioning, and orthogonality.

For randomized SVD, the basis matrix QQ should be computed by a stable orthogonalization method.

For power iterations, reorthogonalization may be needed to prevent loss of numerical rank.

The random sketch reduces size, but the smaller deterministic problem must still be solved stably.

97.31 When Randomization Helps

Randomization is most useful when:

| Condition | Reason |
|---|---|
| matrix is very large | exact factorization is too expensive |
| only low-rank structure is needed | full information is unnecessary |
| data is noisy | exact computation may be wasteful |
| matrix-vector products are cheap | sketching is efficient |
| data is distributed | sketch reduces communication |
| streaming access is required | sketch can be maintained online |

Randomization is less useful when the matrix is small, exact answers are required, or the spectrum has no exploitable decay.

97.32 When Randomization Is Risky

Randomization should be used carefully when:

| Situation | Risk |
|---|---|
| singular values decay slowly | low-rank approximation needs many dimensions |
| high accuracy is required | sketch size may become large |
| matrix has high-leverage rows | uniform sampling may fail |
| data is adversarial | simple random sampling may miss structure |
| condition number is large | sketched problem may still be unstable |
| reproducibility is strict | random seeds must be controlled |

In these cases, one may need stronger sketches, larger oversampling, power iteration, or deterministic verification.

97.33 Practical Randomized SVD Checklist

A practical randomized SVD implementation should choose:

| Parameter | Typical choice |
|---|---|
| target rank | desired k |
| oversampling | p = 5 to 20 |
| sample size | \ell = k + p |
| power iterations | q = 0, 1, or 2, depending on the spectrum |
| sketch type | Gaussian, SRHT, CountSketch, or sampling |
| orthogonalization | QR or a stable equivalent |
| error check | residual or held-out estimate |

For many applications, a small oversampling parameter and one or two power iterations provide a good balance.

97.34 Summary

Randomized linear algebra uses random sketches to reduce large matrix problems to smaller ones.

The main tools are:

| Tool | Purpose |
|---|---|
| random sampling | select rows, columns, or entries |
| random projection | compress by random linear combinations |
| subspace embedding | preserve geometry of a subspace |
| leverage scores | identify influential rows or columns |
| randomized range finder | approximate column space |
| randomized SVD | compute low-rank approximation |
| sketch-and-solve | approximate least squares |
| randomized preconditioning | accelerate iterative solvers |

The central pattern is:

\text{large problem} \quad\longrightarrow\quad \text{random sketch} \quad\longrightarrow\quad \text{small problem} \quad\longrightarrow\quad \text{approximate solution}.

Randomized methods trade exactness for speed, memory efficiency, and communication reduction. They are most effective when the matrix has low-rank structure, when moderate accuracy is sufficient, or when the cost of exact deterministic computation is too high.