Randomized linear algebra uses random sampling and random projection to approximate matrix computations. Its goal is to reduce the cost of standard linear algebra tasks while preserving enough accuracy for the problem at hand.
The main idea is to replace a large matrix problem by a smaller random sketch. The sketch should keep the important geometry of the original matrix: lengths, angles, column space, row space, singular values, or residual norms. Once the sketch is formed, one solves a smaller problem and lifts the result back to the original scale.
Randomized methods are widely used for low-rank approximation, least squares, matrix multiplication, preconditioning, principal component analysis, and large-scale data analysis. Surveys of randomized numerical linear algebra emphasize sketching as a way to reduce computational cost for matrix multiplication, linear systems, least squares problems, and low-rank compression.
97.1 Motivation
Classical dense matrix algorithms can be too expensive for very large data.
A dense matrix $A \in \mathbb{R}^{m \times n}$ may have millions of rows and thousands of columns. Computing a full singular value decomposition or QR factorization may require too much time, memory, or communication.
Randomized linear algebra addresses this by asking for an approximation rather than an exact factorization.
For example, instead of computing the full SVD $A = U \Sigma V^T$, we may seek a rank-$k$ approximation $A \approx U_k \Sigma_k V_k^T$, where $k \ll \min(m, n)$.
If the singular values decay quickly, such an approximation may preserve most of the useful information in the matrix.
97.2 Deterministic Versus Randomized Algorithms
A deterministic algorithm follows a fixed sequence of operations.
A randomized algorithm uses random choices during computation.
| Type | Description |
|---|---|
| Deterministic | same input gives same path |
| Randomized | uses random samples or random projections |
| Monte Carlo | fixed cost, small probability of an incorrect answer |
| Las Vegas | always correct, random running time |
Most randomized linear algebra algorithms are Monte Carlo algorithms. They return an approximate answer with high probability.
This failure probability is not a weakness by itself. It is part of the specification: the algorithm states both an error tolerance and a probability of success.
97.3 Sketching
A sketch is a compressed representation of a matrix or vector.
Given $A \in \mathbb{R}^{m \times n}$, we choose a sketching matrix $S \in \mathbb{R}^{s \times m}$ with $s \ll m$. Then we form the sketch $SA$.
The matrix $SA$ has fewer rows than $A$. It is smaller and cheaper to process.
For least squares, one may also sketch the right-hand side: $Sb$.
The sketched problem is $\min_x \|SAx - Sb\|_2$.
This problem is cheaper than the original problem $\min_x \|Ax - b\|_2$.
The purpose of the sketch is to preserve enough geometry so that the sketched solution is useful for the original problem. RandNLA descriptions often present least squares algorithms as first computing sketches $SA$ and $Sb$, then solving the smaller problem or using it to build a preconditioner.
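As a concrete illustration, a Gaussian sketch of a least squares problem can be formed as follows. This is a minimal sketch under assumed sizes (an 8000-by-50 problem compressed to 400 rows), not a tuned implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: a tall 8000 x 50 problem, sketched down to s = 400 rows.
m, n, s = 8000, 50, 400
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

# Gaussian sketching matrix; the 1/sqrt(s) scaling preserves norms in expectation.
S = rng.standard_normal((s, m)) / np.sqrt(s)

SA = S @ A   # sketched matrix, s x n
Sb = S @ b   # sketched right-hand side, length s
```

The pair `(SA, Sb)` is all that the later stages, whether sketch-and-solve or preconditioning, need to see.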
97.4 Random Sampling
Random sampling chooses rows, columns, or entries from a matrix.
For example, to approximate the column space of $A$, one may sample columns of $A$.
If column $j$ is sampled, it is usually rescaled by a factor depending on its sampling probability.
Sampling is simple and preserves interpretability because the selected columns or rows are actual columns or rows from the original matrix.
However, uniform sampling may fail when important information is concentrated in a few rows or columns.
97.5 Leverage Scores
Leverage scores measure how important rows or columns are to a subspace.
For a matrix $A \in \mathbb{R}^{m \times n}$ with orthonormal basis $Q$ for its column space, the leverage score of row $i$ is $\ell_i = \|Q_{i,:}\|_2^2$, the squared Euclidean norm of row $i$ of $Q$.
Rows with large leverage scores have strong influence on the column space.
Sampling proportional to leverage scores often gives better sketches than uniform sampling.
The difficulty is that exact leverage scores may be expensive to compute. Randomized algorithms often approximate them.
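A small demonstration of exact leverage scores via a thin QR factorization. The data is synthetic, and the inflated first row is an assumption made purely to exhibit a high-leverage row:

```python
import numpy as np

rng = np.random.default_rng(1)

# Tall matrix with one deliberately influential row.
m, n = 500, 5
A = rng.standard_normal((m, n))
A[0] *= 100.0  # row 0 should dominate the column space

# Exact leverage scores: squared row norms of an orthonormal basis Q of range(A).
Q, _ = np.linalg.qr(A)
leverage = np.sum(Q**2, axis=1)

# Sampling probabilities proportional to the leverage scores.
probs = leverage / leverage.sum()
```

The scores sum to the rank of $A$ (here $n = 5$), and the inflated row receives by far the largest score, so leverage-based sampling would almost surely select it while uniform sampling would usually miss it.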
97.6 Random Projection
Random projection mixes the data using a random matrix.
Instead of selecting rows, one forms random linear combinations of rows.
A common choice is a Gaussian sketch: $S \in \mathbb{R}^{s \times m}$ with i.i.d. entries $S_{ij} \sim \mathcal{N}(0, 1/s)$.
Then $SA$ compresses $A$ to $s$ rows.
Random projections tend to spread information evenly. This makes them less sensitive to high-leverage rows than uniform sampling.
The cost is that the sketched rows are no longer interpretable as original rows.
97.7 Johnson-Lindenstrauss Principle
The Johnson-Lindenstrauss principle says that a set of points in high-dimensional Euclidean space can be embedded into a much lower-dimensional space while approximately preserving pairwise distances.
In linear algebra, this motivates random projections.
If the sketching matrix $S$ approximately preserves lengths in a subspace, then for vectors $x$ in that subspace, $\|Sx\|_2 \approx \|x\|_2$.
This property is called a subspace embedding.
Subspace embeddings are the geometric foundation of many randomized algorithms.
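The distance-preservation claim can be checked numerically with a Gaussian projection. The dimensions below (10000 down to 400) are assumptions chosen for the example:

```python
import numpy as np

rng = np.random.default_rng(2)

# 100 points in 10000 dimensions, projected down to 400 dimensions.
d, k, npts = 10000, 400, 100
X = rng.standard_normal((npts, d))

# Gaussian JL map, scaled so squared lengths are preserved in expectation.
S = rng.standard_normal((k, d)) / np.sqrt(k)
Y = X @ S.T

# Ratio of a projected pairwise distance to the original distance.
orig = np.linalg.norm(X[0] - X[1])
proj = np.linalg.norm(Y[0] - Y[1])
distortion = proj / orig
```

For a target dimension of $k = 400$, the typical relative distortion of a single distance is a few percent, consistent with the $1/\sqrt{k}$ scaling the theory predicts.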
97.8 Subspace Embedding
A matrix $S$ is an $\varepsilon$-subspace embedding for a subspace $V$ if
$(1 - \varepsilon)\|x\|_2^2 \le \|Sx\|_2^2 \le (1 + \varepsilon)\|x\|_2^2$
for every $x \in V$.
For a matrix $A$, the relevant subspace is often the column space of $A$.
If $S$ is a subspace embedding for $\mathrm{range}(A)$, then the sketch preserves the geometry of all vectors of the form $Ax$.
This is exactly what is needed for least squares and many approximation problems.
97.9 Randomized Least Squares
Consider the least squares problem $\min_x \|Ax - b\|_2$, where $A \in \mathbb{R}^{m \times n}$ with $m \gg n$.
A randomized sketching method chooses a sketching matrix $S \in \mathbb{R}^{s \times m}$ with $n \le s \ll m$.
Then it solves $\min_x \|SAx - Sb\|_2$.
If $S$ preserves the geometry of the column space of $A$, then the solution of the sketched problem gives a good approximation to the original least squares solution.
Randomized least squares is useful when $A$ is very large and reading or factoring all of $A$ repeatedly is expensive.
97.10 Sketch-and-Solve
The simplest randomized least squares method is sketch-and-solve.
- Draw a sketching matrix $S$.
- Form $SA$ and $Sb$.
- Solve the smaller least squares problem $\min_x \|SAx - Sb\|_2$.
- Return the solution $\hat{x}$ as an approximate solution to the original problem.
This method is simple and fast.
Its accuracy depends on the sketch size $s$, the sketch type, the conditioning of $A$, and the required tolerance.
Sketch-and-solve is often used when moderate accuracy is enough.
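A minimal sketch-and-solve run on synthetic data, compared against the exact solution. The Gaussian sketch, the sizes, and the noise level are all assumptions for the example:

```python
import numpy as np

rng = np.random.default_rng(3)

# Tall synthetic least squares problem with small noise.
m, n, s = 20000, 30, 300
A = rng.standard_normal((m, n))
x_true = rng.standard_normal(n)
b = A @ x_true + 0.1 * rng.standard_normal(m)

# Steps 1-2: draw S, form the sketches.
S = rng.standard_normal((s, m)) / np.sqrt(s)
SA, Sb = S @ A, S @ b

# Step 3: solve the small s x n problem.
x_sketch, *_ = np.linalg.lstsq(SA, Sb, rcond=None)

# Compare with the exact least squares solution.
x_exact, *_ = np.linalg.lstsq(A, b, rcond=None)
rel_err = np.linalg.norm(x_sketch - x_exact) / np.linalg.norm(x_exact)
```

Only the $300 \times 30$ sketched system is factored; the full matrix is touched once, to form $SA$ and $Sb$.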
97.11 Sketching for Preconditioning
For high accuracy, a sketch may be used to build a preconditioner.
Instead of trusting the sketched solution directly, we use the sketch to construct a better-conditioned system. Then an iterative method solves the original problem.
For example, compute a QR factorization of the sketch, $SA = QR$.
The matrix $R$ can be used as a right preconditioner: solve $\min_y \|AR^{-1}y - b\|_2$ and recover $x = R^{-1}y$.
If the sketch captures the column space well, then
$AR^{-1}$ is better conditioned than $A$.
This approach combines randomized preprocessing with deterministic iterative refinement.
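A numerical sketch of this idea on a deliberately ill-conditioned synthetic matrix. The explicit `inv(R)` is for illustration only; production code would apply triangular solves instead:

```python
import numpy as np

rng = np.random.default_rng(4)

# Ill-conditioned tall matrix: condition number about 1e6.
m, n, s = 5000, 40, 400
U, _ = np.linalg.qr(rng.standard_normal((m, n)))
V, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = U @ np.diag(np.logspace(0, -6, n)) @ V.T

# QR of the sketch SA yields the right preconditioner R.
S = rng.standard_normal((s, m)) / np.sqrt(s)
_, R = np.linalg.qr(S @ A)

cond_before = np.linalg.cond(A)
cond_after = np.linalg.cond(A @ np.linalg.inv(R))
```

With a sketch of $s = 400$ rows, $AR^{-1}$ typically has a condition number close to 1, so an iterative solver applied to the preconditioned problem converges in few iterations.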
97.12 Randomized Low-Rank Approximation
Low-rank approximation is one of the main uses of randomized linear algebra.
Given $A \in \mathbb{R}^{m \times n}$, we want an approximation $A \approx QQ^TA$,
where $Q \in \mathbb{R}^{m \times \ell}$ has orthonormal columns and $\ell \ll \min(m, n)$.
The columns of $Q$ should approximate the dominant column space of $A$.
A randomized method finds this space by multiplying by a random test matrix.
97.13 Randomized Range Finder
Choose a random matrix $\Omega \in \mathbb{R}^{n \times \ell}$, where $\ell = k + p$.
Here $k$ is the target rank and $p$ is an oversampling parameter.
Compute $Y = A\Omega$.
The columns of $Y$ are random linear combinations of the columns of $A$.
Then compute an orthonormal basis via a QR factorization $Y = QR$.
The matrix $QQ^TA$
is the projection of $A$ onto the sampled subspace.
If the range of $Q$ captures the dominant column space of $A$, then $A \approx QQ^TA$.
The randomized range finder is the core primitive behind many randomized SVD algorithms. Martinsson and Tropp describe randomized algorithms for matrix factorizations and linear systems, with emphasis on practical algorithms and theoretical foundations.
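A compact range finder on a synthetic matrix with fast singular value decay; the construction and all sizes are assumptions for the demonstration:

```python
import numpy as np

rng = np.random.default_rng(5)

# Synthetic matrix with geometrically decaying singular values.
m, n, k, p = 1000, 500, 10, 5
U, _ = np.linalg.qr(rng.standard_normal((m, n)))
V, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = U @ np.diag(2.0 ** -np.arange(n, dtype=float)) @ V.T

# Range finder: sketch Y = A @ Omega, then orthonormalize.
Omega = rng.standard_normal((n, k + p))
Y = A @ Omega
Q, _ = np.linalg.qr(Y)

# Relative error of the projection Q Q^T A.
err = np.linalg.norm(A - Q @ (Q.T @ A)) / np.linalg.norm(A)
```

Because the spectrum decays geometrically, a basis of only $k + p = 15$ columns already captures the matrix to several digits.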
97.14 Randomized SVD
To compute an approximate rank-$k$ SVD:
- Draw $\Omega \in \mathbb{R}^{n \times (k+p)}$.
- Form $Y = A\Omega$.
- Compute a QR factorization $Y = QR$.
- Form the smaller matrix $B = Q^TA$.
- Compute the SVD of $B$: $B = \widetilde{U}\Sigma V^T$.
- Set $U = Q\widetilde{U}$.
Then $A \approx U\Sigma V^T$.
The expensive SVD is performed on the smaller matrix $B$, not on the original matrix $A$.
This is the standard randomized SVD pattern.
97.15 Oversampling
Oversampling improves reliability.
Instead of choosing exactly $k$ random vectors, choose $\ell = k + p$,
where $p$ is a small integer such as $5$, $10$, or $20$.
The extra vectors provide a safety margin. They reduce the probability that the random sketch misses an important direction.
Oversampling is cheap compared with the cost of processing the full matrix.
97.16 Power Iterations for Accuracy
Randomized SVD may lose accuracy when the singular values decay slowly.
Power iteration improves separation.
Instead of forming $Y = A\Omega$,
form $Y = (AA^T)^q A\Omega$.
Here $q$ is a small nonnegative integer.
This replaces each singular value $\sigma_i$ of $A$ with $\sigma_i^{2q+1}$ in the sketch.
Large singular values become more dominant relative to small ones.
Thus the sampled range better captures the leading singular directions.
The cost is additional passes over the matrix.
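A hedged comparison of $q = 0$ versus $q = 2$ on a slowly decaying spectrum with $\sigma_i = 1/i$ (a synthetic construction); a QR factorization after every product guards against rounding:

```python
import numpy as np

rng = np.random.default_rng(7)

def range_finder(A, ell, q=0):
    """Range finder with q power iterations; QR after each product for stability."""
    Q, _ = np.linalg.qr(A @ rng.standard_normal((A.shape[1], ell)))
    for _ in range(q):
        Q, _ = np.linalg.qr(A.T @ Q)
        Q, _ = np.linalg.qr(A @ Q)
    return Q

# Slowly decaying spectrum: sigma_i = 1/i.
m, n, ell = 400, 300, 15
U0, _ = np.linalg.qr(rng.standard_normal((m, n)))
V0, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = U0 @ np.diag(1.0 / np.arange(1, n + 1)) @ V0.T

def proj_err(Q):
    return np.linalg.norm(A - Q @ (Q.T @ A)) / np.linalg.norm(A)

err_plain = proj_err(range_finder(A, ell, q=0))
err_power = proj_err(range_finder(A, ell, q=2))
```

On a spectrum this flat, the plain sketch leaves a visible gap to the best rank-15 error, and the powered sketch closes most of it, at the cost of four extra passes over $A$.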
97.17 Pass Efficiency
For very large matrices, the number of passes over the data may dominate arithmetic cost.
A pass means reading the full matrix once.
Randomized algorithms are attractive because they can often be organized to use one or a few passes.
| Algorithm | Typical passes |
|---|---|
| basic randomized range finder | 1 |
| randomized SVD without power iteration | 1 to 2 |
| randomized SVD with power iterations | $2q + 2$ or more |
| streaming sketch | 1 |
Pass efficiency matters when the matrix is stored on disk, distributed across machines, or streamed from data sources.
97.18 Randomized Matrix Multiplication
Randomized methods can approximate matrix products.
Given matrices $A \in \mathbb{R}^{m \times n}$ and $B \in \mathbb{R}^{n \times d}$,
where $n$ is large,
one can sample columns of $A$ and corresponding rows of $B$.
The approximation has the form $AB \approx \frac{1}{c}\sum_{t=1}^{c}\frac{1}{p_{i_t}}\,A_{:,i_t}B_{i_t,:}$.
Here $i_1, \dots, i_c$ are sampled indices and $p_{i_t}$ are sampling probabilities.
Sampling probabilities based on column and row norms can reduce variance.
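A sampling-based product estimate. The decaying column and row norms are an assumed construction, chosen because this is exactly the regime where norm-based sampling concentrates on the informative outer products:

```python
import numpy as np

rng = np.random.default_rng(8)

# A and B with decaying column/row norms (the regime where sampling helps).
m, n, d, c = 50, 2000, 40, 1000
decay = np.exp(-np.arange(n) / 100.0)
A = rng.standard_normal((m, n)) * decay
B = rng.standard_normal((n, d)) * decay[:, None]

# Probabilities proportional to ||A[:, i]|| * ||B[i, :]||.
w = np.linalg.norm(A, axis=0) * np.linalg.norm(B, axis=1)
probs = w / w.sum()

# Sample c outer products; rescaling by 1/(c * p_i) keeps the estimate unbiased.
idx = rng.choice(n, size=c, p=probs)
approx = (A[:, idx] / (c * probs[idx])) @ B[idx, :]

rel_err = np.linalg.norm(approx - A @ B) / np.linalg.norm(A @ B)
```

The estimate is unbiased for any valid probabilities; the norm-based choice minimizes the variance bound, which shrinks like $1/c$.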
97.19 Frequent Directions
Frequent Directions is a deterministic sketching method, but it is often discussed alongside randomized sketching.
It maintains a small matrix $B \in \mathbb{R}^{\ell \times n}$, with $\ell \ll m$,
that approximates the covariance $A^TA$.
The goal is to ensure that $\|A^TA - B^TB\|_2$
is small.
This is useful for streaming data, where rows arrive one at a time and cannot all be stored.
Although not randomized in its basic form, it shares the same sketching viewpoint.
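A minimal Frequent Directions sketch, processing rows in one pass. The shrink step subtracts the smallest squared singular value, which zeroes the last row and makes room for the next arrival; the quantity checked at the end is the standard guarantee $\|A^TA - B^TB\|_2 \le \|A\|_F^2/\ell$:

```python
import numpy as np

rng = np.random.default_rng(9)

def frequent_directions(rows, ell):
    """One-pass ell x n sketch B with ||A^T A - B^T B||_2 <= ||A||_F^2 / ell."""
    B = np.zeros((ell, rows.shape[1]))
    for a in rows:
        # Insert into a zero row if one exists; otherwise shrink first.
        zero_rows = np.where(~B.any(axis=1))[0]
        if len(zero_rows) == 0:
            _, s, Vt = np.linalg.svd(B, full_matrices=False)
            s = np.sqrt(np.maximum(s**2 - s[-1] ** 2, 0.0))
            B = s[:, None] * Vt        # last row becomes zero
            zero_rows = [ell - 1]
        B[zero_rows[0]] = a
    return B

# Streaming rows of a matrix with a decaying spectrum (synthetic).
m, n, ell = 300, 20, 10
A = rng.standard_normal((m, n)) * np.exp(-np.arange(n) / 4.0)
B = frequent_directions(A, ell)

cov_err = np.linalg.norm(A.T @ A - B.T @ B, 2)
bound = np.linalg.norm(A, "fro") ** 2 / ell
```

Only the $\ell \times n$ sketch is ever stored, so the memory footprint is independent of the number of rows in the stream.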
97.20 Randomized Preconditioners
Randomized preconditioners use sketches to improve conditioning.
For example, in least squares, a sketch can estimate the geometry of the column space and produce a preconditioner $R$.
Then an iterative solver such as LSQR, conjugate gradient on normal equations, or another Krylov method solves the original problem more rapidly.
The sketch does not need to solve the problem exactly. It only needs to produce a useful transformation.
This is often more accurate than sketch-and-solve.
97.21 Error Guarantees
Randomized algorithms usually provide probabilistic error bounds.
A typical low-rank approximation guarantee has the form
$\|A - \hat{A}\| \le (1 + \varepsilon)\,\|A - A_k\|$
with high probability.
Here $A_k$ is the best rank-$k$ approximation in the chosen norm.
The phrase “with high probability” means that the bound may fail, but the failure probability is controlled by algorithm parameters such as sketch size and oversampling.
97.22 Best Rank-$k$ Approximation
The best rank-$k$ approximation to $A$ in spectral norm or Frobenius norm is given by the truncated SVD.
If $A = U\Sigma V^T$ with singular values $\sigma_1 \ge \sigma_2 \ge \cdots$,
then $A_k = \sum_{i=1}^{k} \sigma_i u_i v_i^T$.
The Eckart-Young theorem states that $A_k$ minimizes $\|A - X\|$
among all matrices $X$ with rank at most $k$, for the spectral and Frobenius norms.
Randomized SVD seeks an approximation close to this optimal deterministic benchmark.
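The theorem can be verified numerically: for the truncated SVD, the Frobenius error equals the root sum of squares of the discarded singular values. The random matrix and sizes are illustrative:

```python
import numpy as np

rng = np.random.default_rng(10)

m, n, k = 60, 40, 5
A = rng.standard_normal((m, n))

# Truncated SVD: keep the k leading singular triplets.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
A_k = (U[:, :k] * s[:k]) @ Vt[:k]

# Eckart-Young in the Frobenius norm: the error is sqrt(sum of sigma_i^2, i > k).
err = np.linalg.norm(A - A_k, "fro")
optimal = np.sqrt(np.sum(s[k:] ** 2))
```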
97.23 Dense Versus Sparse Random Sketches
Different sketching matrices have different costs.
| Sketch | Description | Main feature |
|---|---|---|
| Gaussian | dense normal random entries | simple theory |
| Rademacher | $\pm 1$ entries | simple and cheaper |
| SRHT | structured Hadamard transform | fast multiplication |
| CountSketch | one nonzero per column | very sparse |
| Sampling | selected rows or columns | interpretable |
Gaussian sketches are easy to analyze but may be expensive to multiply.
Sparse and structured sketches reduce computation.
The right choice depends on matrix size, sparsity, hardware, and desired accuracy.
97.24 CountSketch
CountSketch is a sparse random projection.
Each input coordinate is assigned randomly to one output bucket with a random sign.
Thus each column of the sketching matrix has one nonzero entry.
This makes the multiplication $SA$ very fast for sparse matrices.
CountSketch is useful in streaming and large-scale least squares.
Its sparsity reduces arithmetic and memory movement.
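CountSketch never needs the matrix $S$ explicitly: each row of $A$ is added, with a random sign, into one of $s$ buckets. Below is a minimal dense-array version (a sparse-matrix implementation follows the same pattern):

```python
import numpy as np

rng = np.random.default_rng(11)

m, n, s = 5000, 20, 500
A = rng.standard_normal((m, n))

# Hash each row of A to a bucket and flip a random sign.
bucket = rng.integers(0, s, size=m)
sign = rng.choice([-1.0, 1.0], size=m)

# Accumulate S @ A in a single pass, without ever forming S.
SA = np.zeros((s, n))
np.add.at(SA, bucket, sign[:, None] * A)

# Norms of vectors in the column space are approximately preserved.
x = np.ones(n)
ratio = np.linalg.norm(SA @ x) / np.linalg.norm(A @ x)
```

The accumulation touches each row of $A$ exactly once, which is why CountSketch fits streaming and out-of-core settings so well.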
97.25 Subsampled Randomized Hadamard Transform
The subsampled randomized Hadamard transform, or SRHT, has the form $S = \sqrt{\tfrac{m}{s}}\,PHD$.
Here:
| Symbol | Meaning |
|---|---|
| $D$ | random diagonal sign matrix |
| $H$ | Hadamard transform (normalized so $H^TH = I$) |
| $P$ | row sampling matrix |
The Hadamard transform can be applied in $O(m \log m)$ time.
The random signs spread information, and the sampling step reduces dimension.
SRHT is useful when fast structured multiplication is available.
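A small SRHT built on a plain iterative fast Walsh-Hadamard transform. The dimension must be a power of two here (padding handles general sizes), and all sizes are assumptions for the example:

```python
import numpy as np

rng = np.random.default_rng(12)

def fwht(x):
    """Fast Walsh-Hadamard transform along axis 0; len(x) must be a power of 2."""
    x = x.copy()
    h = 1
    while h < len(x):
        for i in range(0, len(x), 2 * h):
            a = x[i:i + h].copy()
            b = x[i + h:i + 2 * h].copy()
            x[i:i + h] = a + b
            x[i + h:i + 2 * h] = a - b
        h *= 2
    return x

m, n, s = 1024, 16, 128
A = rng.standard_normal((m, n))

# S = sqrt(m/s) * P H D: random signs, Hadamard mixing, uniform row sampling.
D = rng.choice([-1.0, 1.0], size=m)
HDA = fwht(D[:, None] * A) / np.sqrt(m)      # H / sqrt(m) is orthogonal
rows = rng.choice(m, size=s, replace=False)
SA = np.sqrt(m / s) * HDA[rows]

# Check approximate norm preservation on one direction in range(A).
x = np.ones(n)
ratio = np.linalg.norm(SA @ x) / np.linalg.norm(A @ x)
```

The sign step $D$ flattens any concentration in the rows, so that uniform sampling after the Hadamard mixing is safe even when $A$ has high-leverage rows.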
97.26 Communication Cost
On modern hardware, communication often costs more than arithmetic.
Randomized algorithms can reduce communication by compressing data early.
For example, forming the sketch $Y = A\Omega$ may reduce a large matrix to a much smaller sample matrix.
This is important in distributed systems, out-of-core computation, and GPU computation.
Randomized methods are often valuable not only because they reduce floating point operations, but because they reduce data movement.
97.27 Streaming Algorithms
In streaming settings, rows of $A$ arrive one at a time.
The full matrix may never be stored.
A sketch is updated incrementally.
For example, if row $a_t$ arrives, the algorithm updates a small summary matrix.
At the end, the summary is used to approximate singular vectors, covariance structure, or least squares solutions.
Streaming randomized linear algebra is useful when data is too large to store or changes continuously.
97.28 Randomized Algorithms for PCA
Principal component analysis seeks leading singular vectors of a centered data matrix.
Randomized SVD provides a scalable PCA method.
Instead of computing the full covariance matrix $A^TA$, one sketches the data and computes an approximate low-rank SVD.
This is effective when only the leading components are needed.
For data analysis, the approximation is often sufficient because the original data contains noise.
97.29 Robustness and Failure Probability
Randomized methods can fail if the sketch misses important directions.
The probability of failure is reduced by:
| Technique | Effect |
|---|---|
| oversampling | captures more directions |
| power iteration | improves spectral separation |
| leverage-score sampling | samples influential rows or columns |
| repeated sketches | reduces variance |
| residual checking | validates the result |
A practical algorithm should check the residual or approximation error when possible.
Randomization should not replace diagnostics.
97.30 Numerical Stability
Randomized algorithms still use floating point arithmetic.
They must account for rounding error, conditioning, and orthogonality.
For randomized SVD, the basis matrix should be computed by a stable orthogonalization method.
For power iterations, reorthogonalization may be needed to prevent loss of numerical rank.
The random sketch reduces size, but the smaller deterministic problem must still be solved stably.
97.31 When Randomization Helps
Randomization is most useful when:
| Condition | Reason |
|---|---|
| matrix is very large | exact factorization is too expensive |
| only low-rank structure is needed | full information is unnecessary |
| data is noisy | exact computation may be wasteful |
| matrix-vector products are cheap | sketching is efficient |
| data is distributed | sketch reduces communication |
| streaming access is required | sketch can be maintained online |
Randomization is less useful when the matrix is small, exact answers are required, or the spectrum has no exploitable decay.
97.32 When Randomization Is Risky
Randomization should be used carefully when:
| Situation | Risk |
|---|---|
| singular values decay slowly | low-rank approximation needs many dimensions |
| high accuracy is required | sketch size may become large |
| matrix has high leverage rows | uniform sampling may fail |
| data is adversarial | simple random sampling may miss structure |
| condition number is large | sketched problem may still be unstable |
| reproducibility is strict | random seeds must be controlled |
In these cases, one may need stronger sketches, larger oversampling, power iteration, or deterministic verification.
97.33 Practical Randomized SVD Checklist
A practical randomized SVD implementation should choose:
| Parameter | Typical choice |
|---|---|
| target rank | desired $k$ |
| oversampling | $p = 5$ to $20$ |
| sample size | $\ell = k + p$ |
| power iterations | $q = 0$, $1$, or $2$, depending on spectrum |
| sketch type | Gaussian, SRHT, CountSketch, or sampling |
| orthogonalization | QR or stable equivalent |
| error check | residual or held-out estimate |
For many applications, a small oversampling parameter and one or two power iterations provide a good balance.
97.34 Summary
Randomized linear algebra uses random sketches to reduce large matrix problems to smaller ones.
The main tools are:
| Tool | Purpose |
|---|---|
| random sampling | select rows, columns, or entries |
| random projection | compress by random linear combinations |
| subspace embedding | preserve geometry of a subspace |
| leverage scores | identify influential rows or columns |
| randomized range finder | approximate column space |
| randomized SVD | compute low-rank approximation |
| sketch-and-solve | approximate least squares |
| randomized preconditioning | accelerate iterative solvers |
The central pattern is: sketch the large problem, solve the smaller problem, and lift the result back to the original scale.
Randomized methods trade exactness for speed, memory efficiency, and communication reduction. They are most effective when the matrix has low-rank structure, when moderate accuracy is sufficient, or when the cost of exact deterministic computation is too high.