
Chapter 133. Random Matrices

133.1 Introduction

A random matrix is a matrix whose entries are random variables.

Random matrix theory studies how algebraic quantities such as eigenvalues, singular values, determinants, traces, ranks, and condition numbers behave when the matrix is chosen from a probability model. It is a meeting point of linear algebra, probability, statistics, numerical analysis, mathematical physics, and data science.

The subject is especially useful when the dimension is large. In small dimensions, one usually studies a matrix entry by entry. In large dimensions, individual entries matter less than collective behavior: the eigenvalues of a large random matrix often arrange themselves according to deterministic patterns. This is one of the main discoveries of random matrix theory, and classical limit laws such as Wigner’s semicircle law and the Marchenko-Pastur law describe the limiting spectral distributions that commonly arise.

The basic question is:

$$\text{What linear-algebraic structure appears when matrix entries are random?}$$

This question has many forms. How large is the largest eigenvalue? How stable is a random linear system? What is the rank of a random matrix? How does noise change singular values? When does a random matrix behave almost like an isometry?

133.2 Random Vectors

A random vector in $\mathbb{R}^n$ is a vector

$$X = \begin{bmatrix} X_1\\ X_2\\ \vdots\\ X_n \end{bmatrix}$$

whose components are random variables.

The expectation of $X$ is defined componentwise:

$$\mathbb{E}X = \begin{bmatrix} \mathbb{E}X_1\\ \mathbb{E}X_2\\ \vdots\\ \mathbb{E}X_n \end{bmatrix}.$$

If $X$ has mean zero, then

$$\mathbb{E}X = 0.$$

The covariance matrix of $X$ is

$$\Sigma = \mathbb{E}\left[(X-\mu)(X-\mu)^T\right],$$

where $\mu = \mathbb{E}X$.

The covariance matrix is symmetric and positive semidefinite. It records second-order dependence among the coordinates of $X$.
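A quick numerical check (a NumPy sketch; the particular covariance below is an arbitrary test case, not from the text) estimates $\Sigma$ from samples and verifies these properties:

```python
import numpy as np

rng = np.random.default_rng(0)
n, num_samples = 4, 100_000

# An arbitrary positive semidefinite matrix serves as the true covariance.
L = rng.standard_normal((n, n))
true_cov = L @ L.T
samples = rng.multivariate_normal(np.zeros(n), true_cov, size=num_samples)

# Empirical covariance: average of (X - mu)(X - mu)^T over the samples.
mu = samples.mean(axis=0)
centered = samples - mu
Sigma_hat = centered.T @ centered / num_samples

print(np.allclose(Sigma_hat, Sigma_hat.T))    # symmetric
print(np.linalg.eigvalsh(Sigma_hat).min())    # >= 0 up to rounding: positive semidefinite
print(np.max(np.abs(Sigma_hat - true_cov)))   # close to the true covariance
```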

Random matrices are often built by placing random vectors as rows or columns.

133.3 Random Matrices

A random matrix is a matrix-valued random variable.

For example,

$$A = \begin{bmatrix} A_{11} & A_{12} & \cdots & A_{1n}\\ A_{21} & A_{22} & \cdots & A_{2n}\\ \vdots & \vdots & \ddots & \vdots\\ A_{m1} & A_{m2} & \cdots & A_{mn} \end{bmatrix}$$

is random if each entry $A_{ij}$ is a random variable.

A common model assumes that the entries are independent and identically distributed. This is often abbreviated as i.i.d.

For example, one may assume

$$A_{ij} \sim N(0,1),$$

independently for all $i,j$.

Another common normalization is

$$A_{ij} \sim N\!\left(0, \frac{1}{n}\right).$$

The normalization by $1/n$ keeps eigenvalues and singular values at a stable scale as $n$ grows.
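A small experiment (a minimal NumPy sketch) shows the effect of this normalization: the largest singular value settles near a constant, around 2, instead of growing with $n$:

```python
import numpy as np

rng = np.random.default_rng(1)

for n in (100, 400, 1600):
    # Entries N(0, 1/n): standard normals scaled by 1/sqrt(n).
    A = rng.standard_normal((n, n)) / np.sqrt(n)
    # With this scaling the largest singular value stabilizes (near 2 for large n).
    print(n, np.linalg.norm(A, 2))
```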

133.4 Ensembles

A probability model for random matrices is called an ensemble.

Important ensembles include:

| Ensemble | Matrix type | Typical use |
|---|---|---|
| Gaussian Orthogonal Ensemble | Real symmetric | Spectral statistics |
| Gaussian Unitary Ensemble | Complex Hermitian | Quantum models |
| Wishart ensemble | Sample covariance | Statistics |
| Bernoulli ensemble | Entries $\pm 1$ | Discrete random matrices |
| Ginibre ensemble | General non-symmetric | Non-Hermitian spectra |
| Sparse random matrices | Many zero entries | Graphs and networks |

The word ensemble emphasizes that one studies a whole distribution of matrices rather than one fixed matrix.

133.5 Symmetric Random Matrices

A real symmetric random matrix satisfies

$$A^T = A.$$

Because it is symmetric, all eigenvalues are real. This makes spectral analysis more tractable.

A standard Wigner matrix is a symmetric random matrix whose upper-triangular entries are independent, usually with mean zero and suitable variance. The entries below the diagonal are then determined by symmetry:

$$A_{ij} = A_{ji}.$$

A common normalization is

$$A_{ij} = \frac{X_{ij}}{\sqrt{n}},$$

where the $X_{ij}$ have mean zero and variance one.

The factor $\frac{1}{\sqrt{n}}$ keeps the eigenvalues in a bounded interval as $n \to \infty$.
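A minimal construction in NumPy (assuming standard normal entries for concreteness) follows this recipe: fill the upper triangle, mirror it, and scale by $1/\sqrt{n}$:

```python
import numpy as np

def wigner(n, rng):
    """Symmetric matrix with independent N(0,1) entries on and above the
    diagonal, scaled by 1/sqrt(n)."""
    X = rng.standard_normal((n, n))
    A = np.triu(X) + np.triu(X, 1).T   # mirror the strict upper triangle
    return A / np.sqrt(n)

rng = np.random.default_rng(2)
A = wigner(500, rng)
eigs = np.linalg.eigvalsh(A)
print(np.allclose(A, A.T))     # symmetric by construction
print(eigs.min(), eigs.max())  # roughly within [-2, 2]
```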

133.6 Empirical Spectral Distribution

Let $A_n$ be an $n \times n$ matrix with real eigenvalues

$$\lambda_1, \ldots, \lambda_n.$$

The empirical spectral distribution is the probability measure

$$\mu_{A_n} = \frac{1}{n}\sum_{i=1}^n \delta_{\lambda_i},$$

where $\delta_{\lambda_i}$ is the point mass at $\lambda_i$.

This measure places equal mass on each eigenvalue.

Instead of studying each eigenvalue separately, one studies the distribution of all eigenvalues at once.

For large random matrices, the empirical spectral distribution often converges to a deterministic limiting distribution.

This is the spectral analogue of the law of large numbers.

133.7 Wigner’s Semicircle Law

Wigner’s semicircle law is one of the central results of random matrix theory.

For a broad class of symmetric random matrices with independent entries, mean zero, and appropriate variance scaling, the empirical spectral distribution converges to the semicircle distribution.

The semicircle density is

$$\rho(x) = \frac{1}{2\pi}\sqrt{4 - x^2}, \qquad -2 \le x \le 2.$$

Outside this interval, $\rho(x) = 0$.

Thus, for large $n$, most eigenvalues lie near the interval $[-2, 2]$.

The word “semicircle” comes from the graph of the density, which is the upper half of a circle after scaling.

This theorem shows that the global eigenvalue distribution can become deterministic even though the entries of the matrix are random. Wigner introduced this type of model in the 1950s while studying spectra of complex physical systems.
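The following sketch (NumPy only, printing bin values in place of a plot) builds one Wigner matrix, forms its empirical spectral distribution, and compares it with the semicircle density:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 2000

# Wigner matrix: symmetric, independent N(0,1) entries, scaled by 1/sqrt(n).
X = rng.standard_normal((n, n))
A = (np.triu(X) + np.triu(X, 1).T) / np.sqrt(n)

# Empirical spectral distribution: equal mass 1/n on each eigenvalue.
eigs = np.linalg.eigvalsh(A)

# Compare bin densities of the ESD with the semicircle density.
bins = np.linspace(-2.0, 2.0, 9)
hist, _ = np.histogram(eigs, bins=bins, density=True)
centers = (bins[:-1] + bins[1:]) / 2
semicircle = np.sqrt(4.0 - centers**2) / (2.0 * np.pi)

for c, h, s in zip(centers, hist, semicircle):
    print(f"x = {c:+.2f}   empirical {h:.3f}   semicircle {s:.3f}")
```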

133.8 The Moment Method

One way to prove spectral limit laws is the moment method.

The $k$-th moment of the empirical spectral distribution is

$$\int x^k \, d\mu_{A_n}(x).$$

Since the eigenvalues of $A_n$ are $\lambda_1, \ldots, \lambda_n$, this moment equals

$$\frac{1}{n}\sum_{i=1}^n \lambda_i^k.$$

Using the trace,

$$\frac{1}{n}\sum_{i=1}^n \lambda_i^k = \frac{1}{n}\operatorname{tr}(A_n^k).$$

Thus spectral moments can be computed from traces.

The trace expands into sums over products of entries:

$$\operatorname{tr}(A_n^k) = \sum_{i_1,\ldots,i_k} A_{i_1 i_2} A_{i_2 i_3} \cdots A_{i_k i_1}.$$

The expectation of this expression can be analyzed by counting which index patterns contribute.

This method connects random matrix theory with combinatorics.
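The trace identity is easy to verify numerically; this sketch (using a Wigner matrix as an arbitrary test case) computes the $k$-th spectral moment both ways:

```python
import numpy as np

rng = np.random.default_rng(4)
n, k = 300, 4

# A Wigner matrix as the test case.
X = rng.standard_normal((n, n))
A = (np.triu(X) + np.triu(X, 1).T) / np.sqrt(n)

# k-th spectral moment computed from eigenvalues and from the trace.
eigs = np.linalg.eigvalsh(A)
moment_from_eigs = np.mean(eigs**k)
moment_from_trace = np.trace(np.linalg.matrix_power(A, k)) / n

# The two values agree up to rounding; for the semicircle law, the 4th
# moment converges to the Catalan number C_2 = 2 as n grows.
print(moment_from_eigs, moment_from_trace)
```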

133.9 Sample Covariance Matrices

Suppose

$$X_1, \ldots, X_N \in \mathbb{R}^p$$

are data vectors with mean zero.

Arrange them as columns of a matrix

$$X = \begin{bmatrix} | & | & & |\\ X_1 & X_2 & \cdots & X_N\\ | & | & & | \end{bmatrix}.$$

The sample covariance matrix is

$$S = \frac{1}{N} X X^T.$$

This matrix is symmetric and positive semidefinite.

Its eigenvalues describe the variance of the data in different directions.

When both $p$ and $N$ are large, the eigenvalues of $S$ are random and strongly affected by dimension. Random matrix theory gives baseline predictions for what covariance spectra look like under pure noise.
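A short experiment (a NumPy sketch with the population covariance set to the identity) makes the dimension effect visible:

```python
import numpy as np

rng = np.random.default_rng(5)

for p, N in [(10, 1000), (100, 1000), (500, 1000)]:
    X = rng.standard_normal((p, N))   # columns are N mean-zero samples in R^p
    S = X @ X.T / N                   # sample covariance matrix
    eigs = np.linalg.eigvalsh(S)
    # The population covariance is the identity, yet the sample eigenvalues
    # spread farther from 1 as p/N grows.
    print(f"p/N = {p/N:.2f}   eigenvalues in [{eigs.min():.2f}, {eigs.max():.2f}]")
```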

133.10 Marchenko-Pastur Law

The Marchenko-Pastur law describes the limiting eigenvalue distribution of large sample covariance matrices.

Let $X$ be a $p \times N$ random matrix with independent entries of mean zero and suitable variance. Consider

$$S = \frac{1}{N} X X^T.$$

If

$$p, N \to \infty$$

with

$$\frac{p}{N} \to \gamma \in (0, \infty),$$

then the empirical spectral distribution of $S$ converges to a deterministic limit supported on

$$\left[(1-\sqrt{\gamma})^2,\ (1+\sqrt{\gamma})^2\right]$$

in the standard variance-one scaling (for $\gamma \le 1$; when $\gamma > 1$, the limit also carries an atom at zero).

This distribution is the Marchenko-Pastur distribution. It describes the asymptotic behavior of eigenvalues of large sample covariance matrices and, equivalently, singular values of large rectangular random matrices.

The Marchenko-Pastur law is important in statistics because it explains why high-dimensional covariance matrices have spread-out eigenvalues even when the underlying population covariance is the identity.

In high dimensions, noise has structure.
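A quick sketch (with illustrative sizes $p = 400$ and $N = 1600$, so $\gamma = 0.25$) compares the observed eigenvalue range with the predicted support:

```python
import numpy as np

rng = np.random.default_rng(6)
p, N = 400, 1600                  # aspect ratio gamma = p/N = 0.25
gamma = p / N

X = rng.standard_normal((p, N))   # independent entries, mean 0, variance 1
S = X @ X.T / N
eigs = np.linalg.eigvalsh(S)

lower, upper = (1 - np.sqrt(gamma))**2, (1 + np.sqrt(gamma))**2
print(f"Marchenko-Pastur support: [{lower:.3f}, {upper:.3f}]")
print(f"observed eigenvalues:     [{eigs.min():.3f}, {eigs.max():.3f}]")
```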

133.11 Singular Values

For a rectangular matrix

$$A \in \mathbb{R}^{m \times n},$$

the singular values are the nonnegative square roots of the eigenvalues of

$$A^T A.$$

They are written as

$$\sigma_1(A) \ge \sigma_2(A) \ge \cdots \ge 0.$$

Random matrix theory studies the distribution of these singular values.

The largest singular value controls the operator norm:

$$\|A\|_2 = \sigma_1(A).$$

The smallest singular value controls invertibility and conditioning when $A$ is square or nearly square.

A random matrix with independent entries often has singular values concentrated in a predictable interval after proper normalization.
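A brief check (a NumPy sketch on an arbitrary rectangular Gaussian matrix) confirms both statements: the singular values are the square roots of the eigenvalues of $A^TA$, and the largest one equals the operator norm:

```python
import numpy as np

rng = np.random.default_rng(7)
m, n = 300, 500

A = rng.standard_normal((m, n)) / np.sqrt(n)

# Singular values from the SVD, sorted in descending order.
sigma = np.linalg.svd(A, compute_uv=False)

# They are the nonnegative square roots of the eigenvalues of A^T A.
eigs = np.linalg.eigvalsh(A.T @ A)[::-1]          # descending order
print(np.allclose(sigma, np.sqrt(np.clip(eigs[:m], 0, None))))

# The largest singular value equals the operator norm.
print(np.isclose(sigma[0], np.linalg.norm(A, 2)))
```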

133.12 Conditioning

The condition number of an invertible matrix $A$ is

$$\kappa(A) = \frac{\sigma_{\max}(A)}{\sigma_{\min}(A)}.$$

A large condition number means that solving

$$Ax = b$$

is sensitive to perturbations.

For random matrices, the smallest singular value is especially important. If it is close to zero, the matrix is nearly singular.

Random matrix theory gives probabilistic bounds for $\sigma_{\min}(A)$ and $\kappa(A)$.

These bounds explain why random linear systems are often better behaved than worst-case systems.
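The following sketch (illustrative sizes only) measures $\kappa(A)$ for i.i.d. Gaussian matrices and shows that it typically grows only modestly with $n$:

```python
import numpy as np

rng = np.random.default_rng(8)

for n in (50, 200, 800):
    A = rng.standard_normal((n, n)) / np.sqrt(n)
    sigma = np.linalg.svd(A, compute_uv=False)   # singular values, descending
    # kappa = sigma_max / sigma_min; for i.i.d. Gaussian matrices this is
    # typically of order n, far from the worst case.
    print(n, sigma[0] / sigma[-1])
```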

133.13 Concentration of Measure

Many random matrix results depend on concentration of measure.

A random quantity concentrates when it is very close to its mean or typical value with high probability.

For example, if $x \in \mathbb{R}^n$ is a random vector with independent standard normal entries, then

$$\|x\|^2 = x_1^2 + \cdots + x_n^2$$

is usually close to $n$. Thus $\|x\|$ is usually close to $\sqrt{n}$.

In high dimensions, randomness can become highly regular.

Random matrices inherit this behavior. Norms, singular values, traces, and quadratic forms often concentrate near deterministic values.
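A short check (a NumPy sketch) illustrates this concentration:

```python
import numpy as np

rng = np.random.default_rng(9)

for n in (10, 100, 10_000):
    x = rng.standard_normal(n)
    # ||x|| / sqrt(n) tends to 1: the norm concentrates around sqrt(n).
    print(n, np.linalg.norm(x) / np.sqrt(n))
```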

133.14 Subgaussian Random Variables

A subgaussian random variable is a random variable whose tails decay at least as fast as a Gaussian tail, up to constants.

Examples include:

| Distribution | Subgaussian? |
|---|---|
| Standard normal | Yes |
| Bernoulli $\pm 1$ | Yes |
| Uniform on $[-1,1]$ | Yes |
| Cauchy | No |

Subgaussian assumptions are common because they allow strong probability bounds while covering many useful distributions.

Random matrices with independent subgaussian entries often behave similarly to Gaussian random matrices at the level of norm and singular value estimates.

This phenomenon is part of universality.

133.15 Universality

Universality means that many large-scale spectral behaviors do not depend strongly on the exact distribution of the entries.

For example, Wigner’s semicircle law holds for many entry distributions, not just Gaussian ones. The detailed distribution of each entry becomes less important than broad features such as independence, mean, variance, and tail behavior.

This resembles the central limit theorem: many sums of independent random variables converge to the same Gaussian limit.

In random matrix theory, many eigenvalue distributions and local statistics show similar distribution-independent behavior.

133.16 Random Projections

Random matrices are used to project high-dimensional data into lower-dimensional spaces.

Let

$$A \in \mathbb{R}^{m \times n}$$

be a random matrix with

$$m < n.$$

The map

$$x \mapsto Ax$$

compresses vectors from $\mathbb{R}^n$ into $\mathbb{R}^m$.

With proper normalization, random projections approximately preserve distances among a finite set of points when $m$ is large enough.

This principle is used in dimensionality reduction, compressed sensing, nearest-neighbor search, and randomized numerical linear algebra.
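Here is a sketch of this distance-preservation property (a Gaussian projection with illustrative sizes; the $1/\sqrt{m}$ scaling makes $\mathbb{E}\|Ax\|^2 = \|x\|^2$):

```python
import numpy as np

rng = np.random.default_rng(10)
n, m, num_points = 10_000, 400, 20

# A cloud of points in R^n and a Gaussian projection into R^m.
points = rng.standard_normal((num_points, n))
A = rng.standard_normal((m, n)) / np.sqrt(m)   # scaling gives E||Ax||^2 = ||x||^2
projected = points @ A.T

# Compare all pairwise distances before and after projection.
ratios = []
for i in range(num_points):
    for j in range(i + 1, num_points):
        d_before = np.linalg.norm(points[i] - points[j])
        d_after = np.linalg.norm(projected[i] - projected[j])
        ratios.append(d_after / d_before)

print(min(ratios), max(ratios))   # all distance ratios close to 1
```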

133.17 Randomized Numerical Linear Algebra

Random matrices can accelerate deterministic linear algebra algorithms.

A large matrix $A$ may be sketched by multiplying it with a random matrix $\Omega$:

$$Y = A\Omega.$$

The matrix $Y$ captures important information about the column space of $A$.

This idea supports randomized algorithms for:

| Task | Randomized method |
|---|---|
| Low-rank approximation | Randomized SVD |
| Least squares | Sketching |
| Matrix multiplication | Random sampling |
| Trace estimation | Random probes |
| Preconditioning | Random embeddings |

The goal is to reduce computational cost while preserving the important linear structure.
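Below is a minimal sketch of this idea in the style of a randomized range finder; the rank, noise level, and oversampling parameter are illustrative choices, not canonical values:

```python
import numpy as np

rng = np.random.default_rng(11)
m, n, rank = 1000, 800, 10

# A test matrix with known low-rank structure plus small noise.
A = rng.standard_normal((m, rank)) @ rng.standard_normal((rank, n))
A += 1e-3 * rng.standard_normal((m, n))

# Sketch: multiply by a thin random matrix to capture the column space.
k = rank + 5                                   # modest oversampling for robustness
Omega = rng.standard_normal((n, k))
Y = A @ Omega                                  # m x k sketch of the range of A
Q, _ = np.linalg.qr(Y)                         # orthonormal basis for the sketch

# Project A onto the sketched subspace and measure the approximation error.
A_approx = Q @ (Q.T @ A)
print(np.linalg.norm(A - A_approx) / np.linalg.norm(A))   # small relative error
```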

133.18 Spiked Models

A spiked model separates signal from noise.

A typical form is

$$A = S + N,$$

where $S$ is a low-rank signal matrix and $N$ is random noise.

The question is whether the signal can be detected from the eigenvalues or singular values of $A$.

In covariance problems, one may study

$$\Sigma = I + \theta\, uu^T,$$

where $I$ is isotropic noise and $\theta uu^T$ is a rank-one signal.

Random matrix theory describes when the top eigenvalue separates from the noise bulk and when the corresponding eigenvector aligns with the signal direction.

This is important in principal component analysis.
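A simulation sketch (additive Wigner noise plus a rank-one spike; the direction $u$ and strength $\theta = 3$ are illustrative choices) shows both effects:

```python
import numpy as np

rng = np.random.default_rng(12)
n = 1000

# Noise: a Wigner matrix, whose spectrum fills [-2, 2] for large n.
X = rng.standard_normal((n, n))
noise = (np.triu(X) + np.triu(X, 1).T) / np.sqrt(n)

# Signal: a rank-one spike theta * u u^T along a fixed unit vector u.
u = np.ones(n) / np.sqrt(n)
theta = 3.0                   # strong enough to escape the bulk
A = theta * np.outer(u, u) + noise

eigs, vecs = np.linalg.eigh(A)
print(eigs[-1])               # top eigenvalue separates from the bulk edge near 2
print(abs(vecs[:, -1] @ u))   # top eigenvector aligns with the signal direction
```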

133.19 Random Graph Matrices

Random graphs produce random matrices.

For a graph $G$, the adjacency matrix is

$$A_{ij} = \begin{cases} 1, & \text{if vertices } i \text{ and } j \text{ are connected},\\ 0, & \text{otherwise}. \end{cases}$$

If the graph is random, then the adjacency matrix is random.

Spectral properties of this matrix reveal graph structure.

Examples include:

| Matrix | Encodes |
|---|---|
| Adjacency matrix | Edges |
| Laplacian matrix | Connectivity |
| Normalized Laplacian | Random walks |
| Non-backtracking matrix | Community structure |

Random graph matrices connect random matrix theory with combinatorics and network science.
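As an illustration (an Erdős-Rényi random graph with illustrative parameters $n = 500$ and edge probability $0.1$), one can build the adjacency matrix and inspect its spectrum:

```python
import numpy as np

rng = np.random.default_rng(13)
n, prob = 500, 0.1

# Erdos-Renyi graph: each possible edge appears independently with probability prob.
upper = np.triu(rng.random((n, n)) < prob, k=1)
A = (upper | upper.T).astype(float)   # symmetric 0/1 adjacency matrix, zero diagonal

eigs = np.linalg.eigvalsh(A)
# The top eigenvalue tracks the average degree n * prob, while the remaining
# eigenvalues form a much smaller bulk.
print(eigs[-1], n * prob)
print(eigs[-2])
```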

133.20 Noise, Signal, and Spectrum

One of the practical uses of random matrix theory is distinguishing signal from noise.

In data analysis, an observed matrix often has the form

$$A = \text{signal} + \text{noise}.$$

The spectrum of a pure-noise matrix gives a reference model.

Eigenvalues outside the expected noise range may indicate signal.

For sample covariance matrices, the Marchenko-Pastur upper edge $(1+\sqrt{\gamma})^2$ is often used as a rough benchmark. Eigenvalues much larger than this edge may reflect structure beyond random noise, depending on the model and assumptions.

This idea is widely used in high-dimensional statistics and principal component analysis.

133.21 Summary

Random matrices combine linear algebra with probability.

The central objects are:

| Concept | Meaning |
|---|---|
| Random matrix | Matrix whose entries are random variables |
| Ensemble | Probability model for matrices |
| Empirical spectral distribution | Distribution of the eigenvalues of one matrix |
| Wigner matrix | Symmetric matrix with independent random entries |
| Semicircle law | Limiting eigenvalue distribution for Wigner matrices |
| Sample covariance matrix | Matrix of the form $XX^T/N$ |
| Marchenko-Pastur law | Limiting spectrum of sample covariance matrices |
| Singular values | Square roots of the eigenvalues of $A^TA$ |
| Concentration | High-dimensional random regularity |
| Universality | Limit behavior independent of many distribution details |
| Sketching | Randomized dimensionality reduction |
| Spiked model | Low-rank signal plus random noise |
Random matrix theory explains why large random linear systems often show deterministic structure. It gives tools for understanding spectra, noise, conditioning, high-dimensional data, and randomized algorithms.