# Chapter 133. Random Matrices

## 133.1 Introduction

A random matrix is a matrix whose entries are random variables.

Random matrix theory studies how algebraic quantities such as eigenvalues, singular values, determinants, traces, ranks, and condition numbers behave when the matrix is chosen from a probability model. It is a meeting point of linear algebra, probability, statistics, numerical analysis, mathematical physics, and data science.

The subject is especially useful when the dimension is large. In small dimensions, one usually studies a matrix entry by entry. In large dimensions, individual entries matter less than collective behavior: the eigenvalues of a large random matrix often arrange themselves according to deterministic patterns, which is one of the main discoveries of the subject. Classical limit laws such as Wigner's semicircle law and the Marchenko-Pastur law describe the limiting spectral distributions that appear most often.

The basic question is:

$$
\text{What linear-algebraic structure appears when matrix entries are random?}
$$

This question has many forms. How large is the largest eigenvalue? How stable is a random linear system? What is the rank of a random matrix? How does noise change singular values? When does a random matrix behave almost like an isometry?

## 133.2 Random Vectors

A random vector in \(\mathbb{R}^n\) is a vector

$$
X =
\begin{bmatrix}
X_1\\
X_2\\
\vdots\\
X_n
\end{bmatrix}
$$

whose components are random variables.

The expectation of \(X\) is defined componentwise:

$$
\mathbb{E}X =
\begin{bmatrix}
\mathbb{E}X_1\\
\mathbb{E}X_2\\
\vdots\\
\mathbb{E}X_n
\end{bmatrix}.
$$

If every component of \(X\) has mean zero, then

$$
\mathbb{E}X=0.
$$

The covariance matrix of \(X\) is

$$
\Sigma =
\mathbb{E}\left[(X-\mu)(X-\mu)^T\right],
$$

where

$$
\mu=\mathbb{E}X.
$$

The covariance matrix is symmetric and positive semidefinite. It records second-order dependence among the coordinates of \(X\).
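As a small illustration, the following NumPy sketch (assuming `numpy` is available; the dimensions and the mixing matrix are arbitrary choices) estimates the mean and covariance of a random vector from samples and checks the symmetry and positive semidefiniteness noted above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Draw N samples of a 3-dimensional random vector with correlated coordinates.
N = 10_000
Z = rng.standard_normal((N, 3))
M = np.array([[1.0, 0.0, 0.0],
              [0.5, 1.0, 0.0],
              [0.0, 0.3, 1.0]])
X = Z @ M.T                                      # induce correlations between coordinates

mu_hat = X.mean(axis=0)                          # componentwise sample mean
Sigma_hat = (X - mu_hat).T @ (X - mu_hat) / N    # sample covariance matrix

print("estimated mean:", mu_hat)
print("symmetric:", np.allclose(Sigma_hat, Sigma_hat.T))
print("eigenvalues (all nonnegative):", np.linalg.eigvalsh(Sigma_hat))
```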

Random matrices are often built by placing random vectors as rows or columns.

## 133.3 Random Matrices

A random matrix is a matrix-valued random variable.

For example,

$$
A =
\begin{bmatrix}
A_{11} & A_{12} & \cdots & A_{1n}\\
A_{21} & A_{22} & \cdots & A_{2n}\\
\vdots & \vdots & \ddots & \vdots\\
A_{m1} & A_{m2} & \cdots & A_{mn}
\end{bmatrix}
$$

is random if each entry \(A_{ij}\) is a random variable.

A common model assumes that the entries are independent and identically distributed. This is often abbreviated as i.i.d.

For example, one may assume

$$
A_{ij}\sim N(0,1),
$$

independently for all \(i,j\).

Another common normalization is

$$
A_{ij}\sim N\left(0,\frac{1}{n}\right).
$$

Scaling the variance by \(1/n\) keeps the eigenvalues and singular values at a stable scale as \(n\) grows.
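A quick numerical check of this scaling, sketched with NumPy (the matrix sizes are arbitrary): with entries of variance \(1/n\), the largest singular value stays near a constant (about 2 in this model) instead of growing with \(n\).

```python
import numpy as np

rng = np.random.default_rng(1)

for n in (100, 400, 1600):
    # Entries are N(0, 1/n): standard normals scaled by 1/sqrt(n).
    A = rng.standard_normal((n, n)) / np.sqrt(n)
    svals = np.linalg.svd(A, compute_uv=False)
    print(f"n={n:5d}  largest singular value ≈ {svals[0]:.3f}")
```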

## 133.4 Ensembles

A probability model for random matrices is called an ensemble.

Important ensembles include:

| Ensemble | Matrix type | Typical use |
|---|---|---|
| Gaussian Orthogonal Ensemble | Real symmetric | Spectral statistics |
| Gaussian Unitary Ensemble | Complex Hermitian | Quantum models |
| Wishart ensemble | Sample covariance | Statistics |
| Bernoulli ensemble | Entries \(\pm1\) | Discrete random matrices |
| Ginibre ensemble | General non-symmetric | Non-Hermitian spectra |
| Sparse random matrices | Many zero entries | Graphs and networks |

The word ensemble emphasizes that one studies a whole distribution of matrices rather than one fixed matrix.

## 133.5 Symmetric Random Matrices

A real symmetric random matrix satisfies

$$
A^T=A.
$$

Because it is symmetric, all eigenvalues are real. This makes spectral analysis more tractable.

A standard Wigner matrix is a symmetric random matrix whose upper-triangular entries are independent, usually with mean zero and suitable variance. The entries below the diagonal are then determined by symmetry:

$$
A_{ij}=A_{ji}.
$$

A common normalization is

$$
A_{ij} =
\frac{X_{ij}}{\sqrt{n}},
$$

where the \(X_{ij}\) have mean zero and variance one.

The factor

$$
\frac{1}{\sqrt{n}}
$$

keeps the eigenvalues in a bounded interval as \(n\to\infty\).
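A minimal sketch of this construction in NumPy (assuming Gaussian entries for simplicity): build a symmetric matrix from independent upper-triangular entries, scale by \(1/\sqrt{n}\), and observe that the extreme eigenvalues land near \(\pm 2\).

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1000

# Fill the strict upper triangle with independent mean-zero, variance-one
# entries, mirror it across the diagonal, and scale by 1/sqrt(n).
X = rng.standard_normal((n, n))
W = np.triu(X, k=1)
A = (W + W.T + np.diag(np.diag(X))) / np.sqrt(n)

eigs = np.linalg.eigvalsh(A)
print("smallest eigenvalue:", eigs[0])    # close to -2 for large n
print("largest eigenvalue: ", eigs[-1])   # close to +2 for large n
```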

## 133.6 Empirical Spectral Distribution

Let

$$
A_n
$$

be an \(n\times n\) matrix with real eigenvalues

$$
\lambda_1,\ldots,\lambda_n.
$$

The empirical spectral distribution is the probability measure

$$
\mu_{A_n} =
\frac{1}{n}\sum_{i=1}^n \delta_{\lambda_i},
$$

where \(\delta_{\lambda_i}\) is the point mass at \(\lambda_i\).

This measure places equal mass on each eigenvalue.

Instead of studying each eigenvalue separately, one studies the distribution of all eigenvalues at once.

For large random matrices, the empirical spectral distribution often converges to a deterministic limiting distribution.

This is the spectral analogue of the law of large numbers.
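In code, the empirical spectral distribution is simply the collection of eigenvalues with equal weight \(1/n\). The sketch below (NumPy, using a Wigner-type matrix as the example) evaluates its cumulative distribution function at a few points.

```python
import numpy as np

def esd_cdf(eigenvalues, t):
    """Fraction of eigenvalues that are <= t: the CDF of the ESD at t."""
    return np.mean(np.asarray(eigenvalues) <= t)

rng = np.random.default_rng(3)
n = 500
X = rng.standard_normal((n, n))
A = (X + X.T) / np.sqrt(2 * n)   # symmetric matrix with variance-1/n entries

eigs = np.linalg.eigvalsh(A)
for t in (-2.0, 0.0, 2.0):
    print(f"ESD((-inf, {t:+.1f}]) = {esd_cdf(eigs, t):.3f}")
```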

## 133.7 Wigner's Semicircle Law

Wigner's semicircle law is one of the central results of random matrix theory.

For a broad class of symmetric random matrices with independent entries, mean zero, and appropriate variance scaling, the empirical spectral distribution converges to the semicircle distribution.

The semicircle density is

$$
\rho(x) =
\frac{1}{2\pi}\sqrt{4-x^2},
\qquad
-2\le x\le2.
$$

Outside this interval,

$$
\rho(x)=0.
$$

Thus, for large \(n\), most eigenvalues lie near the interval

$$
[-2,2].
$$

The word “semicircle” comes from the graph of the density, which is the upper half of a circle after scaling.

This theorem shows that the global eigenvalue distribution can become deterministic even though the entries of the matrix are random. Wigner introduced this type of model in the 1950s while studying spectra of complex physical systems.
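The convergence can be seen numerically. The sketch below (NumPy; the matrix size and bins are arbitrary) compares a binned eigenvalue histogram with the semicircle density \(\rho\).

```python
import numpy as np

def semicircle_density(x):
    """Density of the semicircle law on [-2, 2]."""
    return np.where(np.abs(x) <= 2, np.sqrt(np.maximum(4 - x**2, 0)) / (2 * np.pi), 0.0)

rng = np.random.default_rng(4)
n = 2000

# Wigner-type matrix: symmetric, independent entries, variance 1/n after scaling.
X = rng.standard_normal((n, n))
A = (X + X.T) / np.sqrt(2 * n)
eigs = np.linalg.eigvalsh(A)

# Compare the empirical density in a few bins with the semicircle prediction.
bins = np.linspace(-2, 2, 9)
hist, _ = np.histogram(eigs, bins=bins, density=True)
centers = 0.5 * (bins[:-1] + bins[1:])
for c, h in zip(centers, hist):
    print(f"x ≈ {c:+.2f}   empirical {h:.3f}   semicircle {semicircle_density(c):.3f}")
```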

## 133.8 The Moment Method

One way to prove spectral limit laws is the moment method.

The \(k\)-th moment of the empirical spectral distribution is

$$
\int x^k\,d\mu_{A_n}(x).
$$

Since the eigenvalues of \(A_n\) are \(\lambda_1,\ldots,\lambda_n\), this moment equals

$$
\frac{1}{n}\sum_{i=1}^n \lambda_i^k.
$$

Using the trace,

$$
\frac{1}{n}\sum_{i=1}^n \lambda_i^k =
\frac{1}{n}\operatorname{tr}(A_n^k).
$$

Thus spectral moments can be computed from traces.
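The identity is easy to verify numerically. A short NumPy check (the matrix size is arbitrary) compares moments computed from eigenvalues with moments computed from traces; for large \(n\), the even moments also approach the Catalan numbers \(1, 2, 5, \ldots\), which are the even moments of the semicircle distribution.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 300
X = rng.standard_normal((n, n))
A = (X + X.T) / np.sqrt(2 * n)   # Wigner-type normalization

eigs = np.linalg.eigvalsh(A)
for k in (1, 2, 3, 4):
    moment_from_eigs = np.mean(eigs ** k)                           # (1/n) sum lambda_i^k
    moment_from_trace = np.trace(np.linalg.matrix_power(A, k)) / n  # (1/n) tr(A^k)
    print(f"k={k}:  from eigenvalues {moment_from_eigs:+.4f}   from trace {moment_from_trace:+.4f}")
```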

The trace expands into sums over products of entries:

$$
\operatorname{tr}(A_n^k) =
\sum_{i_1,\ldots,i_k}
A_{i_1i_2}A_{i_2i_3}\cdots A_{i_ki_1}.
$$

The expectation of this expression can be analyzed by counting which index patterns contribute.

This method connects random matrix theory with combinatorics.

## 133.9 Sample Covariance Matrices

Suppose

$$
X_1,\ldots,X_N\in\mathbb{R}^p
$$

are data vectors with mean zero.

Arrange them as columns of a matrix

$$
X =
\begin{bmatrix}
| & | & & |\\
X_1 & X_2 & \cdots & X_N\\
| & | & & |
\end{bmatrix}.
$$

The sample covariance matrix is

$$
S =
\frac{1}{N}XX^T.
$$

This matrix is symmetric and positive semidefinite.

Its eigenvalues describe the variance of the data in different directions.

When both \(p\) and \(N\) are large, the eigenvalues of \(S\) are random and strongly affected by dimension. Random matrix theory gives baseline predictions for what covariance spectra look like under pure noise.
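A small NumPy sketch of this construction (with independent standard normal data, so the population covariance is the identity): even in that pure-noise case, the sample eigenvalues spread out around 1 when \(p\) is comparable to \(N\).

```python
import numpy as np

rng = np.random.default_rng(6)
p, N = 50, 200                      # dimension p, sample size N

X = rng.standard_normal((p, N))     # columns are N mean-zero data vectors in R^p
S = X @ X.T / N                     # sample covariance matrix, p x p

eigs = np.linalg.eigvalsh(S)
print("symmetric:", np.allclose(S, S.T))
print("smallest eigenvalue (>= 0 up to rounding):", eigs.min())
# The population covariance is I, yet the sample eigenvalues are spread out.
print("eigenvalue range:", eigs.min(), "to", eigs.max())
```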

## 133.10 Marchenko-Pastur Law

The Marchenko-Pastur law describes the limiting eigenvalue distribution of large sample covariance matrices.

Let \(X\) be a \(p\times N\) random matrix with independent entries of mean zero and suitable variance. Consider

$$
S=\frac{1}{N}XX^T.
$$

If

$$
p,N\to\infty
$$

with

$$
\frac{p}{N}\to\gamma,
$$

then the eigenvalues of \(S\) converge to a deterministic distribution whose support is

$$
\left[(1-\sqrt{\gamma})^2,\ (1+\sqrt{\gamma})^2\right]
$$

in the standard variance-one scaling. (When \(\gamma>1\), so that \(p>N\), the matrix \(S\) is rank deficient and a fixed fraction of eigenvalues is exactly zero in addition.)

This distribution is the Marchenko-Pastur distribution. It describes the asymptotic behavior of eigenvalues of large sample covariance matrices and, equivalently, singular values of large rectangular random matrices.

The Marchenko-Pastur law is important in statistics because it explains why high-dimensional covariance matrices have spread-out eigenvalues even when the underlying population covariance is the identity.

In high dimensions, noise has structure.
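The edges of the Marchenko-Pastur support are simple to compute and compare with data. The NumPy sketch below (variance-one Gaussian entries and an arbitrary aspect ratio \(\gamma = 1/4\)) checks that essentially all eigenvalues of a pure-noise sample covariance matrix fall inside the predicted interval.

```python
import numpy as np

rng = np.random.default_rng(7)
p, N = 400, 1600
gamma = p / N                                   # aspect ratio, here 0.25

X = rng.standard_normal((p, N))                 # independent variance-one entries
S = X @ X.T / N
eigs = np.linalg.eigvalsh(S)

lower = (1 - np.sqrt(gamma)) ** 2
upper = (1 + np.sqrt(gamma)) ** 2
print(f"Marchenko-Pastur support ≈ [{lower:.3f}, {upper:.3f}]")
print(f"observed eigenvalue range  [{eigs.min():.3f}, {eigs.max():.3f}]")

inside = np.mean((eigs >= lower - 0.05) & (eigs <= upper + 0.05))
print(f"fraction of eigenvalues inside the predicted support: {inside:.3f}")
```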

## 133.11 Singular Values

For a rectangular matrix

$$
A\in\mathbb{R}^{m\times n},
$$

the singular values are the nonnegative square roots of the eigenvalues of

$$
A^TA.
$$

They are written as

$$
\sigma_1(A)\ge\sigma_2(A)\ge\cdots\ge0.
$$

Random matrix theory studies the distribution of these singular values.

The largest singular value controls the operator norm:

$$
\|A\|_2=\sigma_1(A).
$$

The smallest singular value controls invertibility and conditioning when \(A\) is square or nearly square.

A random matrix with independent entries often has singular values concentrated in a predictable interval after proper normalization.
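The relationships above can be checked directly. The sketch below (NumPy, with an arbitrary rectangular shape) confirms that the squared singular values match the leading eigenvalues of \(A^TA\) and that the operator norm equals \(\sigma_1\).

```python
import numpy as np

rng = np.random.default_rng(8)
m, n = 300, 500
A = rng.standard_normal((m, n)) / np.sqrt(n)

svals = np.linalg.svd(A, compute_uv=False)        # singular values, decreasing
eigs_AtA = np.linalg.eigvalsh(A.T @ A)[::-1]      # eigenvalues of A^T A, decreasing

# The nonzero eigenvalues of A^T A are the squares of the singular values.
print("sigma_i^2 matches eig(A^T A):",
      np.allclose(svals**2, eigs_AtA[:len(svals)], atol=1e-8))
# The operator (spectral) norm is the largest singular value.
print("||A||_2 equals sigma_1:", np.isclose(np.linalg.norm(A, 2), svals[0]))
```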

## 133.12 Conditioning

The condition number of an invertible matrix \(A\) is

$$
\kappa(A) =
\frac{\sigma_{\max}(A)}{\sigma_{\min}(A)}.
$$

A large condition number means that solving

$$
Ax=b
$$

is sensitive to perturbations.

For random matrices, the smallest singular value is especially important. If it is close to zero, the matrix is nearly singular.

Random matrix theory gives probabilistic bounds for

$$
\sigma_{\min}(A)
$$

and

$$
\kappa(A).
$$

These bounds explain why random linear systems are often better behaved than worst-case systems.
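A rough experiment along these lines, sketched in NumPy (square Gaussian matrices with unscaled entries, arbitrary sizes): the smallest singular value shrinks roughly like \(1/\sqrt{n}\), so the condition number typically grows only polynomially in \(n\) rather than blowing up.

```python
import numpy as np

rng = np.random.default_rng(9)

for n in (50, 200, 800):
    A = rng.standard_normal((n, n))
    svals = np.linalg.svd(A, compute_uv=False)
    kappa = svals[0] / svals[-1]          # same value as np.linalg.cond(A)
    print(f"n={n:4d}  sigma_min={svals[-1]:.4f}  condition number ≈ {kappa:.1f}")
```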

## 133.13 Concentration of Measure

Many random matrix results depend on concentration of measure.

A random quantity concentrates when it is very close to its mean or typical value with high probability.

For example, if \(x\in\mathbb{R}^n\) is a random vector with independent standard normal entries, then

$$
\|x\|^2 =
x_1^2+\cdots+x_n^2
$$

is usually close to

$$
n.
$$

Thus

$$
\|x\|
$$

is usually close to

$$
\sqrt{n}.
$$

In high dimensions, randomness can become highly regular.

Random matrices inherit this behavior. Norms, singular values, traces, and quadratic forms often concentrate near deterministic values.
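A quick NumPy illustration of this norm concentration (sample counts and dimensions are arbitrary): as \(n\) grows, the norms of independent Gaussian vectors cluster ever more tightly around \(\sqrt{n}\).

```python
import numpy as np

rng = np.random.default_rng(10)

for n in (10, 1_000, 10_000):
    samples = rng.standard_normal((200, n))     # 200 independent vectors in R^n
    norms = np.linalg.norm(samples, axis=1)
    print(f"n={n:6d}  mean ||x|| ≈ {norms.mean():8.2f}"
          f"   sqrt(n) = {np.sqrt(n):8.2f}"
          f"   relative spread {norms.std() / norms.mean():.4f}")
```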

## 133.14 Subgaussian Random Variables

A subgaussian random variable is a random variable whose tails decay at least as fast as a Gaussian tail, up to constants.

Examples include:

| Distribution | Subgaussian? |
|---|---|
| Standard normal | Yes |
| Bernoulli \(\pm1\) | Yes |
| Uniform on \([-1,1]\) | Yes |
| Cauchy | No |

Subgaussian assumptions are common because they allow strong probability bounds while covering many useful distributions.

Random matrices with independent subgaussian entries often behave similarly to Gaussian random matrices at the level of norm and singular value estimates.

This phenomenon is part of universality.

## 133.15 Universality

Universality means that many large-scale spectral behaviors do not depend strongly on the exact distribution of the entries.

For example, Wigner's semicircle law holds for many entry distributions, not just Gaussian ones. The detailed distribution of each entry becomes less important than broad features such as independence, mean, variance, and tail behavior.

This resembles the central limit theorem: many sums of independent random variables converge to the same Gaussian limit.

In random matrix theory, many eigenvalue distributions and local statistics show similar distribution-independent behavior.

## 133.16 Random Projections

Random matrices are used to project high-dimensional data into lower-dimensional spaces.

Let

$$
A\in\mathbb{R}^{m\times n}
$$

be a random matrix with

$$
m<n.
$$

The map

$$
x\mapsto Ax
$$

compresses vectors from \(\mathbb{R}^n\) into \(\mathbb{R}^m\).

With proper normalization, random projections approximately preserve distances among a finite set of points when \(m\) is large enough.

This principle is used in dimensionality reduction, compressed sensing, nearest-neighbor search, and randomized numerical linear algebra.
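The distance-preservation claim can be tested empirically. The NumPy sketch below (the point count and dimensions are arbitrary choices) projects a handful of high-dimensional points with a scaled Gaussian matrix and reports how far the pairwise distance ratios stray from 1.

```python
import numpy as np

rng = np.random.default_rng(11)
n, m, num_points = 5000, 200, 30

points = rng.standard_normal((num_points, n))     # data in R^n
A = rng.standard_normal((m, n)) / np.sqrt(m)      # random projection with 1/sqrt(m) scaling
projected = points @ A.T                          # images in R^m

# Compare pairwise distances before and after projection.
ratios = []
for i in range(num_points):
    for j in range(i + 1, num_points):
        d_orig = np.linalg.norm(points[i] - points[j])
        d_proj = np.linalg.norm(projected[i] - projected[j])
        ratios.append(d_proj / d_orig)
ratios = np.array(ratios)
print(f"distance ratios: min {ratios.min():.3f}, max {ratios.max():.3f}")
# With the 1/sqrt(m) scaling, the ratios cluster near 1.
```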

## 133.17 Randomized Numerical Linear Algebra

Random matrices can accelerate deterministic linear algebra algorithms.

A large matrix \(A\) may be sketched by multiplying it with a random matrix \(\Omega\):

$$
Y=A\Omega.
$$

The matrix \(Y\) captures important information about the column space of \(A\).

This idea supports randomized algorithms for:

| Task | Randomized method |
|---|---|
| Low-rank approximation | Randomized SVD |
| Least squares | Sketching |
| Matrix multiplication | Random sampling |
| Trace estimation | Random probes |
| Preconditioning | Random embeddings |

The goal is to reduce computational cost while preserving the important linear structure.
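As one concrete instance, the following NumPy sketch implements a basic randomized range finder of the kind used in randomized SVD (the test matrix, target rank, and sketch size are illustrative choices): form the sketch \(Y=A\Omega\), orthonormalize it, and project \(A\) onto the resulting subspace.

```python
import numpy as np

rng = np.random.default_rng(12)

# Build a matrix that is close to rank 10, plus small noise.
m, n, r = 500, 400, 10
A = rng.standard_normal((m, r)) @ rng.standard_normal((r, n)) \
    + 0.01 * rng.standard_normal((m, n))

# Sketch: multiply by a thin random matrix, then orthonormalize the result.
k = 20                                     # sketch size, a bit larger than the target rank
Omega = rng.standard_normal((n, k))
Y = A @ Omega                              # captures the dominant column space of A
Q, _ = np.linalg.qr(Y)                     # orthonormal basis for range(Y)

A_approx = Q @ (Q.T @ A)                   # project A onto that subspace
err = np.linalg.norm(A - A_approx) / np.linalg.norm(A)
print(f"relative approximation error: {err:.4f}")
```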

## 133.18 Spiked Models

A spiked model separates signal from noise.

A typical form is

$$
A = S + N,
$$

where \(S\) is a low-rank signal matrix and \(N\) is random noise.

The question is whether the signal can be detected from the eigenvalues or singular values of \(A\).

In covariance problems, one may study

$$
\Sigma = I + \theta uu^T,
$$

where the identity \(I\) represents isotropic noise and \(\theta uu^T\) is a rank-one signal in the direction \(u\).

Random matrix theory describes when the top eigenvalue separates from the noise bulk and when the corresponding eigenvector aligns with the signal direction.

This is important in principal component analysis.
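A numerical illustration of this separation, sketched with NumPy (the dimensions and spike strength \(\theta\) are arbitrary): draw samples with population covariance \(I+\theta uu^T\), then compare the top sample eigenvalue with the Marchenko-Pastur edge and the top eigenvector with \(u\).

```python
import numpy as np

rng = np.random.default_rng(13)
p, N, theta = 200, 1000, 3.0
gamma = p / N

u = rng.standard_normal(p)
u /= np.linalg.norm(u)                        # unit-norm signal direction

# Shift Gaussian samples along u so the population covariance is I + theta * u u^T.
Z = rng.standard_normal((p, N))
X = Z + (np.sqrt(1 + theta) - 1) * np.outer(u, u @ Z)
S = X @ X.T / N

eigs, vecs = np.linalg.eigh(S)
mp_edge = (1 + np.sqrt(gamma)) ** 2
print(f"top eigenvalue {eigs[-1]:.3f} vs noise edge {mp_edge:.3f}")
print(f"alignment |<u, top eigenvector>| = {abs(u @ vecs[:, -1]):.3f}")
```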

## 133.19 Random Graph Matrices

Random graphs produce random matrices.

For a graph \(G\), the adjacency matrix is

$$
A_{ij} =
\begin{cases}
1, & \text{if vertices } i \text{ and } j \text{ are connected},\\
0, & \text{otherwise}.
\end{cases}
$$

If the graph is random, then the adjacency matrix is random.

Spectral properties of this matrix reveal graph structure.

Examples include:

| Matrix | Encodes |
|---|---|
| Adjacency matrix | Edges |
| Laplacian matrix | Connectivity |
| Normalized Laplacian | Random walks |
| Non-backtracking matrix | Community structure |

Random graph matrices connect random matrix theory with combinatorics and network science.
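A small NumPy sketch of a random adjacency matrix (an Erdős–Rényi graph with arbitrary size and edge probability): the largest eigenvalue tracks the mean degree, while the rest of the spectrum stays on a much smaller scale.

```python
import numpy as np

rng = np.random.default_rng(14)
n, p_edge = 500, 0.05                        # 500 vertices, edge probability 0.05

# Symmetric 0/1 adjacency matrix of an Erdos-Renyi graph, no self-loops.
upper = np.triu(rng.random((n, n)) < p_edge, k=1)
A = (upper | upper.T).astype(float)

eigs = np.linalg.eigvalsh(A)
mean_degree = n * p_edge
bulk_scale = 2 * np.sqrt(n * p_edge * (1 - p_edge))
print(f"largest eigenvalue ≈ {eigs[-1]:.2f} (mean degree ≈ {mean_degree:.1f})")
print(f"second largest    ≈ {eigs[-2]:.2f} (bulk scale ≈ {bulk_scale:.2f})")
```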

## 133.20 Noise, Signal, and Spectrum

One of the practical uses of random matrix theory is distinguishing signal from noise.

In data analysis, an observed matrix often has the form

$$
A = \text{signal} + \text{noise}.
$$

The spectrum of a pure-noise matrix gives a reference model.

Eigenvalues outside the expected noise range may indicate signal.

For sample covariance matrices, the Marchenko-Pastur upper edge

$$
(1+\sqrt{\gamma})^2
$$

is often used as a rough benchmark. Eigenvalues much larger than this edge may reflect structure beyond random noise, depending on the model and assumptions.

This idea is widely used in high-dimensional statistics and principal component analysis.
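A sketch of this benchmarking idea in NumPy (sizes, signal strength, and the small tolerance added to the edge are arbitrary choices): under pure noise essentially no eigenvalue exceeds the Marchenko-Pastur edge, while a sufficiently strong rank-one signal pushes one eigenvalue clearly above it.

```python
import numpy as np

rng = np.random.default_rng(15)
p, N = 300, 900
gamma = p / N
edge = (1 + np.sqrt(gamma)) ** 2

# Pure-noise covariance spectrum: how many eigenvalues exceed the MP edge?
X = rng.standard_normal((p, N))
S = X @ X.T / N
noise_eigs = np.linalg.eigvalsh(S)
print("pure noise, eigenvalues above the edge:", np.sum(noise_eigs > edge + 0.1))

# Add a strong rank-one signal and repeat.
u = rng.standard_normal(p)
u /= np.linalg.norm(u)
X_sig = X + 2.0 * np.outer(u, rng.standard_normal(N))
S_sig = X_sig @ X_sig.T / N
sig_eigs = np.linalg.eigvalsh(S_sig)
print("with a rank-one signal, eigenvalues above the edge:", np.sum(sig_eigs > edge + 0.1))
```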

## 133.21 Summary

Random matrices combine linear algebra with probability.

The central objects are:

| Concept | Meaning |
|---|---|
| Random matrix | Matrix whose entries are random variables |
| Ensemble | Probability model for matrices |
| Empirical spectral distribution | Distribution of eigenvalues of one matrix |
| Wigner matrix | Symmetric matrix with independent random entries |
| Semicircle law | Limiting eigenvalue distribution for Wigner matrices |
| Sample covariance matrix | Matrix of the form \(XX^T/N\) |
| Marchenko-Pastur law | Limiting spectrum of sample covariance matrices |
| Singular values | Square roots of eigenvalues of \(A^TA\) |
| Concentration | High-dimensional random regularity |
| Universality | Limit behavior independent of many distribution details |
| Sketching | Randomized dimensional reduction |
| Spiked model | Low-rank signal plus random noise |

Random matrix theory explains why large random linear systems often show deterministic structure. It gives tools for understanding spectra, noise, conditioning, high-dimensional data, and randomized algorithms.
