# Chapter 137. Compressed Sensing

## 137.1 Introduction

Compressed sensing is a method for recovering sparse or compressible signals from fewer linear measurements than ordinary sampling theory might suggest.

The central idea is simple. If a signal has only a small amount of essential information, then it may be possible to recover it from a small number of carefully chosen measurements. The recovery problem is underdetermined, but sparsity makes the problem solvable.

Compressed sensing studies systems of the form

$$
y = Ax,
$$

where:

| Symbol | Meaning |
|---|---|
| \(x\) | Unknown signal |
| \(A\) | Measurement or sensing matrix |
| \(y\) | Observed measurements |

Usually,

$$
A\in\mathbb{R}^{m\times n}
$$

with

$$
m<n.
$$

Thus there are fewer equations than unknowns. Without additional structure, the system has many solutions. Compressed sensing uses sparsity to select the correct one. This is why the subject belongs naturally to linear algebra, optimization, probability, and signal processing. Compressed sensing is commonly described as recovery from underdetermined linear systems by exploiting sparsity and incoherence.

## 137.2 Sparse Signals

A vector

$$
x\in\mathbb{R}^n
$$

is called \(k\)-sparse if it has at most \(k\) nonzero entries.

The number of nonzero entries is written

$$
\|x\|_0.
$$

Thus \(x\) is \(k\)-sparse if

$$
\|x\|_0\le k.
$$

The notation \(\|\cdot\|_0\) is common but slightly misleading. It is not a norm, because it is not homogeneous: scaling a nonzero vector does not change the count. It simply counts nonzero coordinates.

For example,

$$
x =
\begin{bmatrix}
0\\
3\\
0\\
0\\
-2\\
0
\end{bmatrix}
$$

is \(2\)-sparse.
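
As a quick illustration, the sparsity level can be counted directly; the following minimal numpy sketch checks the example above.

```python
import numpy as np

x = np.array([0.0, 3.0, 0.0, 0.0, -2.0, 0.0])

# "l0 norm": the number of nonzero entries (not a true norm).
l0 = np.count_nonzero(x)
print(l0)        # 2, so x is 2-sparse
print(l0 <= 2)   # True
```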

Sparse vectors appear when most coordinates carry no information. In applications, many signals are not sparse in their original coordinates but become sparse after a change of basis.

## 137.3 Compressible Signals

A signal is compressible if it is well approximated by a sparse signal.

For example, a vector may have many nonzero entries, but most of them may be small. Keeping only the largest \(k\) entries gives a sparse approximation.

Let \(x_k\) denote the best \(k\)-sparse approximation to \(x\). Then the approximation error is

$$
\|x-x_k\|.
$$

Compressed sensing is useful when this error is small.

Compressibility is important because real data is rarely exactly sparse. Images, audio, physical signals, and scientific data usually have many small coefficients rather than exactly zero coefficients.
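
A minimal sketch of this idea: keep the \(k\) largest-magnitude entries and measure the error. The vector and the choice \(k=3\) below are illustrative assumptions.

```python
import numpy as np

def best_k_sparse(x, k):
    """Keep the k largest-magnitude entries of x, zero out the rest."""
    xk = np.zeros_like(x)
    idx = np.argsort(np.abs(x))[-k:]   # indices of the k largest entries
    xk[idx] = x[idx]
    return xk

# A compressible vector: a few large entries, many small ones.
x = np.array([5.0, -4.0, 3.0, 0.2, -0.1, 0.05, 0.01, -0.02])
xk = best_k_sparse(x, k=3)
print(np.linalg.norm(x - xk))   # small approximation error
```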

## 137.4 Sparse Bases

A signal may be sparse in one basis and dense in another.

Let

$$
\Psi
$$

be a basis matrix. A signal \(x\) can be written as

$$
x=\Psi s,
$$

where \(s\) is the coefficient vector.

If \(s\) is sparse, then \(x\) is sparse in the basis \(\Psi\).

Common sparsifying systems include:

| System | Typical signal type |
|---|---|
| Fourier basis | Periodic or oscillatory signals |
| Wavelet basis | Images and piecewise smooth signals |
| Discrete cosine basis | Image compression |
| Finite differences (total variation) | Piecewise constant signals |
| Learned dictionaries | Data-adaptive sparse models |

Thus compressed sensing often recovers sparse coefficients rather than raw signal coordinates.
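
The sketch below builds an arbitrary orthonormal basis \(\Psi\) (here from a QR factorization, purely for illustration, standing in for a wavelet or DCT basis), a sparse coefficient vector \(s\), and the signal \(x = \Psi s\), which is generally dense in its original coordinates.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 64, 3

# An orthonormal basis Psi (illustrative stand-in for a wavelet/DCT basis).
Psi, _ = np.linalg.qr(rng.standard_normal((n, n)))

# A k-sparse coefficient vector s.
s = np.zeros(n)
s[rng.choice(n, size=k, replace=False)] = rng.standard_normal(k)

x = Psi @ s                      # the signal itself
print(np.count_nonzero(s))       # 3: sparse in the basis Psi
print(np.count_nonzero(x))       # typically 64: dense in raw coordinates
```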

## 137.5 Linear Measurements

Compressed sensing takes measurements of the form

$$
y_i = \langle a_i,x\rangle,
$$

where \(a_i\) is the \(i\)-th sensing vector.

Collecting all measurements gives

$$
y=Ax.
$$

Each row of \(A\) defines one linear measurement.

If \(m<n\), the system is underdetermined. The null space of \(A\) has positive dimension, so many vectors satisfy the same measurement equation.

The goal is not to recover every vector. The goal is to recover sparse vectors.
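
A small numpy sketch of the measurement model: each entry of \(y\) is an inner product with one row of \(A\), and with \(m < n\) the null space of \(A\) has positive dimension. The sizes and the sparse signal are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 20, 100                       # fewer measurements than unknowns

A = rng.standard_normal((m, n))      # sensing matrix (rows = sensing vectors)
x = np.zeros(n)
x[[3, 40, 77]] = [1.0, -2.0, 0.5]    # a 3-sparse signal (illustrative)

y = A @ x                            # stacked measurements y_i = <a_i, x>
print(y.shape)                              # (20,)
print(n - np.linalg.matrix_rank(A))         # dimension of the null space: 80
```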

## 137.6 The Recovery Problem

The ideal sparse recovery problem is:

$$
\min_z \|z\|_0
$$

subject to

$$
Az=y.
$$

This asks for the sparsest vector consistent with the measurements.

However, \(\ell_0\)-minimization is NP-hard in general. The problem is combinatorial because it searches over all possible support sets.

A common relaxation replaces \(\|z\|_0\) with the \(\ell^1\)-norm:

$$
\min_z \|z\|_1
$$

subject to

$$
Az=y.
$$

Here

$$
\|z\|_1 = |z_1|+\cdots+|z_n|.
$$

The \(\ell^1\)-problem is convex and can be solved by standard optimization methods. The use of \(\ell^1\)-relaxation is a central technique in compressed sensing.

## 137.7 Basis Pursuit

The problem

$$
\min_z \|z\|_1
\quad
\text{subject to}
\quad
Az=y
$$

is called basis pursuit.

It is the standard noiseless recovery method.
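
Basis pursuit can be rewritten as a linear program by introducing variables \(t\) with \(-t \le z \le t\) and minimizing \(\sum_i t_i\). The sketch below solves this reformulation with scipy.optimize.linprog; the problem sizes and the random instance are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import linprog

def basis_pursuit(A, y):
    """Solve min ||z||_1 subject to Az = y via a linear program in [z; t]."""
    m, n = A.shape
    c = np.concatenate([np.zeros(n), np.ones(n)])     # minimize sum of t
    I = np.eye(n)
    A_ub = np.block([[I, -I], [-I, -I]])              # z - t <= 0, -z - t <= 0
    b_ub = np.zeros(2 * n)
    A_eq = np.hstack([A, np.zeros((m, n))])           # A z = y
    bounds = [(None, None)] * n + [(0, None)] * n     # z free, t >= 0
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=y, bounds=bounds)
    return res.x[:n]

rng = np.random.default_rng(2)
m, n, k = 30, 100, 4
A = rng.standard_normal((m, n)) / np.sqrt(m)
x = np.zeros(n); x[rng.choice(n, k, replace=False)] = rng.standard_normal(k)
z = basis_pursuit(A, A @ x)
print(np.max(np.abs(z - x)))   # typically close to zero
```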

The geometric reason basis pursuit works is that the \(\ell^1\)-ball has corners aligned with the coordinate axes, and sparse vectors lie at these corners. Imagine inflating the \(\ell^1\)-ball until it first touches the affine solution set

$$
\{z:Az=y\}.
$$

The point of first contact typically lies at a corner or low-dimensional face of the ball, and such points are sparse.

This is a linear-algebraic and convex-geometric phenomenon.

## 137.8 Noisy Measurements

In practice, measurements contain noise:

$$
y = Ax + e,
$$

where \(e\) is an error vector.

The recovery problem becomes:

$$
\min_z \|z\|_1
$$

subject to

$$
\|Az-y\|_2\le \varepsilon.
$$

This is called basis pursuit denoising.

The constraint allows the reconstructed signal to fit the measurements up to the expected noise level.

Compressed sensing theory distinguishes two properties:

| Property | Meaning |
|---|---|
| Exact recovery | Perfect recovery in the noiseless sparse case |
| Stable recovery | Small noise gives small reconstruction error |

Stable recovery is essential for real applications. Standard compressed sensing theory treats both noisy measurements and signals that are only approximately sparse.

## 137.9 The Null Space View

The null space of \(A\) determines whether sparse recovery is possible.

Suppose

$$
Ax=y.
$$

If \(h\in\ker A\), then

$$
A(x+h)=Ax.
$$

Thus \(x\) and \(x+h\) produce the same measurements.

For recovery to succeed, no nonzero vector in the null space should allow one sparse vector to be confused with another sparse vector.

The null space property gives a necessary and sufficient condition for exact sparse recovery by \(\ell^1\)-minimization. It is important theoretically, although it may be difficult to verify directly.
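
A short sketch of the underlying ambiguity: any vector from the null space of \(A\) can be added to \(x\) without changing the measurements. Here scipy.linalg.null_space returns an orthonormal basis for \(\ker A\); the sizes are illustrative.

```python
import numpy as np
from scipy.linalg import null_space

rng = np.random.default_rng(3)
m, n = 10, 25
A = rng.standard_normal((m, n))
x = rng.standard_normal(n)

N = null_space(A)            # columns span ker A (here dimension n - m = 15)
h = N[:, 0]                  # one null-space direction

# x and x + h are different vectors with identical measurements.
print(np.linalg.norm(A @ (x + h) - A @ x))   # ~0 up to rounding
```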

## 137.10 Restricted Isometry Property

The restricted isometry property, or RIP, is one of the main sufficient conditions for compressed sensing.

A matrix \(A\) satisfies the restricted isometry property of order \(k\) if there exists a number \(\delta_k\in(0,1)\) such that

$$
(1-\delta_k)\|x\|_2^2
\le
\|Ax\|_2^2
\le
(1+\delta_k)\|x\|_2^2
$$

for every \(k\)-sparse vector \(x\).

This says that \(A\) approximately preserves the lengths of all \(k\)-sparse vectors.

If \(A\) preserves distances between sparse vectors, then sparse vectors remain distinguishable after measurement.

RIP connects compressed sensing with geometry. The sensing matrix acts almost like an isometry on sparse subspaces.
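
Verifying RIP exactly is intractable, but a rough Monte Carlo sketch can probe how well a random matrix preserves the norms of random \(k\)-sparse vectors. The sampling sizes below are illustrative, and the result is only an optimistic proxy for \(\delta_k\), since the true constant is a supremum over all \(k\)-sparse vectors.

```python
import numpy as np

rng = np.random.default_rng(4)
m, n, k, trials = 60, 200, 5, 2000

# Entries scaled so that E||Ax||^2 = ||x||^2.
A = rng.standard_normal((m, n)) / np.sqrt(m)

ratios = []
for _ in range(trials):
    x = np.zeros(n)
    x[rng.choice(n, k, replace=False)] = rng.standard_normal(k)
    ratios.append(np.linalg.norm(A @ x) ** 2 / np.linalg.norm(x) ** 2)

# Empirical range of ||Ax||^2 / ||x||^2 over the sampled sparse vectors.
print(min(ratios), max(ratios))   # typically close to 1 on both sides
```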

## 137.11 Incoherence

Incoherence measures how poorly aligned the measurement system is with the sparsifying system.

If the signal is sparse in basis \(\Psi\), and measurements are taken in another system \(\Phi\), then compressed sensing works best when \(\Phi\) and \(\Psi\) are incoherent.

Informally, each measurement should see many coordinates of the sparse representation rather than only one.

Random measurements are often incoherent with fixed sparse bases.

This explains why random sensing matrices appear frequently in compressed sensing theory.
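
One standard quantitative measure is the mutual coherence between two orthonormal bases, \(\mu(\Phi,\Psi)=\sqrt{n}\,\max_{i,j}|\langle\varphi_i,\psi_j\rangle|\), which ranges from \(1\) (maximally incoherent) to \(\sqrt{n}\). The sketch below computes it for the canonical (spike) basis and the discrete Fourier basis, a classically incoherent pair.

```python
import numpy as np

n = 64
Phi = np.eye(n)                                # spike (identity) basis
Psi = np.fft.fft(np.eye(n)) / np.sqrt(n)       # orthonormal DFT basis

mu = np.sqrt(n) * np.max(np.abs(Phi.conj().T @ Psi))
print(mu)   # ~1.0: the spike and Fourier bases are maximally incoherent
```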

## 137.12 Random Sensing Matrices

Random matrices are common sensing matrices.

Examples include:

| Matrix type | Entries |
|---|---|
| Gaussian | Independent normal entries |
| Bernoulli | Independent \(\pm1\) entries |
| Partial Fourier | Randomly selected Fourier rows |
| Subsampled orthogonal | Random rows of orthogonal transforms |

Random matrices often satisfy RIP with high probability when the number of measurements satisfies

$$
m \gtrsim k\log(n/k).
$$

This means that the number of measurements depends mainly on sparsity \(k\), not ambient dimension \(n\).

Many random designs give near-optimal sample complexity for sparse recovery.
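
Minimal sketches of three common constructions follow, with a standard normalization so that measurements preserve energy on average; the sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(5)
m, n = 40, 128

# Gaussian: independent entries with variance 1/m.
A_gauss = rng.standard_normal((m, n)) / np.sqrt(m)

# Bernoulli: independent +-1/sqrt(m) entries.
A_bern = rng.choice([-1.0, 1.0], size=(m, n)) / np.sqrt(m)

# Partial Fourier: m randomly selected rows of the orthonormal DFT matrix,
# rescaled by sqrt(n/m) (a common normalization).
F = np.fft.fft(np.eye(n)) / np.sqrt(n)
rows = rng.choice(n, size=m, replace=False)
A_fourier = F[rows, :] * np.sqrt(n / m)
```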

## 137.13 Measurement Complexity

A guiding estimate in compressed sensing is:

$$
m \gtrsim k\log(n/k).
$$

Here:

| Symbol | Meaning |
|---|---|
| \(m\) | Number of measurements |
| \(k\) | Sparsity level |
| \(n\) | Ambient dimension |

The logarithmic factor appears because the support of a sparse vector is unknown.

There are

$$
\binom{n}{k}
$$

possible support sets.

The measurements must identify both the values on the support and the support itself; specifying the support alone requires about \(\log_2\binom{n}{k}\) bits, which is on the order of \(k\log(n/k)\).

This estimate explains the power of compressed sensing. When \(k\ll n\), far fewer than \(n\) measurements may be enough.
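
A rough numeric illustration (constants omitted): for a signal of length \(n = 10^6\) with \(k = 1000\) nonzeros, the guiding estimate suggests on the order of a few thousand measurements rather than a million.

```python
import numpy as np

n, k = 10**6, 10**3
print(k * np.log(n / k))   # roughly 6.9e3, compared with n = 1e6 direct samples
```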

## 137.14 Support Recovery

The support of a vector is the set of indices where it is nonzero:

$$
\operatorname{supp}(x)=\{i:x_i\neq0\}.
$$

Recovering a sparse vector involves two tasks:

1. Identify the support,
2. Estimate the nonzero values.

If the support were known, recovery would reduce to an ordinary least squares problem using the corresponding columns of \(A\).

The difficulty is that the support is unknown.

Many algorithms can be understood as different strategies for finding or approximating the support.
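
A sketch of the "support known" case: restrict \(A\) to the corresponding columns and solve ordinary least squares. The instance below is illustrative.

```python
import numpy as np

rng = np.random.default_rng(6)
m, n = 30, 100
A = rng.standard_normal((m, n)) / np.sqrt(m)

support = np.array([5, 42, 63])            # suppose the support were known
x = np.zeros(n); x[support] = [2.0, -1.0, 0.5]
y = A @ x

coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
x_hat = np.zeros(n); x_hat[support] = coef
print(np.linalg.norm(x_hat - x))           # ~0: exact in the noiseless case
```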

## 137.15 Greedy Algorithms

Greedy algorithms recover sparse signals by selecting columns of \(A\) step by step.

Important examples include:

| Algorithm | Main idea |
|---|---|
| Matching Pursuit | Select one correlated atom at a time |
| Orthogonal Matching Pursuit | Refit after each selected atom |
| CoSaMP | Select multiple candidates and prune |
| Iterative Hard Thresholding | Gradient step followed by sparsification |

Greedy methods are often faster than convex optimization methods.

They are useful when large-scale computation is more important than solving a convex program exactly.
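
As one concrete example, iterative hard thresholding alternates a gradient step on \(\|Az-y\|_2^2\) with projection onto \(k\)-sparse vectors. A minimal sketch follows; the step size and fixed iteration count are simple illustrative choices.

```python
import numpy as np

def iht(A, y, k, n_iter=200):
    """Iterative hard thresholding: gradient step, then keep k largest entries."""
    n = A.shape[1]
    x = np.zeros(n)
    step = 1.0 / np.linalg.norm(A, 2) ** 2     # conservative step size
    for _ in range(n_iter):
        x = x + step * A.T @ (y - A @ x)       # gradient step on 0.5*||Ax - y||^2
        keep = np.argsort(np.abs(x))[-k:]      # indices of the k largest entries
        mask = np.zeros(n, dtype=bool); mask[keep] = True
        x[~mask] = 0.0                         # hard thresholding to k-sparse
    return x
```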

## 137.16 Orthogonal Matching Pursuit

Orthogonal Matching Pursuit, or OMP, builds a support set iteratively.

At each step:

1. Compute the residual

$$
r=y-A\hat{x},
$$

where \(\hat{x}\) is the current estimate.

2. Find the column of \(A\) most correlated with \(r\).
3. Add that column to the active set.
4. Solve a least squares problem on the active set.
5. Update the residual.

The algorithm stops when the residual is small or when a desired sparsity level is reached.

OMP is simple, interpretable, and often effective.
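
A minimal numpy sketch of OMP following the steps above; the stopping rule (a fixed target sparsity \(k\)) and the random test instance are illustrative choices.

```python
import numpy as np

def omp(A, y, k):
    """Orthogonal Matching Pursuit: greedily build a support of size k."""
    n = A.shape[1]
    support, residual = [], y.copy()
    for _ in range(k):
        # Pick the column most correlated with the current residual.
        j = int(np.argmax(np.abs(A.T @ residual)))
        if j not in support:
            support.append(j)
        # Least squares on the active columns, then update the residual.
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    x_hat = np.zeros(n)
    x_hat[support] = coef
    return x_hat

rng = np.random.default_rng(7)
m, n, k = 40, 120, 5
A = rng.standard_normal((m, n)) / np.sqrt(m)
x = np.zeros(n); x[rng.choice(n, k, replace=False)] = rng.standard_normal(k)
print(np.linalg.norm(omp(A, A @ x, k) - x))   # typically ~0
```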

## 137.17 Lasso

The Lasso solves the penalized problem

$$
\min_z
\frac12\|Az-y\|_2^2+\lambda\|z\|_1.
$$

The first term measures data fit. The second term promotes sparsity.

The parameter

$$
\lambda>0
$$

controls the tradeoff.

Large \(\lambda\) produces sparser solutions. Small \(\lambda\) fits the data more closely.

Lasso is widely used in statistics and machine learning.

It is closely related to basis pursuit denoising.
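
A minimal sketch of one standard solver, the iterative soft-thresholding algorithm (ISTA), which alternates a gradient step on the quadratic term with the soft-thresholding proximal map of \(\lambda\|z\|_1\); the step size and iteration count are illustrative choices.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal map of t*||.||_1: shrink each entry toward zero by t."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def lasso_ista(A, y, lam, n_iter=500):
    """Minimize 0.5*||Az - y||_2^2 + lam*||z||_1 by proximal gradient (ISTA)."""
    z = np.zeros(A.shape[1])
    step = 1.0 / np.linalg.norm(A, 2) ** 2    # 1/L for the quadratic term
    for _ in range(n_iter):
        grad = A.T @ (A @ z - y)
        z = soft_threshold(z - step * grad, step * lam)
    return z
```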

## 137.18 Geometry of \(\ell^1\)-Minimization

The \(\ell^1\)-norm promotes sparsity because its unit ball has sharp corners.

In two dimensions, the \(\ell^1\)-unit ball is a diamond:

$$
|x_1|+|x_2|\le1.
$$

In higher dimensions, the corners correspond to sparse coordinate patterns.

When minimizing \(\ell^1\) over an affine space, the solution often occurs at a low-dimensional face of the \(\ell^1\)-ball. Such faces correspond to sparse vectors.

This geometric fact explains why a convex norm can recover sparse solutions.

## 137.19 Dictionaries

A dictionary is a collection of atoms used to represent signals.

Let

$$
D\in\mathbb{R}^{n\times p}
$$

be a dictionary matrix. A signal is represented as

$$
x=D\alpha.
$$

The coefficient vector \(\alpha\) may be sparse even when \(D\) has more columns than rows.

If

$$
p>n,
$$

the dictionary is overcomplete.

Compressed sensing with dictionaries solves

$$
y=AD\alpha
$$

and attempts to recover sparse \(\alpha\).

Dictionaries may be fixed, such as wavelets, or learned from data.
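
In code, recovery with a dictionary reduces to an ordinary sparse recovery problem for the combined matrix \(AD\). The sketch below only forms that matrix, using an overcomplete random \(D\) purely as a placeholder; any of the earlier solvers can then be applied to recover \(\alpha\).

```python
import numpy as np

rng = np.random.default_rng(8)
m, n, p = 30, 80, 160                           # p > n: overcomplete dictionary

A = rng.standard_normal((m, n)) / np.sqrt(m)    # sensing matrix
D = rng.standard_normal((n, p))                 # placeholder dictionary
D /= np.linalg.norm(D, axis=0)                  # normalize atoms (columns)

B = A @ D                     # effective sensing matrix for alpha in y = B @ alpha
print(B.shape)                # (30, 160)
```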

## 137.20 Applications

Compressed sensing appears in many areas.

| Application | Role of compressed sensing |
|---|---|
| MRI | Fewer measurements during acquisition |
| Imaging | Reconstruction from incomplete samples |
| Radar | Sparse target recovery |
| Seismology | Sparse reflection recovery |
| Communications | Channel estimation |
| Astronomy | Image reconstruction |
| Machine learning | Sparse feature selection |
| Sensor networks | Reduced measurement and transmission |

MRI is one of the standard application areas: many medical images are sparse or compressible in transform domains, and the raw measurements are samples of the image's Fourier transform (k-space).

## 137.21 Summary

Compressed sensing studies recovery of sparse signals from undersampled linear measurements.

The main concepts are:

| Concept | Meaning |
|---|---|
| Sparse signal | Vector with few nonzero entries |
| Compressible signal | Well approximated by a sparse vector |
| Sensing matrix | Matrix defining linear measurements |
| Basis pursuit | \(\ell^1\)-minimization for noiseless recovery |
| Basis pursuit denoising | Recovery with measurement noise |
| Null space property | Exact recovery condition |
| Restricted isometry property | Approximate isometry on sparse vectors |
| Incoherence | Low alignment between sensing and sparsifying systems |
| Random sensing | Random measurements for uniform recovery |
| Greedy recovery | Iterative support selection |
| Lasso | Penalized sparse recovery |
| Dictionary | Sparse representation system |

Compressed sensing shows that linear systems with fewer equations than unknowns can still be solvable when the unknown vector has sparse structure. It combines linear algebra, convex geometry, probability, and optimization into a theory of efficient acquisition and reconstruction.
