# Signal Processing

Signal processing studies how information is represented, transformed, filtered, compressed, reconstructed, and estimated from signals. A signal may be a time series, an image, an audio waveform, a radar return, a sensor stream, or a multidimensional field.

Automatic differentiation is useful because many signal processing systems can be written as differentiable programs:

$$
x \to T_\theta(x) \to y \to L(y)
$$

where $x$ is an input signal, $T_\theta$ is a transform or filter with parameters $\theta$, and $L$ is an objective. AD gives derivatives with respect to filter coefficients, model parameters, input samples, reconstruction variables, or calibration constants.
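
For concreteness, here is a minimal sketch of this pattern in JAX, with a three-tap FIR filter standing in for $T_\theta$ and a squared-error objective standing in for $L$ (both are illustrative choices, as are the signal and target):

```python
import jax
import jax.numpy as jnp

def transform(theta, x):
    # T_theta: a three-tap FIR filter applied to the signal (illustrative choice).
    return jnp.convolve(x, theta, mode="same")

def loss(theta, x, target):
    # L: squared error against a target signal (illustrative choice).
    y = transform(theta, x)
    return jnp.sum((y - target) ** 2)

x = jnp.sin(jnp.linspace(0.0, 10.0, 128))
target = jnp.cos(jnp.linspace(0.0, 10.0, 128))
theta = jnp.array([0.25, 0.5, 0.25])

# Derivatives with respect to the filter parameters and the input samples.
dL_dtheta = jax.grad(loss, argnums=0)(theta, x, target)
dL_dx = jax.grad(loss, argnums=1)(theta, x, target)
```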

### Linear Filtering

A discrete linear time-invariant filter has the form

$$
y[n] =
\sum_{k} h[k]x[n-k],
$$

where $h$ is the impulse response.

This is convolution:

$$
y = h * x.
$$

If a loss depends on the filtered output,

$$
L = \ell(y),
$$

then AD computes derivatives with respect to both the signal and the filter:

$$
\frac{\partial L}{\partial x},
\qquad
\frac{\partial L}{\partial h}.
$$

The reverse pass through convolution is another convolution-like operation: the cotangent is cross-correlated with the filter. This is one reason convolutional neural networks fit naturally into AD systems: backpropagation through convolution is itself a standard signal processing operation.
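
This can be checked directly. The sketch below (JAX, with arbitrary illustrative values for $h$ and $x$) extracts the vector-Jacobian product of a convolution and compares it against an explicit cross-correlation:

```python
import jax
import jax.numpy as jnp

h = jnp.array([1.0, -2.0, 0.5])
x = jnp.arange(8.0)

def filt(x):
    return jnp.convolve(x, h, mode="full")   # y = h * x

y, vjp = jax.vjp(filt, x)
g = jnp.ones_like(y)                         # cotangent dL/dy

# The reverse pass through convolution is cross-correlation with h.
(dx,) = vjp(g)
manual = jnp.correlate(g, h, mode="valid")
print(jnp.allclose(dx, manual))              # True
```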

### Frequency-Domain Methods

Many signal processing algorithms use the Fourier transform.

$$
X[k] =
\sum_{n=0}^{N-1}
x[n]e^{-2\pi i kn/N}.
$$

The inverse transform reconstructs the signal:

$$
x[n] =
\frac{1}{N}
\sum_{k=0}^{N-1}
X[k]e^{2\pi i kn/N}.
$$

Because the discrete Fourier transform is linear, its Jacobian is the transform matrix itself. Reverse-mode differentiation through an FFT therefore applies the adjoint (conjugate-transpose) transform, which for the convention above equals $N$ times the inverse DFT.

This makes frequency-domain objectives easy to differentiate, for example:

$$
L(x) =
\|\, |\operatorname{FFT}(x)| - a \,\|^2.
$$

Such objectives, in which $a$ is a target magnitude spectrum, appear in phase retrieval, spectral matching, audio synthesis, diffraction imaging, and inverse scattering.
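
As a sketch in JAX, a spectral-magnitude loss of this form can be differentiated directly with respect to the time-domain samples; the small `eps` inside the magnitude is an assumption added here to keep gradients finite at bins that are exactly zero:

```python
import jax
import jax.numpy as jnp

def spectral_loss(x, a, eps=1e-12):
    X = jnp.fft.fft(x)
    mag = jnp.sqrt(jnp.abs(X) ** 2 + eps)   # smoothed magnitude: exact zeros stay differentiable
    return jnp.sum((mag - a) ** 2)

x = jnp.cos(2 * jnp.pi * 3 * jnp.arange(64) / 64)
a = jnp.ones(64)                            # illustrative target magnitude spectrum

g = jax.grad(spectral_loss)(x, a)           # gradient w.r.t. the time-domain samples
```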

### Complex-Valued Differentiation

Signal processing often uses complex numbers. AD systems must define how derivatives behave for complex-valued programs.

A complex signal can be treated as a pair of real signals:

$$
z = x + iy.
$$

For real-valued losses,

$$
L : \mathbb{C}^n \to \mathbb{R},
$$

gradients are usually interpreted with respect to the real and imaginary parts. This avoids assuming that every operation is holomorphic. Many common operations, such as magnitude,

$$
|z| = \sqrt{x^2+y^2},
$$

are not complex analytic, yet they are differentiable as real functions away from isolated singular points (for the magnitude, the origin).

Practical AD systems for signal processing should document whether gradients are real-imaginary derivatives or Wirtinger derivatives, since the two conventions differ by conjugation and constant factors.
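
The pair-of-reals view translates directly into code. A minimal JAX sketch, with $\sum |z|^2$ as an arbitrary real-valued loss:

```python
import jax
import jax.numpy as jnp

def loss_parts(x, y):
    # Real-valued loss on z = x + i*y, written over the real and imaginary parts.
    z = x + 1j * y
    return jnp.sum(jnp.abs(z) ** 2)         # |z|^2 is smooth as a function of (x, y)

x = jnp.array([1.0, -2.0, 0.5])
y = jnp.array([0.0, 1.0, 3.0])

# Real-imaginary gradients: dx = 2x and dy = 2y for this loss.
dx, dy = jax.grad(loss_parts, argnums=(0, 1))(x, y)
```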

### Adaptive Filters

An adaptive filter updates its coefficients from data. A simple finite impulse response filter computes

$$
\hat y[n] =
\sum_{k=0}^{K-1} w_k x[n-k].
$$

Given target signal $d[n]$, the error is

$$
e[n]=d[n]-\hat y[n].
$$

The least-squares objective is

$$
L(w) =
\frac{1}{2}
\sum_n e[n]^2.
$$

AD gives

$$
\nabla_w L,
$$

which can be used in gradient descent, stochastic gradient descent, or more specialized adaptive algorithms.

Traditional algorithms such as LMS can be viewed as hand-derived gradient methods. AD generalizes the same idea to more elaborate filter structures, where deriving the gradients by hand becomes tedious.
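
For the FIR case above, the hand-derived gradient and the AD gradient coincide. A minimal JAX sketch (the filter length, step size, and synthetic target are illustrative assumptions):

```python
import jax
import jax.numpy as jnp

def predict(w, x):
    # FIR prediction: hat_y[n] = sum_k w[k] x[n-k].
    return jnp.convolve(x, w, mode="full")[: x.shape[0]]

def loss(w, x, d):
    e = d - predict(w, x)
    return 0.5 * jnp.sum(e ** 2)

key = jax.random.PRNGKey(0)
x = jax.random.normal(key, (256,))
true_w = jnp.array([0.8, -0.3, 0.1])
d = predict(true_w, x)                      # synthetic target from a known filter

w = jnp.zeros(3)
grad_fn = jax.jit(jax.grad(loss))
for _ in range(200):
    w = w - 1e-3 * grad_fn(w, x, d)         # batch gradient descent; LMS updates per sample
```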

### Inverse Problems in Signal Processing

Many signal processing tasks are inverse problems. We observe

$$
y = A x + \epsilon,
$$

and want to recover $x$. Here $A$ may represent blur, downsampling, masking, compression, room acoustics, sensor geometry, or a Fourier sampling operator.

A common reconstruction objective is

$$
L(x) =
\frac{1}{2}\|Ax-y\|^2 + \lambda R(x),
$$

where $R(x)$ is a prior or regularizer.

AD supports reconstruction by providing gradients with respect to $x$. This allows the use of generic optimization methods even when $A$, $R$, or both are implemented as programs rather than matrices.

Examples include:

| Task | Forward operator |
|---|---|
| Deblurring | Convolution with point-spread function |
| Super-resolution | Blur then downsample |
| Compressed sensing | Random projection or masked Fourier samples |
| MRI reconstruction | Fourier sampling on k-space |
| Audio dereverberation | Room impulse response convolution |
| Tomographic reconstruction | Projection operator |
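
As a sketch of the deblurring row (JAX, with an assumed three-tap point-spread function, a noiseless observation, and a quadratic roughness term standing in for $R$):

```python
import jax
import jax.numpy as jnp

psf = jnp.array([0.25, 0.5, 0.25])          # assumed point-spread function

def forward(x):
    # A: blur implemented as a program rather than an explicit matrix.
    return jnp.convolve(x, psf, mode="same")

def objective(x, y, lam=1e-2):
    data = 0.5 * jnp.sum((forward(x) - y) ** 2)
    rough = jnp.sum(jnp.diff(x) ** 2)       # smooth regularizer R(x)
    return data + lam * rough

x_true = jnp.where(jnp.arange(64) < 32, 1.0, 0.0)
y = forward(x_true)                         # noiseless observation for illustration

x = jnp.zeros(64)
grad_fn = jax.jit(jax.grad(objective))
for _ in range(500):
    x = x - 0.1 * grad_fn(x, y)             # generic gradient descent on the reconstruction
```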

### Differentiable Transforms

Many transforms are differentiable or piecewise differentiable:

| Transform | AD treatment |
|---|---|
| FFT | Linear adjoint transform |
| DCT | Linear adjoint transform |
| Wavelet transform | Filter-bank derivatives |
| Short-time Fourier transform | Windowed linear transform |
| Mel filterbank | Matrix multiplication and nonlinear scaling |
| Cepstrum | FFT, log magnitude, inverse FFT |
| Convolution | Cross-correlation in reverse pass |

This makes AD useful for end-to-end systems that combine classical signal processing with learned models.

For example, an audio model may compute:

```text
waveform
    -> STFT
    -> magnitude
    -> mel filterbank
    -> log compression
    -> neural model
    -> loss
```

The whole pipeline can be differentiated, subject to care around zeros, logarithms, and magnitude operations.
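
A stripped-down version of that pipeline in JAX (the mel filterbank step, which would be one fixed matrix multiplication, is omitted; the frame size, hop, and epsilon floors are illustrative assumptions):

```python
import jax
import jax.numpy as jnp

def stft_mag(x, frame=256, hop=128):
    # Frame the signal, window, FFT, magnitude: a minimal STFT front end.
    n_frames = 1 + (x.shape[0] - frame) // hop
    idx = jnp.arange(frame)[None, :] + hop * jnp.arange(n_frames)[:, None]
    frames = x[idx] * jnp.hanning(frame)
    F = jnp.fft.rfft(frames, axis=-1)
    return jnp.sqrt(jnp.abs(F) ** 2 + 1e-12)    # smoothed magnitude near zero bins

def log_spectral_loss(x, target, eps=1e-6):
    feat = jnp.log(stft_mag(x) + eps)           # log compression with an epsilon floor
    return jnp.mean((feat - target) ** 2)

x = jnp.sin(2 * jnp.pi * 440 * jnp.arange(4096) / 16000)
target = jnp.zeros_like(stft_mag(x))            # placeholder target features

g = jax.grad(log_spectral_loss)(x, target)      # gradient w.r.t. the raw waveform
```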

### Sparse and Structured Signals

Many signals have sparse structure. Compressed sensing uses the assumption that $x$ is sparse in some basis:

$$
x = \Psi \alpha,
$$

where most entries of $\alpha$ are zero or near zero.

A typical objective is

$$
L(\alpha) =
\frac{1}{2}\|A\Psi\alpha-y\|^2
+
\lambda \|\alpha\|_1.
$$

The $\ell_1$ norm is non-smooth at zero. AD can still compute subgradient-like values depending on implementation, but optimization requires care.

Smooth approximations are often used:

$$
|\alpha|
\approx
\sqrt{\alpha^2+\epsilon}.
$$

This yields stable gradients while still exerting pressure toward sparse solutions, as in the sketch below.
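
Putting this together, a smoothed compressed-sensing objective can be minimized with plain gradient descent. A JAX sketch with an assumed Gaussian sensing matrix, an identity basis $\Psi$, and an arbitrary sparse ground truth:

```python
import jax
import jax.numpy as jnp

def smooth_l1(a, eps=1e-4):
    # Smooth surrogate for |a|: sqrt(a^2 + eps) is differentiable at zero.
    return jnp.sum(jnp.sqrt(a ** 2 + eps))

def objective(alpha, A, Psi, y, lam=0.1):
    r = A @ (Psi @ alpha) - y
    return 0.5 * jnp.sum(r ** 2) + lam * smooth_l1(alpha)

key = jax.random.PRNGKey(0)
A = jax.random.normal(key, (20, 64)) / jnp.sqrt(64.0)   # underdetermined sensing matrix
Psi = jnp.eye(64)                                       # identity basis for simplicity
alpha_true = jnp.zeros(64).at[jnp.array([3, 17, 40])].set(jnp.array([1.0, -2.0, 0.5]))
y = A @ (Psi @ alpha_true)

alpha = jnp.zeros(64)
grad_fn = jax.jit(jax.grad(objective))
for _ in range(1000):
    alpha = alpha - 0.05 * grad_fn(alpha, A, Psi, y)
```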

### Differentiable Sampling

Sampling and resampling are central in signal processing. For a continuous signal approximation $x(t)$, resampling evaluates

$$
y[n]=x(\tau_n),
$$

where $\tau_n$ may depend on parameters.

If interpolation is differentiable, then AD can compute derivatives with respect to both samples and sampling locations.

This is useful in:

- image registration,
- time warping,
- differentiable rendering,
- beamforming,
- sensor calibration,
- spatial transformer networks.

Nearest-neighbor sampling is discontinuous with respect to coordinates. Linear, cubic, or spline interpolation gives more useful derivatives.
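
With linear interpolation, both dependencies are visible to AD. A minimal JAX sketch in which a scalar `shift` parameter moves all sampling locations (the signal and the warp are illustrative):

```python
import jax
import jax.numpy as jnp

t = jnp.arange(32.0)
x = jnp.sin(0.3 * t)                        # samples of the underlying signal

def resample_loss(shift):
    # Resample at warped locations tau_n = 2n + shift via linear interpolation.
    tau = 2.0 * jnp.arange(8.0) + shift
    y = jnp.interp(tau, t, x)
    return jnp.sum(y ** 2)

# Derivative with respect to the sampling-location parameter.
g = jax.grad(resample_loss)(0.5)
```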

### Beamforming and Array Processing

Array processing combines measurements from multiple sensors. A beamformer output may be

$$
y(t) =
\sum_m w_m x_m(t-\tau_m(\theta)),
$$

where $w_m$ are weights and $\tau_m(\theta)$ are direction-dependent delays.

AD can compute derivatives with respect to:

| Variable | Meaning |
|---|---|
| $w_m$ | Beamforming weights |
| $\theta$ | Source direction |
| Sensor positions | Array calibration |
| Signal samples | Input sensitivity |

Differentiable beamforming appears in radar, sonar, microphone arrays, radio astronomy, and wireless communication.
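
A delay-and-sum sketch in JAX. Fractional delays are applied as frequency-domain phase ramps, which keeps them differentiable in $\tau$; the array geometry, sampling rate, and far-field delay model are all illustrative assumptions:

```python
import jax
import jax.numpy as jnp

fs = 8000.0                                 # assumed sampling rate (Hz)
c = 343.0                                   # speed of sound (m/s)
pos = jnp.array([0.0, 0.05, 0.10])          # assumed linear array positions (m)

def delays(theta):
    # Far-field, direction-dependent delays tau_m(theta).
    return pos * jnp.sin(theta) / c

def frac_delay(x, tau):
    # Fractional delay via a frequency-domain phase ramp (differentiable in tau).
    N = x.shape[0]
    f = jnp.fft.rfftfreq(N, d=1.0 / fs)
    return jnp.fft.irfft(jnp.fft.rfft(x) * jnp.exp(-2j * jnp.pi * f * tau), n=N)

def beam_power(theta, w, X):
    aligned = jax.vmap(frac_delay)(X, -delays(theta))   # undo propagation delays
    y = jnp.sum(w[:, None] * aligned, axis=0)
    return jnp.sum(y ** 2)

X = jax.random.normal(jax.random.PRNGKey(0), (3, 256))  # one signal per sensor
g_theta, g_w = jax.grad(beam_power, argnums=(0, 1))(0.1, jnp.ones(3) / 3, X)
```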

### State-Space Models and Kalman Filters

Signal processing often represents systems with state-space models:

$$
x_{t+1}=A x_t + B u_t + \eta_t,
$$

$$
y_t=C x_t + \epsilon_t.
$$

Kalman filtering estimates hidden states from noisy measurements. Its recursion is differentiable, provided matrix inversions and covariance updates are handled carefully.

AD can differentiate a Kalman filter with respect to:

- transition matrices,
- observation matrices,
- noise covariance parameters,
- initial state,
- control parameters.

For long sequences, reverse-mode AD faces the same memory issue as recurrent neural networks. Checkpointing or custom smoother adjoints may be needed.
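
A scalar example makes the pattern concrete: a Kalman filter written with `jax.lax.scan`, differentiated with respect to the transition coefficient and the noise variances through the filter's negative log-likelihood (the model and data here are illustrative):

```python
import jax
import jax.numpy as jnp

def kalman_nll(params, ys):
    # Scalar Kalman filter: params are transition a, process var q, measurement var r.
    a, q, r = params
    def step(carry, y):
        m, p = carry                        # prior mean and variance
        s = p + r                           # innovation variance
        k = p / s                           # Kalman gain
        nll = 0.5 * (jnp.log(2 * jnp.pi * s) + (y - m) ** 2 / s)
        m_post = m + k * (y - m)
        p_post = (1.0 - k) * p
        return (a * m_post, a ** 2 * p_post + q), nll
    _, nlls = jax.lax.scan(step, (jnp.array(0.0), jnp.array(1.0)), ys)
    return jnp.sum(nlls)

ys = jax.random.normal(jax.random.PRNGKey(0), (100,))

# Gradient of the filter's negative log-likelihood w.r.t. model parameters.
g = jax.grad(kalman_nll)(jnp.array([0.9, 0.1, 0.5]), ys)
```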

### Phase Retrieval

Phase retrieval reconstructs a signal from magnitude-only measurements:

$$
y = |Ax|^2.
$$

The phase is missing, so the inverse problem is nonconvex.

A loss may be

$$
L(x) =
\frac{1}{2}
\| |Ax|^2 - y \|^2.
$$

AD computes gradients through the linear transform, magnitude, and squared magnitude. This allows gradient-based reconstruction methods, though initialization and nonconvexity remain major issues.
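
A JAX sketch with a random real-valued $A$ and a random ground truth; the step size, iteration count, and small random initialization are illustrative, and in practice a spectral initializer would usually replace the random start:

```python
import jax
import jax.numpy as jnp

k1, k2, k3 = jax.random.split(jax.random.PRNGKey(0), 3)
A = jax.random.normal(k1, (128, 64)) / jnp.sqrt(64.0)
x_true = jax.random.normal(k2, (64,))
y = jnp.abs(A @ x_true) ** 2                # magnitude-only measurements

def loss(x):
    return 0.5 * jnp.sum((jnp.abs(A @ x) ** 2 - y) ** 2)

x = 0.1 * jax.random.normal(k3, (64,))      # random start: sensitive to initialization
grad_fn = jax.jit(jax.grad(loss))
for _ in range(2000):
    x = x - 1e-3 * grad_fn(x)
```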

### Learned Signal Processing

Modern systems often combine hand-designed transforms with learned components. Examples include:

- learned denoisers,
- neural vocoders,
- differentiable codecs,
- learned image reconstruction,
- neural beamformers,
- learned compression,
- differentiable equalizers.

A useful design pattern is to keep classical signal-processing operators explicit and differentiable, then insert learned modules where the model lacks structure.

This gives a hybrid pipeline:

```text
structured physical transform
    -> learned correction
    -> differentiable objective
    -> gradient-based training or inference
```

### Numerical Issues

Signal processing pipelines contain operations that can produce unstable gradients.

| Operation | Issue |
|---|---|
| $\log x$ | Singular near zero |
| $|z|$ | Non-smooth at zero |
| Phase angle | Discontinuous modulo $2\pi$ |
| Hard thresholding | Zero or undefined gradients |
| Quantization | Discontinuous |
| Clipping | Saturated gradients |
| Sorting peaks | Non-smooth selection |

Practical differentiable systems often replace hard operations with smooth approximations during training or optimization.

Examples include soft thresholding, smooth clipping, differentiable peak picking, and noise-aware objectives.
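
A few such surrogates, written as JAX one-liners (the epsilon values and the tanh-based clipping are illustrative choices, not canonical definitions):

```python
import jax.numpy as jnp

def safe_log(x, eps=1e-8):
    # log with a floor: bounded gradients near zero.
    return jnp.log(x + eps)

def smooth_abs(z, eps=1e-8):
    # |z| for complex z without the non-smooth point at the origin.
    return jnp.sqrt(jnp.real(z) ** 2 + jnp.imag(z) ** 2 + eps)

def soft_threshold(x, lam):
    # Continuous stand-in for hard thresholding.
    return jnp.sign(x) * jnp.maximum(jnp.abs(x) - lam, 0.0)

def soft_clip(x, limit=1.0):
    # tanh-based clipping keeps gradients alive in the saturated region.
    return limit * jnp.tanh(x / limit)
```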

### Custom Derivative Rules

Generic AD can differentiate most signal processing code, but custom rules are often better.

| Component | Better derivative rule |
|---|---|
| FFT | Use known adjoint transform |
| Convolution | Use optimized correlation kernels |
| Linear solve | Use transpose solve |
| Interpolation | Define boundary derivative explicitly |
| Quantization | Use surrogate gradient if needed |
| STFT/ISTFT | Preserve window and overlap-add structure |

Custom rules improve performance and make derivative semantics explicit.
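
In JAX, custom rules are attached with `jax.custom_vjp`. The sketch below gives quantization a straight-through surrogate gradient, one common choice (the rounding scheme and the loss are illustrative):

```python
import jax
import jax.numpy as jnp

@jax.custom_vjp
def quantize(x):
    # Round to the nearest level: zero gradient almost everywhere.
    return jnp.round(x)

def quantize_fwd(x):
    return quantize(x), None

def quantize_bwd(_, g):
    # Straight-through surrogate: pass the cotangent through unchanged.
    return (g,)

quantize.defvjp(quantize_fwd, quantize_bwd)

def loss(x):
    return jnp.sum((quantize(4.0 * x) / 4.0 - 0.3) ** 2)

g = jax.grad(loss)(jnp.array([0.1, 0.2, 0.5]))   # nonzero despite the rounding
```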

### Summary

Signal processing is a natural domain for automatic differentiation because many operators are linear, structured, and compositional. AD provides gradients through filters, transforms, reconstruction objectives, adaptive systems, beamformers, and state-space estimators.

The main difficulties come from complex-valued computations, non-smooth operations, sampling decisions, quantization, phase ambiguity, and long recurrent filters. Effective differentiable signal processing combines AD with known adjoint operators, stable numerical design, and carefully chosen smooth approximations.

