Signal Processing

Signal processing studies how information is represented, transformed, filtered, compressed, reconstructed, and estimated from signals. A signal may be a time series, an image, an audio waveform, a radar return, a sensor stream, or a multidimensional field.

Automatic differentiation is useful because many signal processing systems can be written as differentiable programs:

x \to T_\theta(x) \to y \to L(y),

where x is an input signal, T_\theta is a transform or filter with parameters \theta, and L is an objective. AD gives derivatives with respect to filter coefficients, model parameters, input samples, reconstruction variables, or calibration constants.
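
The sketches in this section use JAX. As a minimal illustration, here is a hypothetical one-parameter transform (a gain followed by a fixed smoothing filter) differentiated with respect to both the parameter and the input; the transform and target are illustrative stand-ins:

    import jax
    import jax.numpy as jnp

    def transform(theta, x):
        kernel = jnp.ones(4) / 4.0              # fixed smoothing filter
        return theta * jnp.convolve(x, kernel, mode="same")

    def loss(theta, x, target):
        y = transform(theta, x)
        return jnp.sum((y - target) ** 2)

    x = jnp.linspace(0.0, 1.0, 64)
    target = 0.5 * x
    dL_dtheta = jax.grad(loss, argnums=0)(1.0, x, target)   # w.r.t. parameters
    dL_dx = jax.grad(loss, argnums=1)(1.0, x, target)       # w.r.t. input samples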

Linear Filtering

A discrete linear time-invariant filter has the form

y[n] = \sum_{k} h[k]\, x[n-k],

where h is the impulse response.

This is convolution:

y = h * x.

If a loss depends on the filtered output,

L = \ell(y),

then AD computes derivatives with respect to both the signal and the filter:

\frac{\partial L}{\partial x}, \qquad \frac{\partial L}{\partial h}.

The reverse pass through convolution is another convolution-like operation. This is one reason convolutional neural networks fit naturally into AD systems: backpropagation through convolution is a standard signal processing operation.
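
A small JAX sketch of both derivatives, assuming a toy quadratic objective \ell(y):

    import jax
    import jax.numpy as jnp

    def loss(h, x):
        y = jnp.convolve(x, h, mode="full")     # y = h * x
        return 0.5 * jnp.sum(y ** 2)            # toy objective l(y)

    x = jax.random.normal(jax.random.PRNGKey(0), (128,))
    h = jnp.array([0.25, 0.5, 0.25])
    # Both reverse passes are themselves convolution/correlation operations.
    dL_dh, dL_dx = jax.grad(loss, argnums=(0, 1))(h, x)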

Frequency-Domain Methods

Many signal processing algorithms use the Fourier transform.

X[k] = \sum_{n=0}^{N-1} x[n]\, e^{-2\pi i kn/N}.

The inverse transform reconstructs the signal:

x[n] = \frac{1}{N} \sum_{k=0}^{N-1} X[k]\, e^{2\pi i kn/N}.

Because the discrete Fourier transform is linear, its Jacobian is the transform itself. Reverse-mode differentiation through an FFT therefore applies the adjoint transform, which for the DFT is the inverse FFT up to a scale factor.

This makes frequency-domain objectives easy to differentiate, for example:

L(x) = \|\, |\operatorname{FFT}(x)| - a \,\|^2.

Such objectives appear in phase retrieval, spectral matching, audio synthesis, diffraction imaging, and inverse scattering.
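
For example, this objective differentiates directly in JAX; the target magnitude a below is a hypothetical stand-in:

    import jax
    import jax.numpy as jnp

    def spectral_loss(x, a):
        return jnp.sum((jnp.abs(jnp.fft.fft(x)) - a) ** 2)

    x = jax.random.normal(jax.random.PRNGKey(0), (64,))  # generic bins avoid |0| corners
    a = jnp.zeros(64).at[3].set(10.0)                    # hypothetical target spectrum
    g = jax.grad(spectral_loss)(x, a)                    # flows through FFT and |.|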

Complex-Valued Differentiation

Signal processing often uses complex numbers. AD systems must define how derivatives behave for complex-valued programs.

A complex signal can be treated as a pair of real signals:

z = x + iy.

For real-valued losses,

L : \mathbb{C}^n \to \mathbb{R},

gradients are usually interpreted with respect to the real and imaginary parts. This avoids assuming that every operation is holomorphic. Many common operations, such as magnitude,

|z| = \sqrt{x^2+y^2},

are not complex analytic, but they are differentiable as real functions except at singular points.

Practical AD systems for signal processing should specify whether gradients are defined via real and imaginary parts or via Wirtinger (CR-calculus) derivatives.
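
A minimal sketch of the real-imaginary convention, writing the magnitude loss over a pair of real arrays as described above:

    import jax
    import jax.numpy as jnp

    def loss(z_re, z_im):
        z = z_re + 1j * z_im
        return jnp.sum(jnp.abs(z))              # real loss of a complex program

    z_re = jnp.array([3.0, -1.0])
    z_im = jnp.array([4.0, 2.0])
    g_re, g_im = jax.grad(loss, argnums=(0, 1))(z_re, z_im)
    # g_re = z_re / |z| and g_im = z_im / |z|, the real-imaginary gradient.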

Adaptive Filters

An adaptive filter updates its coefficients from data. A simple finite impulse response filter computes

\hat y[n] = \sum_{k=0}^{K-1} w_k\, x[n-k].

Given a target signal d[n], the error is

e[n] = d[n] - \hat y[n].

The least-squares objective is

L(w) = \frac{1}{2} \sum_n e[n]^2.

AD gives

\nabla_w L,

which can be used in gradient descent, stochastic gradient descent, or more specialized adaptive algorithms.

Traditional algorithms such as LMS can be viewed as hand-derived gradient methods. AD generalizes the same idea to more intricate filter structures, where deriving the gradient by hand becomes tedious.
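
As a sketch, least-squares FIR identification with plain gradient descent; the filter to be identified, the data, and the step size are all illustrative:

    import jax
    import jax.numpy as jnp

    def loss(w, x, d):
        y_hat = jnp.convolve(x, w, mode="same")  # FIR filter output
        return 0.5 * jnp.sum((d - y_hat) ** 2)   # least-squares error

    x = jax.random.normal(jax.random.PRNGKey(0), (256,))
    true_w = jnp.array([0.5, -0.25, 0.125])      # filter to identify
    d = jnp.convolve(x, true_w, mode="same")

    w = jnp.zeros(3)
    for _ in range(500):                         # plain gradient descent
        w = w - 1e-3 * jax.grad(loss)(w, x, d)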

Inverse Problems in Signal Processing

Many signal processing tasks are inverse problems. We observe

y = Ax + \epsilon,

and want to recover x. Here A may represent blur, downsampling, masking, compression, room acoustics, sensor geometry, or a Fourier sampling operator.

A common reconstruction objective is

L(x) = \frac{1}{2}\|Ax-y\|^2 + \lambda R(x),

where R(x) is a prior or regularizer.

AD supports reconstruction by providing gradients with respect to x. This allows the use of generic optimization methods even when A, R, or both are implemented as programs rather than matrices.

Examples include:

  • Deblurring: convolution with a point-spread function
  • Super-resolution: blur followed by downsampling
  • Compressed sensing: random projection or masked Fourier samples
  • MRI reconstruction: Fourier sampling in k-space
  • Audio dereverberation: convolution with a room impulse response
  • Tomographic reconstruction: projection operator
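
As a concrete sketch of the first case, here is a one-dimensional deblurring reconstruction in which both A and R are ordinary differentiable programs; the point-spread function, signal, and step size are illustrative:

    import jax
    import jax.numpy as jnp

    # A is convolution with a known point-spread function;
    # R penalizes first differences as a smoothness prior.
    psf = jnp.array([0.25, 0.5, 0.25])

    def A(x):
        return jnp.convolve(x, psf, mode="same")

    def R(x):
        return jnp.sum(jnp.diff(x) ** 2)

    def objective(x, y, lam=0.1):
        return 0.5 * jnp.sum((A(x) - y) ** 2) + lam * R(x)

    x_true = jnp.sign(jax.random.normal(jax.random.PRNGKey(0), (128,)))
    y = A(x_true)                                   # blurred observation

    x = jnp.zeros(128)
    for _ in range(200):                            # generic gradient descent
        x = x - 0.1 * jax.grad(objective)(x, y)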

Differentiable Transforms

Many transforms are differentiable or piecewise differentiable:

  • FFT: linear; adjoint transform in the reverse pass
  • DCT: linear; adjoint transform in the reverse pass
  • Wavelet transform: filter-bank derivatives
  • Short-time Fourier transform: windowed linear transform
  • Mel filterbank: matrix multiplication and nonlinear scaling
  • Cepstrum: FFT, log magnitude, inverse FFT
  • Convolution: cross-correlation in the reverse pass

This makes AD useful for end-to-end systems that combine classical signal processing with learned models.

For example, an audio model may compute:

waveform
    -> STFT
    -> magnitude
    -> mel filterbank
    -> log compression
    -> neural model
    -> loss

The whole pipeline can be differentiated, subject to care around zeros, logarithms, and magnitude operations.
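
A rough JAX rendering of this pipeline, with a rectangular-window STFT built from framed FFTs and a random nonnegative matrix standing in for the mel filterbank and target (both simplifications, not a real front end):

    import jax
    import jax.numpy as jnp

    def features(waveform, M, frame=256, hop=128):
        n = (waveform.shape[0] - frame) // hop + 1
        idx = jnp.arange(n)[:, None] * hop + jnp.arange(frame)[None, :]
        frames = waveform[idx]                          # framed signal, (n, frame)
        mag = jnp.abs(jnp.fft.rfft(frames, axis=-1))    # STFT magnitude
        mel = mag @ M                                   # stand-in mel filterbank
        return jnp.log(mel + 1e-6)                      # guarded log compression

    def loss(waveform, M, target):
        return jnp.sum((features(waveform, M) - target) ** 2)

    w = jax.random.normal(jax.random.PRNGKey(0), (1024,))
    M = jnp.abs(jax.random.normal(jax.random.PRNGKey(1), (129, 40)))
    target = jnp.zeros((7, 40))                         # stand-in for a learned model
    g = jax.grad(loss)(w, M, target)                    # gradient back to the waveform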

Sparse and Structured Signals

Many signals have sparse structure. Compressed sensing uses the assumption that x is sparse in some basis:

x = \Psi \alpha,

where most entries of \alpha are zero or near zero.

A typical objective is

L(\alpha) = \frac{1}{2}\|A\Psi\alpha-y\|^2 + \lambda \|\alpha\|_1.

The \ell_1 norm is non-smooth at zero. AD can still compute subgradient-like values, depending on the implementation, but optimization requires care.

Smooth approximations are often used:

|\alpha| \approx \sqrt{\alpha^2+\epsilon}.

This gives stable gradients while preserving sparsity pressure.
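
A sketch of sparse recovery with this smoothed \ell_1 term; for simplicity \Psi is taken to be the identity, and the measurement matrix, sparsity pattern, and step size are illustrative:

    import jax
    import jax.numpy as jnp

    def smooth_l1(alpha, eps=1e-4):
        return jnp.sum(jnp.sqrt(alpha ** 2 + eps))      # smoothed |alpha|

    A = jax.random.normal(jax.random.PRNGKey(0), (32, 128)) / jnp.sqrt(32.0)
    alpha_true = jnp.zeros(128).at[jnp.array([5, 40, 90])].set(1.0)
    y = A @ alpha_true                                  # compressed measurements

    def objective(alpha, lam=0.05):
        return 0.5 * jnp.sum((A @ alpha - y) ** 2) + lam * smooth_l1(alpha)

    alpha = jnp.zeros(128)
    for _ in range(500):
        alpha = alpha - 0.05 * jax.grad(objective)(alpha)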

Differentiable Sampling

Sampling and resampling are central in signal processing. For a continuous signal approximation x(t), resampling evaluates

y[n] = x(\tau_n),

where \tau_n may depend on parameters.

If interpolation is differentiable, then AD can compute derivatives with respect to both samples and sampling locations.

This is useful in:

  • image registration,
  • time warping,
  • differentiable rendering,
  • beamforming,
  • sensor calibration,
  • spatial transformer networks.

Nearest-neighbor sampling is discontinuous with respect to coordinates. Linear, cubic, or spline interpolation gives more useful derivatives.
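
For example, linear-interpolation resampling is differentiable in both the samples and a shift parameter that moves the sampling locations; the signal and loss below are illustrative:

    import jax
    import jax.numpy as jnp

    t = jnp.arange(64.0)
    x = jnp.sin(0.2 * t)

    def resample_loss(shift, x):
        tau = jnp.arange(64.0) + shift          # sampling locations with a parameter
        y = jnp.interp(tau, t, x)               # piecewise-linear interpolation
        return jnp.sum(y ** 2)

    dL_dshift = jax.grad(resample_loss)(0.3, x)             # w.r.t. the locations
    dL_dx = jax.grad(resample_loss, argnums=1)(0.3, x)      # w.r.t. the samples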

Beamforming and Array Processing

Array processing combines measurements from multiple sensors. A beamformer output may be

y(t) = \sum_m w_m\, x_m(t-\tau_m(\theta)),

where w_m are weights and \tau_m(\theta) are direction-dependent delays.

AD can compute derivatives with respect to:

  • w_m: beamforming weights
  • \theta: source direction
  • sensor positions: array calibration
  • signal samples: input sensitivity

Differentiable beamforming appears in radar, sonar, microphone arrays, radio astronomy, and wireless communication.
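
A delay-and-sum sketch, applying fractional delays as phase ramps in the frequency domain so the output is differentiable in both the weights and the direction; the array geometry, signals, and objective are hypothetical:

    import jax
    import jax.numpy as jnp

    fs, c = 16000.0, 343.0                      # sample rate (Hz), speed of sound (m/s)
    positions = jnp.array([0.0, 0.05, 0.10])    # hypothetical 3-sensor line array (m)

    def delay(x, tau):
        # Fractional delay as a phase ramp in the frequency domain.
        n = x.shape[0]
        freqs = jnp.fft.rfftfreq(n, d=1.0 / fs)
        return jnp.fft.irfft(jnp.fft.rfft(x) * jnp.exp(-2j * jnp.pi * freqs * tau), n)

    def beamform(w, theta, X):                  # X: (sensors, samples)
        taus = positions * jnp.sin(theta) / c   # direction-dependent delays
        aligned = jnp.stack([delay(X[m], -taus[m]) for m in range(3)])
        return jnp.sum(w[:, None] * aligned, axis=0)

    def output_power(w, theta, X):
        y = beamform(w, theta, X)
        return jnp.sum(y ** 2)

    X = jax.random.normal(jax.random.PRNGKey(0), (3, 512))
    dP_dtheta = jax.grad(output_power, argnums=1)(jnp.ones(3) / 3, 0.1, X)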

State-Space Models and Kalman Filters

Signal processing often represents systems with state-space models:

x_{t+1} = A x_t + B u_t + \eta_t, \qquad y_t = C x_t + \epsilon_t.

Kalman filtering estimates hidden states from noisy measurements. Its recursion is differentiable, provided matrix inversions and covariance updates are handled carefully.

AD can differentiate a Kalman filter with respect to:

  • transition matrices,
  • observation matrices,
  • noise covariance parameters,
  • initial state,
  • control parameters.

For long sequences, reverse-mode AD faces the same memory issue as recurrent neural networks. Checkpointing or custom smoother adjoints may be needed.
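
A minimal differentiable Kalman filter for a scalar random-walk model (the state-space matrices reduced to 1), written as a scan so reverse-mode AD runs through the whole recursion; the model and data are illustrative:

    import jax
    import jax.numpy as jnp
    from jax import lax

    def neg_log_lik(params, ys):
        q, r = params                           # process / observation variances
        def step(carry, y):
            m, p = carry                        # predictive mean and variance
            p = p + q                           # predict (scalar random walk)
            s = p + r                           # innovation variance
            k = p / s                           # Kalman gain
            nll = 0.5 * (jnp.log(2 * jnp.pi * s) + (y - m) ** 2 / s)
            return (m + k * (y - m), (1.0 - k) * p), nll
        _, nlls = lax.scan(step, (jnp.zeros(()), jnp.ones(())), ys)
        return jnp.sum(nlls)

    ys = jnp.sin(jnp.linspace(0.0, 6.0, 100)) + 0.1
    # Gradient of the likelihood with respect to the noise parameters.
    g = jax.grad(neg_log_lik)(jnp.array([0.1, 0.5]), ys)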

Phase Retrieval

Phase retrieval reconstructs a signal from magnitude-only measurements:

y = |Ax|^2.

The phase is missing, so the inverse problem is nonconvex.

A loss may be

L(x) = \frac{1}{2} \| |Ax|^2 - y \|^2.

AD computes gradients through the linear transform, magnitude, and squared magnitude. This allows gradient-based reconstruction methods, though initialization and nonconvexity remain major issues.
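
A gradient-based reconstruction sketch with a random Gaussian A and a real-valued signal; the initialization scale and step size are illustrative, and the sign ambiguity of x remains:

    import jax
    import jax.numpy as jnp

    A = jax.random.normal(jax.random.PRNGKey(0), (256, 64)) / jnp.sqrt(256.0)
    x_true = jax.random.normal(jax.random.PRNGKey(1), (64,))
    y = jnp.abs(A @ x_true) ** 2                          # magnitude-only data

    def loss(x):
        return 0.5 * jnp.sum((jnp.abs(A @ x) ** 2 - y) ** 2)

    grad_step = jax.jit(jax.grad(loss))
    x = 0.1 * jax.random.normal(jax.random.PRNGKey(2), (64,))  # initialization matters
    for _ in range(1000):
        x = x - 0.02 * grad_step(x)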

Learned Signal Processing

Modern systems often combine hand-designed transforms with learned components. Examples include:

  • learned denoisers,
  • neural vocoders,
  • differentiable codecs,
  • learned image reconstruction,
  • neural beamformers,
  • learned compression,
  • differentiable equalizers.

A useful design pattern is to keep classical signal-processing operators explicit and differentiable, then insert learned modules where the model lacks structure.

This gives a hybrid pipeline:

structured physical transform
    -> learned correction
    -> differentiable objective
    -> gradient-based training or inference

Numerical Issues

Signal processing pipelines contain operations that can produce unstable gradients.

  • \log x: singular near zero
  • magnitude |z|: non-differentiable at z = 0
  • phase angle: discontinuous modulo 2\pi
  • hard thresholding: zero or undefined gradients
  • quantization: discontinuous
  • clipping: saturated gradients
  • sorting peaks: non-smooth selection

Practical differentiable systems often replace hard operations with smooth approximations during training or optimization.

Examples include soft thresholding, smooth clipping, differentiable peak picking, and noise-aware objectives.
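
A few such surrogates written as JAX helpers; the epsilon values and clip ranges are conventional choices, not fixed prescriptions:

    import jax.numpy as jnp

    def safe_log(x, eps=1e-8):
        return jnp.log(x + eps)                 # guards the log near zero

    def soft_threshold(x, lam):
        return jnp.sign(x) * jnp.maximum(jnp.abs(x) - lam, 0.0)

    def smooth_clip(x, lo=-1.0, hi=1.0):
        mid, half = (lo + hi) / 2.0, (hi - lo) / 2.0
        return mid + half * jnp.tanh((x - mid) / half)   # saturates without flat gradients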

Custom Derivative Rules

Generic AD can differentiate most signal processing code, but custom rules are often better.

  • FFT: use the known adjoint transform
  • Convolution: use optimized correlation kernels
  • Linear solve: use the transpose solve in the reverse pass
  • Interpolation: define boundary derivatives explicitly
  • Quantization: use a surrogate gradient if needed
  • STFT/ISTFT: preserve the window and overlap-add structure

Custom rules improve performance and make derivative semantics explicit.
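
For instance, a surrogate gradient for quantization can be declared with a custom VJP; the straight-through estimator below is one common choice, not the only one:

    import jax
    import jax.numpy as jnp

    # Forward pass rounds; backward pass passes the cotangent through unchanged.
    @jax.custom_vjp
    def quantize(x):
        return jnp.round(x)

    def quantize_fwd(x):
        return jnp.round(x), None

    def quantize_bwd(_, g):
        return (g,)                             # identity surrogate gradient

    quantize.defvjp(quantize_fwd, quantize_bwd)

    g = jax.grad(lambda x: jnp.sum(quantize(x) ** 2))(jnp.array([0.3, 1.7]))
    # g == 2 * round(x) under the straight-through rule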

Summary

Signal processing is a natural domain for automatic differentiation because many operators are linear, structured, and compositional. AD provides gradients through filters, transforms, reconstruction objectives, adaptive systems, beamformers, and state-space estimators.

The main difficulties come from complex-valued computations, non-smooth operations, sampling decisions, quantization, phase ambiguity, and long recurrent filters. Effective differentiable signal processing combines AD with known adjoint operators, stable numerical design, and carefully chosen smooth approximations.