Signal processing studies how information is represented, transformed, filtered, compressed, reconstructed, and estimated from signals. A signal may be a time series, an image, an audio waveform, a radar return, a sensor stream, or a multidimensional field.
Automatic differentiation is useful because many signal processing systems can be written as differentiable programs:

$$
L = \ell(f_\theta(x)),
$$

where $x$ is an input signal, $f_\theta$ is a transform or filter with parameters $\theta$, and $\ell$ is an objective. AD gives derivatives with respect to filter coefficients, model parameters, input samples, reconstruction variables, or calibration constants.
Linear Filtering
A discrete linear time-invariant filter has the form

$$
y[n] = \sum_{k} h[k]\, x[n-k],
$$

where $h$ is the impulse response. This is convolution:

$$
y = h * x.
$$
If a loss depends on the filtered output,

$$
L = \ell(h * x),
$$

then AD computes derivatives with respect to both the signal and the filter:

$$
\frac{\partial L}{\partial x} \quad \text{and} \quad \frac{\partial L}{\partial h}.
$$
The reverse pass through convolution is another convolution-like operation. This is one reason convolutional neural networks fit naturally into AD systems: backpropagation through convolution is a standard signal processing operation.
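As a concrete sketch, here is how a JAX program obtains both derivatives through a convolution in a single call; the names (`loss`, `target`) and the squared-error objective are illustrative, not from the text:

```python
# Minimal sketch: differentiate a scalar loss through a convolution with
# respect to both the signal x and the filter h. The reverse pass JAX
# constructs is itself a correlation-like operation.
import jax
import jax.numpy as jnp

def loss(h, x, target):
    y = jnp.convolve(x, h, mode="same")   # y = h * x
    return jnp.sum((y - target) ** 2)     # squared-error objective

x = jax.random.normal(jax.random.PRNGKey(0), (64,))
h = jnp.array([0.25, 0.5, 0.25])          # small smoothing filter
target = jnp.zeros(64)

dL_dh, dL_dx = jax.grad(loss, argnums=(0, 1))(h, x, target)
```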
Frequency-Domain Methods
Many signal processing algorithms use the Fourier transform. The discrete Fourier transform of a length-$N$ signal is

$$
X[k] = \sum_{n=0}^{N-1} x[n]\, e^{-2\pi i kn/N}.
$$

The inverse transform reconstructs the signal:

$$
x[n] = \frac{1}{N} \sum_{k=0}^{N-1} X[k]\, e^{2\pi i kn/N}.
$$
Because the discrete Fourier transform is linear, its Jacobian is the transform itself. Reverse-mode differentiation through an FFT applies the corresponding adjoint transform.
This makes frequency-domain objectives easy to differentiate, for example a spectral magnitude-matching loss:

$$
L(x) = \sum_{k} \big( |X[k]| - T[k] \big)^2,
$$

where $X$ is the DFT of $x$ and $T$ is a target magnitude spectrum.
Such objectives appear in phase retrieval, spectral matching, audio synthesis, diffraction imaging, and inverse scattering.
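A minimal JAX sketch of such an objective, assuming a hypothetical target magnitude spectrum `target_mag`:

```python
# Differentiate a spectral-magnitude matching loss through the FFT.
# The gradient flows back through the adjoint transform automatically.
import jax
import jax.numpy as jnp

def spectral_loss(x, target_mag):
    X = jnp.fft.rfft(x)                        # linear transform
    return jnp.sum((jnp.abs(X) - target_mag) ** 2)

x = jnp.linspace(0.0, 1.0, 128)
target_mag = jnp.ones(65)                      # rfft of a length-128 signal has 65 bins
g = jax.grad(spectral_loss)(x, target_mag)
```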
Complex-Valued Differentiation
Signal processing often uses complex numbers. AD systems must define how derivatives behave for complex-valued programs.
A complex signal can be treated as a pair of real signals:

$$
z[n] = a[n] + i\, b[n].
$$

For real-valued losses, gradients are usually interpreted with respect to the real and imaginary parts. This avoids assuming that every operation is holomorphic. Many common operations, such as magnitude,

$$
|z| = \sqrt{a^2 + b^2},
$$

are not complex analytic, but they are differentiable as real functions except at singular points.
Practical AD systems for signal processing should specify whether gradients use real-imaginary derivatives or Wirtinger-style notation.
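One convention-free way to realize the real/imaginary-part view is to hold the two parts as separate real arrays and differentiate with respect to each. The sketch below (illustrative names) does this for the loss $\sum_n |z[n]|^2$:

```python
# Represent z as (a, b) and differentiate a real-valued loss with respect
# to each part separately, avoiding any assumption of holomorphy.
import jax
import jax.numpy as jnp

def loss(a, b):
    z = a + 1j * b
    return jnp.sum(jnp.abs(z) ** 2)        # real-valued loss of a complex signal

a = jnp.array([1.0, -2.0, 0.5])
b = jnp.array([0.0, 1.0, -1.0])
dL_da, dL_db = jax.grad(loss, argnums=(0, 1))(a, b)
# For |z|^2 = a^2 + b^2 these are 2a and 2b, as the real-pair view predicts.
```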
Adaptive Filters
An adaptive filter updates its coefficients from data. A simple finite impulse response filter computes

$$
y[n] = \sum_{k=0}^{M-1} w[k]\, x[n-k].
$$

Given target signal $d[n]$, the error is

$$
e[n] = d[n] - y[n].
$$

The least-squares objective is

$$
L(w) = \sum_n e[n]^2.
$$

AD gives

$$
\frac{\partial L}{\partial w[k]} = -2 \sum_n e[n]\, x[n-k],
$$
which can be used in gradient descent, stochastic gradient descent, or more specialized adaptive algorithms.
Traditional algorithms such as LMS can be viewed as hand-derived gradient methods. AD generalizes the same idea to complex filter structures where manual derivative derivation becomes tedious.
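A hedged sketch of this idea in JAX: batch gradient descent on the least-squares objective over FIR taps, identifying an assumed unknown system. With per-sample updates the same gradient reduces to an LMS-style rule; all names are illustrative:

```python
# AD-driven adaptive FIR filter: gradient descent on the least-squares
# objective over the taps w.
import jax
import jax.numpy as jnp

def predict(w, x):
    return jnp.convolve(x, w, mode="same")

def objective(w, x, d):
    e = d - predict(w, x)                    # error signal e[n]
    return jnp.sum(e ** 2)

x = jax.random.normal(jax.random.PRNGKey(1), (256,))
d = jnp.convolve(x, jnp.array([0.5, -0.3, 0.1]), mode="same")  # assumed unknown system

w = jnp.zeros(3)
step = 1e-3
for _ in range(200):
    w = w - step * jax.grad(objective)(w, x, d)
```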
Inverse Problems in Signal Processing
Many signal processing tasks are inverse problems. We observe

$$
y = A(x) + \text{noise},
$$

and want to recover $x$. Here $A$ may represent blur, downsampling, masking, compression, room acoustics, sensor geometry, or a Fourier sampling operator.

A common reconstruction objective is

$$
\min_x \; \|A(x) - y\|_2^2 + \lambda R(x),
$$

where $R$ is a prior or regularizer.

AD supports reconstruction by providing gradients with respect to $x$. This allows the use of generic optimization methods even when $A$, $R$, or both are implemented as programs rather than matrices.
Examples include:
| Task | Forward operator |
|---|---|
| Deblurring | Convolution with point-spread function |
| Super-resolution | Blur then downsample |
| Compressed sensing | Random projection or masked Fourier samples |
| MRI reconstruction | Fourier sampling on k-space |
| Audio dereverberation | Room impulse response convolution |
| Tomographic reconstruction | Projection operator |
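As an illustration of the pattern, here is a minimal JAX deblurring sketch; the point-spread function and the smoothness regularizer standing in for $R$ are assumptions for the example:

```python
# Deblurring as a differentiable inverse problem: the forward operator A
# is convolution with an assumed point-spread function, and reconstruction
# is plain gradient descent on the regularized objective.
import jax
import jax.numpy as jnp

psf = jnp.array([0.2, 0.6, 0.2])                  # assumed blur kernel

def forward(x):
    return jnp.convolve(x, psf, mode="same")      # A(x): blur as a program

def objective(x, y, lam=1e-2):
    data = jnp.sum((forward(x) - y) ** 2)
    prior = lam * jnp.sum(jnp.diff(x) ** 2)       # smoothness regularizer R(x)
    return data + prior

x_true = jax.random.normal(jax.random.PRNGKey(2), (128,))
y = forward(x_true)                               # noiseless observation for brevity

x = jnp.zeros(128)
for _ in range(500):
    x = x - 0.1 * jax.grad(objective)(x, y)
```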
Differentiable Transforms
Many transforms are differentiable or piecewise differentiable:
| Transform | AD treatment |
|---|---|
| FFT | Linear adjoint transform |
| DCT | Linear adjoint transform |
| Wavelet transform | Filter-bank derivatives |
| Short-time Fourier transform | Windowed linear transform |
| Mel filterbank | Matrix multiplication and nonlinear scaling |
| Cepstrum | FFT, log magnitude, inverse FFT |
| Convolution | Cross-correlation in reverse pass |
This makes AD useful for end-to-end systems that combine classical signal processing with learned models.
For example, an audio model may compute:
waveform
-> STFT
-> magnitude
-> mel filterbank
-> log compression
-> neural model
-> loss

The whole pipeline can be differentiated, subject to care around zeros, logarithms, and magnitude operations.
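A sketch of such a front end in JAX (without the neural model). The mel matrix `mel_fb` here is a random stand-in for a real filterbank, and a small epsilon guards the logarithm:

```python
# Differentiable log-mel audio front end: framing, windowed FFT magnitude,
# mel projection, and guarded log compression.
import jax
import jax.numpy as jnp

def stft_mag(x, frame=256, hop=128):
    n_frames = (x.shape[0] - frame) // hop + 1
    idx = jnp.arange(frame)[None, :] + hop * jnp.arange(n_frames)[:, None]
    frames = x[idx] * jnp.hanning(frame)
    return jnp.abs(jnp.fft.rfft(frames, axis=-1))        # (n_frames, frame//2 + 1)

def log_mel_loss(x, mel_fb, target):
    m = stft_mag(x) @ mel_fb.T                           # mel filterbank projection
    feat = jnp.log(m + 1e-6)                             # epsilon guards the log at zero
    return jnp.sum((feat - target) ** 2)

key = jax.random.PRNGKey(3)
x = jax.random.normal(key, (1024,))
mel_fb = jnp.abs(jax.random.normal(key, (40, 129)))      # stand-in for a real mel matrix
target = jnp.zeros(((1024 - 256) // 128 + 1, 40))
g = jax.grad(log_mel_loss)(x, mel_fb, target)
```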
Sparse and Structured Signals
Many signals have sparse structure. Compressed sensing uses the assumption that $x$ is sparse in some basis:

$$
x = \Psi s,
$$

where most entries of $s$ are zero or near zero.

A typical objective is

$$
\min_s \; \|A \Psi s - y\|_2^2 + \lambda \|s\|_1.
$$

The $\ell_1$ norm is non-smooth at zero. AD can still compute subgradient-like values depending on implementation, but optimization requires care.

Smooth approximations are often used:

$$
\|s\|_1 \approx \sum_i \sqrt{s_i^2 + \epsilon}.
$$

This gives stable gradients while preserving sparsity pressure.
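A two-line JAX sketch of this smoothing, showing that the gradient is well defined even at zero:

```python
# Smoothed l1 penalty: sqrt(s^2 + eps) has the bounded, stable gradient
# s / sqrt(s^2 + eps), unlike the subgradient of |s| at zero.
import jax
import jax.numpy as jnp

def smooth_l1(s, eps=1e-6):
    return jnp.sum(jnp.sqrt(s ** 2 + eps))

s = jnp.array([-1.0, 0.0, 0.5])
g = jax.grad(smooth_l1)(s)   # well-defined even at s = 0
```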
Differentiable Sampling
Sampling and resampling are central in signal processing. For a continuous signal approximation $\hat{x}(t)$, resampling evaluates

$$
y_j = \hat{x}(t_j),
$$

where the sampling locations $t_j$ may depend on parameters.
If interpolation is differentiable, then AD can compute derivatives with respect to both samples and sampling locations.
This is useful in:
- image registration,
- time warping,
- differentiable rendering,
- beamforming,
- sensor calibration,
- spatial transformer networks.
Nearest-neighbor sampling is discontinuous with respect to coordinates. Linear, cubic, or spline interpolation gives more useful derivatives.
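A small JAX sketch using `jnp.interp`, which is piecewise linear and therefore differentiable with respect to both the samples and the query locations (away from the knots); the names are illustrative:

```python
# Differentiable resampling via linear interpolation: the gradient of the
# loss with respect to the sampling locations t is well defined.
import jax
import jax.numpy as jnp

def resample_loss(t, samples, target):
    grid = jnp.arange(samples.shape[0], dtype=jnp.float32)
    y = jnp.interp(t, grid, samples)       # evaluate x-hat at locations t
    return jnp.sum((y - target) ** 2)

samples = jnp.sin(jnp.arange(32, dtype=jnp.float32))
t = jnp.array([1.5, 7.25, 20.0])
target = jnp.zeros(3)
dL_dt = jax.grad(resample_loss)(t, samples, target)
```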
Beamforming and Array Processing
Array processing combines measurements from multiple sensors. A beamformer output may be

$$
y(t) = \sum_{m} w_m\, x_m(t - \tau_m),
$$

where $w_m$ are weights and $\tau_m$ are direction-dependent delays.
AD can compute derivatives with respect to:
| Variable | Meaning |
|---|---|
| $w_m$ | Beamforming weights |
| $\tau_m$ | Source direction (through the delays) |
| Sensor positions | Array calibration |
| Signal samples | Input sensitivity |
Differentiable beamforming appears in radar, sonar, microphone arrays, radio astronomy, and wireless communication.
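A hedged narrowband sketch for a uniform linear array, where the direction-dependent delays appear as phase shifts; the geometry, spacing, and names are assumptions for illustration:

```python
# Differentiable narrowband delay-and-sum beamformer: differentiate output
# power with respect to the steering angle theta.
import jax
import jax.numpy as jnp

def beam_power(theta, weights, snapshots, positions, wavelength=1.0):
    # direction-dependent delays enter as phases exp(-2*pi*i*p*sin(theta)/wavelength)
    phases = jnp.exp(-2j * jnp.pi * positions * jnp.sin(theta) / wavelength)
    y = jnp.einsum("m,m,mt->t", weights, phases, snapshots)   # beamformer output y(t)
    return jnp.sum(jnp.abs(y) ** 2)

M, T = 8, 100
snapshots = jax.random.normal(jax.random.PRNGKey(4), (M, T)) + 0j  # sensor data
positions = 0.5 * jnp.arange(M)             # half-wavelength element spacing
weights = jnp.ones(M) / M

d_power = jax.grad(beam_power)(jnp.array(0.1), weights, snapshots, positions)
```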
State-Space Models and Kalman Filters
Signal processing often represents systems with state-space models:

$$
x_{t+1} = A x_t + B u_t + w_t, \qquad y_t = C x_t + v_t.
$$
Kalman filtering estimates hidden states from noisy measurements. Its recursion is differentiable, provided matrix inversions and covariance updates are handled carefully.
AD can differentiate a Kalman filter with respect to:
- transition matrices,
- observation matrices,
- noise covariance parameters,
- initial state,
- control parameters.
For long sequences, reverse-mode AD faces the same memory issue as recurrent neural networks. Checkpointing or custom smoother adjoints may be needed.
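A minimal scalar Kalman filter written with `jax.lax.scan` and differentiated with respect to its parameters; the model and data here are illustrative stand-ins:

```python
# Scalar Kalman filter as a differentiable program: reverse-mode AD gives
# gradients of the filtering loss with respect to (a, c, q, r).
import jax
import jax.numpy as jnp
from jax import lax

def kalman_nll(params, ys):
    a, c, q, r = params                        # transition, observation, noise variances

    def step(carry, y):
        m, p = carry                           # state mean and variance
        m_pred = a * m                         # predict
        p_pred = a * a * p + q
        s = c * c * p_pred + r                 # innovation variance
        k = p_pred * c / s                     # Kalman gain
        resid = y - c * m_pred
        m_new = m_pred + k * resid             # update
        p_new = (1.0 - k * c) * p_pred
        nll = 0.5 * (jnp.log(s) + resid ** 2 / s)
        return (m_new, p_new), nll

    _, nlls = lax.scan(step, (jnp.array(0.0), jnp.array(1.0)), ys)
    return jnp.sum(nlls)

ys = jnp.sin(0.1 * jnp.arange(200))            # stand-in measurement sequence
params = jnp.array([0.95, 1.0, 0.1, 0.1])
g = jax.grad(kalman_nll)(params, ys)           # gradients w.r.t. all four parameters
```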
Phase Retrieval
Phase retrieval reconstructs a signal from magnitude-only measurements:

$$
b = |A x|.
$$
The phase is missing, so the inverse problem is nonconvex.
A loss may be

$$
L(x) = \sum_k \big( |(Ax)_k|^2 - b_k^2 \big)^2.
$$
AD computes gradients through the linear transform, magnitude, and squared magnitude. This allows gradient-based reconstruction methods, though initialization and nonconvexity remain major issues.
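A hedged sketch of such a reconstruction loop with a random Gaussian operator; the step size and random initialization are placeholders, not a recommended algorithm (a serious method would initialize more carefully, for example spectrally):

```python
# Gradient-based phase retrieval on a squared-magnitude mismatch.
import jax
import jax.numpy as jnp

k1, k2, k3 = jax.random.split(jax.random.PRNGKey(5), 3)
A = jax.random.normal(k1, (128, 32)) / jnp.sqrt(32.0)
x_true = jax.random.normal(k2, (32,))
b = jnp.abs(A @ x_true)                    # magnitude-only measurements

def loss(x):
    return jnp.sum((jnp.abs(A @ x) ** 2 - b ** 2) ** 2)

x = jax.random.normal(k3, (32,))           # random init; sign ambiguity remains
for _ in range(1000):
    x = x - 1e-4 * jax.grad(loss)(x)       # small, conservative step
```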
Learned Signal Processing
Modern systems often combine hand-designed transforms with learned components. Examples include:
- learned denoisers,
- neural vocoders,
- differentiable codecs,
- learned image reconstruction,
- neural beamformers,
- learned compression,
- differentiable equalizers.
A useful design pattern is to keep classical signal-processing operators explicit and differentiable, then insert learned modules where the model lacks structure.
This gives a hybrid pipeline:
structured physical transform
-> learned correction
-> differentiable objective
-> gradient-based training or inference

Numerical Issues
Signal processing pipelines contain operations that can produce unstable gradients.
| Operation | Issue |
|---|---|
| $\log x$ | Singular near zero |
| $\lvert z \rvert$ | Nondifferentiable at zero |
| Phase angle | Discontinuous modulo $2\pi$ |
| Hard thresholding | Zero or undefined gradients |
| Quantization | Discontinuous |
| Clipping | Saturated gradients |
| Sorting peaks | Non-smooth selection |
Practical differentiable systems often replace hard operations with smooth approximations during training or optimization.
Examples include soft thresholding, smooth clipping, differentiable peak picking, and noise-aware objectives.
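For instance, hard clipping can be replaced by a `tanh`-based surrogate during training; the sketch below contrasts the two gradients (the surrogate choice is illustrative):

```python
# Smooth surrogate for hard clipping: tanh saturates gently, so gradients
# do not vanish abruptly at the clip boundary.
import jax
import jax.numpy as jnp

def hard_clip(x, c=1.0):
    return jnp.clip(x, -c, c)            # zero gradient outside [-c, c]

def smooth_clip(x, c=1.0):
    return c * jnp.tanh(x / c)           # everywhere-differentiable approximation

x = jnp.array([-3.0, -0.5, 0.0, 2.0])
g_hard = jax.grad(lambda v: jnp.sum(hard_clip(v)))(x)    # zeros where saturated
g_soft = jax.grad(lambda v: jnp.sum(smooth_clip(v)))(x)  # small but nonzero
```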
Custom Derivative Rules
Generic AD can differentiate most signal processing code, but custom rules are often better.
| Component | Better derivative rule |
|---|---|
| FFT | Use known adjoint transform |
| Convolution | Use optimized correlation kernels |
| Linear solve | Use transpose solve |
| Interpolation | Define boundary derivative explicitly |
| Quantization | Use surrogate gradient if needed |
| STFT/ISTFT | Preserve window and overlap-add structure |
Custom rules improve performance and make derivative semantics explicit.
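As a sketch of the pattern, here is a straight-through surrogate gradient for quantization via `jax.custom_vjp`; the surrogate is one common choice, not the only one:

```python
# Custom derivative rule: forward rounds, backward passes the gradient
# through unchanged, rather than the "true" (zero almost everywhere) derivative.
import jax
import jax.numpy as jnp

@jax.custom_vjp
def quantize(x):
    return jnp.round(x)

def quantize_fwd(x):
    return jnp.round(x), None            # no residuals needed

def quantize_bwd(_, g):
    return (g,)                          # straight-through: identity gradient

quantize.defvjp(quantize_fwd, quantize_bwd)

loss = lambda x: jnp.sum(quantize(x) ** 2)
g = jax.grad(loss)(jnp.array([0.2, 1.7, -0.6]))   # nonzero despite rounding
```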
Summary
Signal processing is a natural domain for automatic differentiation because many operators are linear, structured, and compositional. AD provides gradients through filters, transforms, reconstruction objectives, adaptive systems, beamformers, and state-space estimators.
The main difficulties come from complex-valued computations, non-smooth operations, sampling decisions, quantization, phase ambiguity, and long recurrent filters. Effective differentiable signal processing combines AD with known adjoint operators, stable numerical design, and carefully chosen smooth approximations.