Flow-Based Models

Flow-based models are generative models that learn an invertible transformation between a simple probability distribution and a complex data distribution. Unlike many other generative models, flow-based systems provide:

  • exact likelihood computation,
  • exact latent-variable inference,
  • exact sampling,
  • invertible mappings.

A flow model transforms data into latent variables through a sequence of reversible functions. If the transformation is invertible and differentiable, probability densities can be computed exactly using the change-of-variables formula.

Flow-based models occupy an important position between probabilistic modeling and deep neural networks. They combine:

  • neural network expressiveness,
  • tractable likelihoods,
  • efficient generation,
  • latent representations.

Examples include:

  • NICE,
  • RealNVP,
  • Glow,
  • Neural Spline Flows,
  • Continuous Normalizing Flows.

Motivation

Suppose we wish to model a complex distribution over images:

x \sim p_{\text{data}}(x).

Directly modeling this distribution is difficult because high-dimensional image distributions are extremely complicated.

Flow-based models solve this problem by learning an invertible mapping:

f_\theta : x \leftrightarrow z,

where:

| Variable | Meaning |
|----------|---------|
| x | Data variable |
| z | Latent variable |

The latent variable is chosen to follow a simple distribution such as:

z \sim \mathcal{N}(0, I).

If the transformation is invertible, then we can map:

  • data to latent space,
  • latent variables back to data space.

This allows both density estimation and generation.

Change of Variables Formula

The mathematical foundation of flow models is the change-of-variables theorem.

Suppose:

z = f_\theta(x),

where f_\theta is invertible and differentiable.

Then the probability density of x satisfies:

p_X(x) = p_Z(f_\theta(x)) \left| \det \frac{\partial f_\theta(x)}{\partial x} \right|.

genui{“math_block_widget_always_prefetch_v2”:{“content”:“p_X(x)=p_Z(f_\theta(x))\left|\det\frac{\partial f_\theta(x)}{\partial x}\right|”}}

Taking logarithms:

\log p_X(x) = \log p_Z(z) + \log \left| \det \frac{\partial f_\theta(x)}{\partial x} \right|.

genui{“math_block_widget_always_prefetch_v2”:{“content”:"\log p_X(x)=\log p_Z(z)+\log\left|\det\frac{\partial f_\theta(x)}{\partial x}\right|"}}

This equation is central to all flow-based models.

The determinant term measures how the transformation changes local volume in space.
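
The volume-change role of the determinant can be checked numerically in one dimension. A minimal sketch, assuming the toy map f(x) = 2x + 1 with a standard normal base density:

```python
import math

# Base density p_Z: standard normal.
def p_z(z):
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)

# Change of variables for the toy map z = f(x) = 2x + 1, |df/dx| = 2.
def p_x(x):
    return p_z(2 * x + 1) * 2.0

# Riemann sum over [-10, 10): p_X integrates to 1 only because the
# |det| factor compensates for the factor-of-2 volume change.
dx = 0.001
total = sum(p_x(-10 + i * dx) * dx for i in range(20000))
print(round(total, 4))  # close to 1.0
```

Dropping the factor of 2 would leave only half the probability mass, which is exactly what the determinant term prevents.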

Jacobian Matrices

The matrix

\frac{\partial f_\theta(x)}{\partial x}

is called the Jacobian matrix.

If:

x \in \mathbb{R}^d,

then the Jacobian is a d \times d matrix:

J_f(x) = \begin{bmatrix} \frac{\partial f_1}{\partial x_1} & \cdots & \frac{\partial f_1}{\partial x_d} \\ \vdots & \ddots & \vdots \\ \frac{\partial f_d}{\partial x_1} & \cdots & \frac{\partial f_d}{\partial x_d} \end{bmatrix}.

The determinant of the Jacobian measures local expansion or contraction of volume.

Efficient computation of this determinant is one of the central design constraints in flow models.
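
For small dimensions the determinant can be computed by brute force with autograd. A sketch (the map `f` here is an arbitrary smooth example; real flow layers are designed precisely to avoid this O(d³) computation):

```python
import torch

# An arbitrary smooth elementwise map; its Jacobian is diagonal
# with entries 1 - tanh(x_i)^2 + 2 > 0, so the determinant is positive.
def f(x):
    return torch.tanh(x) + 2.0 * x

x = torch.tensor([0.5, -1.0, 0.3])

# Full d x d Jacobian via autograd, then its log-determinant.
J = torch.autograd.functional.jacobian(f, x)
log_det = torch.logdet(J)
print(J.shape, log_det.item())
```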

Normalizing Flows

A normalizing flow constructs a complex distribution by composing many simple invertible transformations.

Suppose:

z_0 \sim p_0(z_0)

is a simple base distribution.

Apply a sequence of invertible mappings:

z_1 = f_1(z_0), \qquad z_2 = f_2(z_1),

continuing until:

x = z_K.

The full transformation becomes:

x = f_K \circ f_{K-1} \circ \cdots \circ f_1(z_0).

The log-density becomes:

\log p(x) = \log p(z_0) - \sum_{k=1}^K \log \left| \det \frac{\partial f_k}{\partial z_{k-1}} \right|.

genui{“math_block_widget_always_prefetch_v2”:{“content”:"\log p(x)=\log p(z_0)-\sum_{k=1}^K\log\left|\det\frac{\partial f_k}{\partial z_{k-1}}\right|"}}

Each layer gradually transforms a simple Gaussian distribution into a complicated data distribution.
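
The additivity of the per-layer terms can be sketched directly; `ScaleLayer` below is a toy invertible layer invented for illustration, returning its output together with its log-determinant:

```python
import torch

class ScaleLayer:
    """Toy invertible layer z -> s * z with log|det J| = d * log|s|."""
    def __init__(self, s):
        self.s = s

    def forward(self, z):
        log_det = z.shape[-1] * torch.log(torch.abs(torch.tensor(self.s)))
        return self.s * z, log_det

layers = [ScaleLayer(2.0), ScaleLayer(0.5), ScaleLayer(3.0)]
z = torch.randn(4)

# Accumulate the summation term of the log-density formula.
total_log_det = torch.tensor(0.0)
for layer in layers:
    z, ld = layer.forward(z)
    total_log_det = total_log_det + ld

print(total_log_det.item())  # 4 * (log 2 + log 0.5 + log 3) = 4 * log 3
```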

Desirable Properties of Flow Layers

A practical flow transformation should satisfy three properties.

| Property | Importance |
|----------|------------|
| Invertibility | Enables exact latent inference |
| Efficient Jacobian determinant | Enables tractable likelihood computation |
| Expressiveness | Allows modeling complex distributions |

Balancing these requirements is the main architectural challenge in flow-based modeling.

NICE

NICE, or Nonlinear Independent Components Estimation, introduced additive coupling layers.

Split the input:

x = (x_1, x_2).

The transformation becomes:

y_1 = x_1, \qquad y_2 = x_2 + m(x_1),

where m(\cdot) is a neural network.

This transformation is invertible:

x_1 = y_1, \qquad x_2 = y_2 - m(y_1).

The Jacobian determinant equals 1, making likelihood computation trivial.

However, additive coupling limits expressiveness.
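
A minimal runnable sketch of the additive coupling and its exact inverse; the `m` network here is an arbitrary stand-in:

```python
import torch
from torch import nn

# Any network works for m: it never needs to be inverted.
m = nn.Linear(2, 2)

def forward(x):
    x1, x2 = x.chunk(2, dim=1)
    return torch.cat([x1, x2 + m(x1)], dim=1)   # y2 = x2 + m(x1)

def inverse(y):
    y1, y2 = y.chunk(2, dim=1)
    return torch.cat([y1, y2 - m(y1)], dim=1)   # x2 = y2 - m(y1)

x = torch.randn(5, 4)
recon = inverse(forward(x))
print(torch.allclose(recon, x, atol=1e-6))  # True: the round trip is exact
```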

RealNVP

RealNVP improved NICE using affine coupling layers.

The transformation becomes:

y_1 = x_1, \qquad y_2 = x_2 \odot \exp(s(x_1)) + t(x_1),

where:

| Function | Role |
|----------|------|
| s(x_1) | Scale transformation |
| t(x_1) | Translation transformation |

genui{“math_block_widget_always_prefetch_v2”:{“content”:“y_2=x_2\odot\exp(s(x_1))+t(x_1)”}}

The Jacobian determinant becomes easy to compute:

\log |\det J| = \sum_i s_i(x_1).

Affine coupling layers became one of the most influential flow architectures.

Glow

Glow introduced several important improvements:

| Innovation | Purpose |
|------------|---------|
| Invertible 1×1 convolutions | Channel mixing |
| ActNorm | Stable normalization |
| Multi-scale architecture | Hierarchical representation |

Glow achieved high-quality image generation while maintaining exact likelihood computation.

The invertible convolution generalizes permutation operations by learning channel mixing directly.
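
The log-determinant bookkeeping for an invertible 1×1 convolution can be sketched on a single (C, H, W) tensor; the random orthogonal initialization mirrors Glow's description, but this is an illustration, not the reference implementation:

```python
import torch

C, H, W_spatial = 3, 8, 8

# A 1x1 convolution is a learned C x C matrix applied at every pixel.
# Glow initializes it as a random orthogonal matrix (|det| = 1).
W = torch.linalg.qr(torch.randn(C, C))[0]

x = torch.randn(C, H, W_spatial)
y = torch.einsum('ij,jhw->ihw', W, x)  # channel mixing at each pixel

# The same determinant applies at every pixel, so the total
# contribution is H * W_spatial * log|det W|.
sign, logabsdet = torch.linalg.slogdet(W)
log_det = H * W_spatial * logabsdet
print(log_det.item())  # near 0 for an orthogonal W
```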

Invertible Neural Networks

Flow models are examples of invertible neural networks.

Unlike standard feedforward networks:

  • every layer must be reversible,
  • information cannot be discarded.

This constraint distinguishes flow models from autoencoders or diffusion systems.

An invertible network satisfies:

x = f^{-1}(f(x)).

Invertibility guarantees exact latent recovery.

Continuous Normalizing Flows

Discrete flow layers can be generalized into continuous dynamics.

A continuous normalizing flow defines:

\frac{dz(t)}{dt} = f_\theta(z(t), t).

genui{“math_block_widget_always_prefetch_v2”:{“content”:"\frac{dz(t)}{dt}=f_\theta(z(t),t)"}}

The latent state evolves continuously through time.

Probability densities evolve according to:

\frac{d}{dt} \log p(z(t)) = -\operatorname{Tr}\left( \frac{\partial f_\theta}{\partial z} \right).

Continuous flows connect deep learning with differential equations and dynamical systems.
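
A hedged one-dimensional sketch of this density evolution, using plain Euler integration with dynamics dz/dt = -z (chosen only because its trace term is constant):

```python
# Dynamics: dz/dt = f(z) = -z, so Tr(df/dz) = -1 everywhere
# and d(log p)/dt = -Tr(df/dz) = +1 along every trajectory.
def f(z):
    return -z

z = 2.0
log_p_change = 0.0
dt = 0.01

for _ in range(100):  # integrate from t = 0 to t = 1
    log_p_change += 1.0 * dt   # -Tr(df/dz) = 1 for this f
    z = z + f(z) * dt

# Mass contracts toward the origin, so the density along the
# trajectory rises: log p increases by the integration time T = 1.
print(round(log_p_change, 4), round(z, 4))
```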

Neural ODE Interpretation

Continuous flows are closely related to neural ordinary differential equations.

Instead of discrete layers:

z_{k+1} = z_k + f_\theta(z_k),

the model evolves continuously:

\frac{dz}{dt} = f_\theta(z, t).

This perspective links generative modeling with physics-inspired continuous dynamics.

Flow-Based Sampling

Sampling from a flow model is straightforward.

Step 1: Sample Latent Variable

Draw:

z \sim \mathcal{N}(0, I).

Step 2: Apply Inverse Transform

Compute:

x = f_\theta^{-1}(z).

Because the mapping is exact and invertible, sampling is efficient.

Unlike diffusion models, no iterative denoising process is required.
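
The two steps can be sketched with a toy one-dimensional flow f_\theta(x) = e^s x + t whose inverse is closed-form; s and t are fixed scalars here, purely for illustration:

```python
import torch

torch.manual_seed(0)
s, t = 0.5, 1.0

# Step 1: sample the latent variable from the base distribution.
z = torch.randn(10000)

# Step 2: apply the inverse transform x = (z - t) * exp(-s)
# in a single exact pass -- no iterative refinement.
x = (z - t) * torch.exp(torch.tensor(-s))

# x is then N(-t * e^{-s}, e^{-2s}) by construction.
print(x.mean().item(), x.std().item())
```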

Exact Likelihood Estimation

A major advantage of flow models is exact likelihood evaluation.

Unlike:

  • GANs,
  • diffusion models,
  • energy-based models,

flow models compute normalized likelihoods directly.

This allows:

  • principled probabilistic training,
  • density estimation,
  • anomaly detection,
  • calibrated uncertainty estimation.

Latent Space Structure

The latent space often captures semantic structure.

For example:

  • nearby latent vectors generate similar images,
  • interpolation between latent vectors produces smooth transitions,
  • arithmetic operations may encode semantic relationships.

Example:

z_{\text{smiling}} - z_{\text{neutral}} + z_{\text{female}}

may generate semantically meaningful transformations.

This behavior resembles latent spaces in VAEs and diffusion models.

Flow Models Versus VAEs

| Flow Models | Variational Autoencoders |
|-------------|--------------------------|
| Exact likelihood | Approximate likelihood |
| Invertible mapping | Stochastic encoder-decoder |
| Exact latent inference | Approximate posterior |
| No information bottleneck | Bottleneck through latent sampling |
| Higher memory cost | Often easier scaling |

Flow models prioritize exact probabilistic modeling. VAEs prioritize flexible approximate inference.

Flow Models Versus GANs

| Flow Models | GANs |
|-------------|------|
| Exact likelihood | No explicit likelihood |
| Stable optimization | Adversarial instability |
| Bidirectional mapping | Generator-only mapping |
| Invertible architecture | Flexible architectures |
| Slower progress in sample quality | Historically sharper images |

GANs historically produced sharper images, but flow models provided principled probabilistic learning.

Flow Models Versus Diffusion Models

| Flow Models | Diffusion Models |
|-------------|------------------|
| Exact likelihood | Approximate likelihood |
| One-pass sampling | Iterative denoising |
| Invertible transforms | Stochastic reverse process |
| Fast generation | Slower sampling |
| Architectural constraints | Flexible denoising networks |

Diffusion models currently dominate high-quality image generation, but flows remain attractive for efficient density estimation and invertible representation learning.

Applications of Flow-Based Models

Flow-based models are used in:

| Application | Purpose |
|-------------|---------|
| Image generation | Exact generative modeling |
| Density estimation | Probabilistic modeling |
| Anomaly detection | Likelihood-based detection |
| Audio synthesis | Waveform generation |
| Scientific simulation | Physical system modeling |
| Bayesian inference | Flexible posterior distributions |

Flows are particularly useful when exact densities matter.

PyTorch Example

A simple affine coupling layer:

import torch
from torch import nn

class AffineCoupling(nn.Module):
    def __init__(self, dim):
        super().__init__()

        self.scale_net = nn.Sequential(
            nn.Linear(dim // 2, 128),
            nn.ReLU(),
            nn.Linear(128, dim // 2)
        )

        self.translate_net = nn.Sequential(
            nn.Linear(dim // 2, 128),
            nn.ReLU(),
            nn.Linear(128, dim // 2)
        )

    def forward(self, x):
        # Split the input into two halves along the feature dimension.
        x1, x2 = x.chunk(2, dim=1)

        # Scale and translation depend only on the untouched half,
        # which is what makes the layer analytically invertible.
        s = self.scale_net(x1)
        t = self.translate_net(x1)

        y1 = x1
        y2 = x2 * torch.exp(s) + t

        # log|det J| of an affine coupling is just the sum of the scales.
        log_det = s.sum(dim=1)

        return torch.cat([y1, y2], dim=1), log_det

The layer computes both:

  • transformed outputs,
  • Jacobian log-determinants.

Training uses maximum likelihood:

# Base distribution p_Z; `flow` stands for the full invertible model,
# returning latents and the accumulated log-determinant.
standard_normal = torch.distributions.Normal(0.0, 1.0)

z, log_det = flow(x)

log_prob_z = standard_normal.log_prob(z).sum(dim=1)

loss = -(log_prob_z + log_det).mean()
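
Sampling then requires each layer's inverse. A standalone sketch of inverting one affine coupling step (`scale_net` and `translate_net` are small stand-ins for the networks in the layer above):

```python
import torch
from torch import nn

scale_net = nn.Linear(2, 2)
translate_net = nn.Linear(2, 2)

def forward(x):
    x1, x2 = x.chunk(2, dim=1)
    y2 = x2 * torch.exp(scale_net(x1)) + translate_net(x1)
    return torch.cat([x1, y2], dim=1)

def inverse(y):
    # Undo the affine map: x2 = (y2 - t(y1)) * exp(-s(y1)).
    y1, y2 = y.chunk(2, dim=1)
    x2 = (y2 - translate_net(y1)) * torch.exp(-scale_net(y1))
    return torch.cat([y1, x2], dim=1)

x = torch.randn(8, 4)
print(torch.allclose(inverse(forward(x)), x, atol=1e-5))
```

Because the inverse is closed-form, generation costs a single forward pass through the inverted layers.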

Limitations of Flow-Based Models

Flow models have several important limitations.

Invertibility Constraints

Architectures must remain reversible, limiting flexibility.

Jacobian Computation

Efficient determinants constrain layer design.

Memory Usage

Invertible operations may require large activations.

Image Quality

Historically, diffusion models surpassed flows in perceptual quality.

Scalability

Very large flow models can become computationally expensive.

Relationship to Modern AI

Flow-based ideas continue influencing modern systems.

Diffusion Models

Some diffusion systems incorporate invertible transformations.

Scientific Machine Learning

Flows model physical and probabilistic systems with exact densities.

Bayesian Deep Learning

Flows approximate complex posterior distributions.

Representation Learning

Invertible architectures support structured latent spaces.

Continuous-Time Modeling

Continuous flows connect deep learning with dynamical systems.

Summary

Flow-based models learn invertible transformations between simple latent distributions and complex data distributions. Using the change-of-variables formula, they compute exact likelihoods and support exact latent inference.

Normalizing flows construct expressive generative models by composing many invertible transformations. Architectures such as NICE, RealNVP, Glow, and continuous normalizing flows introduced efficient Jacobian computation and scalable probabilistic learning.

Flow models provide mathematically elegant generative modeling with exact probabilistic interpretation. Although diffusion models currently dominate many generative tasks, flow-based systems remain important for density estimation, invertible representation learning, Bayesian inference, and continuous probabilistic modeling.