# Flow-Based Models

Flow-based models are generative models that learn an invertible transformation between a simple probability distribution and a complex data distribution. Unlike many other generative models, flow-based systems provide:

- exact likelihood computation,
- exact latent-variable inference,
- exact sampling,
- invertible mappings.

A flow model transforms data into latent variables through a sequence of reversible functions. If the transformation is invertible and differentiable, probability densities can be computed exactly using the change-of-variables formula.

Flow-based models occupy an important position between probabilistic modeling and deep neural networks. They combine:

- neural network expressiveness,
- tractable likelihoods,
- efficient generation,
- latent representations.

Examples include:

- NICE,
- RealNVP,
- Glow,
- Neural Spline Flows,
- Continuous Normalizing Flows.

### Motivation

Suppose we wish to model a complex distribution over images:

$$
x \sim p_{\text{data}}(x).
$$

Directly modeling this distribution is difficult: high-dimensional image densities are highly multimodal and have no simple closed form.

Flow-based models solve this problem by learning an invertible mapping:

$$
f_\theta : x \leftrightarrow z,
$$

where:

| Variable | Meaning |
|---|---|
| $x$ | Data variable |
| $z$ | Latent variable |

The latent variable is chosen to follow a simple distribution such as:

$$
z \sim \mathcal{N}(0,I).
$$

If the transformation is invertible, then we can map:

- data to latent space,
- latent variables back to data space.

This allows both density estimation and generation.

### Change of Variables Formula

The mathematical foundation of flow models is the change-of-variables theorem.

Suppose:

$$
z = f_\theta(x),
$$

where $f_\theta$ is invertible and differentiable.

Then the probability density of $x$ satisfies:

$$
p_X(x) =
p_Z(f_\theta(x))
\left|
\det
\frac{\partial f_\theta(x)}{\partial x}
\right|.
$$

Taking logarithms:

$$
\log p_X(x) =
\log p_Z(z)
+
\log
\left|
\det
\frac{\partial f_\theta(x)}{\partial x}
\right|.
$$

genui{"math_block_widget_always_prefetch_v2":{"content":"\\log p_X(x)=\\log p_Z(z)+\\log\\left|\\det\\frac{\\partial f_\\theta(x)}{\\partial x}\\right|"}}

This equation is central to all flow-based models.

The determinant term measures how the transformation changes local volume in space.
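
As a one-dimensional illustration, consider the scalar affine map $z = f(x) = ax + b$ with $a \neq 0$ and a standard normal base density. The Jacobian is simply $a$, so

$$
\log p_X(x) =
\log \mathcal{N}(ax + b;\, 0,\, 1)
+
\log |a|.
$$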

### Jacobian Matrices

The matrix

$$
\frac{\partial f_\theta(x)}{\partial x}
$$

is called the Jacobian matrix.

If:

$$
x \in \mathbb{R}^d,
$$

then the Jacobian is a $d\times d$ matrix:

$$
J_f(x) =
\begin{bmatrix}
\frac{\partial f_1}{\partial x_1} &
\cdots &
\frac{\partial f_1}{\partial x_d}
\\
\vdots & \ddots & \vdots
\\
\frac{\partial f_d}{\partial x_1} &
\cdots &
\frac{\partial f_d}{\partial x_d}
\end{bmatrix}.
$$

The determinant of the Jacobian measures local expansion or contraction of volume.

Efficient computation of this determinant is one of the central design constraints in flow models.
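
As a small, self-contained sketch (not tied to any particular flow library), the Jacobian of a toy invertible map and its log-determinant can be computed with PyTorch autograd:

```python
import torch
from torch.autograd.functional import jacobian

# A toy elementwise invertible map f: R^3 -> R^3 (strictly increasing in each coordinate).
def f(x):
    return torch.tanh(x) + 2.0 * x

x = torch.randn(3)
J = jacobian(f, x)          # the 3 x 3 Jacobian matrix of f at x
log_det = torch.logdet(J)   # log|det J|, the volume-change term in the change of variables
print(J.shape, log_det)
```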

### Normalizing Flows

A normalizing flow constructs a complex distribution by composing many simple invertible transformations.

Suppose:

$$
z_0 \sim p_0(z_0)
$$

is a simple base distribution.

Apply a sequence of invertible mappings:

$$
z_1 = f_1(z_0),
$$

$$
z_2 = f_2(z_1),
$$

continuing until:

$$
x = z_K.
$$

The full transformation becomes:

$$
x =
f_K \circ f_{K-1} \circ \cdots \circ f_1(z_0).
$$

The log-density becomes:

$$
\log p(x) =
\log p(z_0) -
\sum_{k=1}^K
\log
\left|
\det
\frac{\partial f_k}{\partial z_{k-1}}
\right|.
$$

genui{"math_block_widget_always_prefetch_v2":{"content":"\\log p(x)=\\log p(z_0)-\\sum_{k=1}^K\\log\\left|\\det\\frac{\\partial f_k}{\\partial z_{k-1}}\\right|"}}

Layer by layer, the composition gradually transforms a simple base distribution, typically a Gaussian, into a complex data distribution.
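
A minimal sketch of this composition, assuming each layer returns its output together with its log-determinant (as the coupling layer shown later in this section does):

```python
import torch
from torch import nn

class ComposedFlow(nn.Module):
    """Chains invertible layers and accumulates their log-determinants."""

    def __init__(self, layers):
        super().__init__()
        self.layers = nn.ModuleList(layers)

    def forward(self, z):
        log_det_total = torch.zeros(z.shape[0], device=z.device)
        for layer in self.layers:
            z, log_det = layer(z)            # each layer returns (output, log|det J|)
            log_det_total = log_det_total + log_det
        return z, log_det_total
```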

### Desirable Properties of Flow Layers

A practical flow transformation should satisfy three properties.

| Property | Importance |
|---|---|
| Invertibility | Enables exact latent inference |
| Efficient Jacobian determinant | Enables tractable likelihood computation |
| Expressiveness | Allows modeling complex distributions |

Balancing these requirements is the main architectural challenge in flow-based modeling.

### NICE

NICE, or Nonlinear Independent Components Estimation, introduced additive coupling layers.

Split the input:

$$
x = (x_1, x_2).
$$

The transformation becomes:

$$
y_1 = x_1,
$$

$$
y_2 = x_2 + m(x_1),
$$

where $m(\cdot)$ is a neural network.

This transformation is invertible:

$$
x_1 = y_1,
$$

$$
x_2 = y_2 - m(y_1).
$$

The Jacobian of this transformation is lower triangular with ones on the diagonal, so its determinant equals 1 and the log-determinant term vanishes, making likelihood computation trivial.
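
A minimal sketch of an additive coupling layer (illustrative, not the original NICE code) makes the invertibility and the unit determinant explicit:

```python
import torch
from torch import nn

class AdditiveCoupling(nn.Module):
    """NICE-style additive coupling: y1 = x1, y2 = x2 + m(x1)."""

    def __init__(self, dim, hidden=128):
        super().__init__()
        self.m = nn.Sequential(
            nn.Linear(dim // 2, hidden), nn.ReLU(), nn.Linear(hidden, dim // 2)
        )

    def forward(self, x):
        x1, x2 = x.chunk(2, dim=1)
        y2 = x2 + self.m(x1)
        # Volume-preserving: the Jacobian is triangular with unit diagonal, so log|det J| = 0.
        log_det = torch.zeros(x.shape[0], device=x.device)
        return torch.cat([x1, y2], dim=1), log_det

    def inverse(self, y):
        y1, y2 = y.chunk(2, dim=1)
        return torch.cat([y1, y2 - self.m(y1)], dim=1)
```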

However, additive coupling limits expressiveness.

### RealNVP

RealNVP improved NICE using affine coupling layers.

The transformation becomes:

$$
y_1 = x_1,
$$

$$
y_2 =
x_2 \odot \exp(s(x_1))
+
t(x_1),
$$

where:

| Function | Role |
|---|---|
| $s(x_1)$ | Scale transformation |
| $t(x_1)$ | Translation transformation |

genui{"math_block_widget_always_prefetch_v2":{"content":"y_2=x_2\\odot\\exp(s(x_1))+t(x_1)"}}

The Jacobian is again triangular, so its log-determinant reduces to a sum of the scale outputs:

$$
\log |\det J| =
\sum_i s_i(x_1).
$$

Affine coupling layers became one of the most influential flow architectures.

### Glow

Glow introduced several important improvements:

| Innovation | Purpose |
|---|---|
| Invertible $1\times1$ convolutions | Channel mixing |
| ActNorm | Stable normalization |
| Multi-scale architecture | Hierarchical representation |

Glow achieved high-quality image generation while maintaining exact likelihood computation.

The invertible convolution generalizes permutation operations by learning channel mixing directly.
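
An illustrative sketch (not the official Glow implementation): because the same weight matrix $W$ acts at every spatial position of a $c \times h \times w$ feature map, the log-determinant is simply $h \cdot w \cdot \log|\det W|$.

```python
import torch
from torch import nn

class Invertible1x1Conv(nn.Module):
    """Learned channel mixing via an invertible 1x1 convolution (Glow-style sketch)."""

    def __init__(self, channels):
        super().__init__()
        # Initialize with a random rotation so the weight starts out invertible.
        w, _ = torch.linalg.qr(torch.randn(channels, channels))
        self.weight = nn.Parameter(w)

    def forward(self, x):
        # x has shape (batch, channels, height, width).
        b, c, h, w = x.shape
        y = torch.einsum("ij,bjhw->bihw", self.weight, x)
        # The same matrix multiplies every spatial position, so
        # log|det J| = h * w * log|det W|.
        log_det = h * w * torch.slogdet(self.weight)[1]
        return y, log_det.expand(b)
```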

### Invertible Neural Networks

Flow models are examples of invertible neural networks.

Unlike standard feedforward networks:

- every layer must be reversible,
- information cannot be discarded.

This constraint distinguishes flow models from autoencoders or diffusion systems.

An invertible network satisfies:

$$
x = f^{-1}(f(x)).
$$

Invertibility guarantees exact latent recovery.

### Continuous Normalizing Flows

Discrete flow layers can be generalized into continuous dynamics.

A continuous normalizing flow defines:

$$
\frac{dz(t)}{dt} =
f_\theta(z(t), t).
$$

genui{"math_block_widget_always_prefetch_v2":{"content":"\\frac{dz(t)}{dt}=f_\\theta(z(t),t)"}}

The latent state evolves continuously through time.

Probability densities evolve according to:

$$
\frac{d}{dt}\log p(z(t)) = -
\operatorname{Tr}
\left(
\frac{\partial f_\theta}{\partial z}
\right).
$$

Continuous flows connect deep learning with differential equations and dynamical systems.
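
A minimal sketch of these dynamics using fixed-step Euler integration and an exact Jacobian trace (illustrative only; practical continuous flows use adaptive ODE solvers and stochastic trace estimators):

```python
import torch

def integrate_cnf(f, z0, t0=0.0, t1=1.0, steps=100):
    """Euler integration of dz/dt = f(z, t) together with d(log p)/dt = -Tr(df/dz)."""
    z = z0.clone().requires_grad_(True)
    delta_logp = torch.zeros(z0.shape[0], device=z0.device)
    dt = (t1 - t0) / steps
    t = t0
    for _ in range(steps):
        with torch.enable_grad():
            dz = f(z, t)
            # Exact trace of the Jacobian df/dz, one output dimension at a time
            # (feasible only for small dimensionality).
            trace = torch.zeros(z.shape[0], device=z.device)
            for i in range(z.shape[1]):
                grad_i = torch.autograd.grad(dz[:, i].sum(), z, retain_graph=True)[0]
                trace = trace + grad_i[:, i]
        z = (z + dt * dz).detach().requires_grad_(True)
        delta_logp = delta_logp - dt * trace.detach()
        t = t + dt
    return z, delta_logp  # final state and accumulated change in log-density
```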

### Neural ODE Interpretation

Continuous flows are closely related to neural ordinary differential equations.

Instead of discrete layers:

$$
z_{k+1} =
z_k + f_\theta(z_k),
$$

the model evolves continuously:

$$
\frac{dz}{dt} =
f_\theta(z,t).
$$

This perspective links generative modeling with physics-inspired continuous dynamics.

### Flow-Based Sampling

Sampling from a flow model is straightforward.

#### Step 1: Sample Latent Variable

Draw:

$$
z \sim \mathcal{N}(0,I).
$$

#### Step 2: Apply Inverse Transform

Compute:

$$
x = f_\theta^{-1}(z).
$$

Because the inverse mapping is exact and computed in a single pass, sampling is efficient.

Unlike diffusion models, no iterative denoising process is required.
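
In code, sampling therefore reduces to a couple of lines, assuming a trained flow that exposes an inverse pass (the `inverse` method here is a hypothetical API mirroring the coupling layers in this section):

```python
import torch

batch_size, dim = 64, 8
z = torch.randn(batch_size, dim)   # Step 1: sample latents from the standard normal base
x = flow.inverse(z)                # Step 2: map latents back to data space (hypothetical API)
```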

### Exact Likelihood Estimation

A major advantage of flow models is exact likelihood evaluation.

Unlike:

- GANs,
- diffusion models,
- energy-based models,

flow models compute normalized likelihoods directly.

This allows:

- principled probabilistic training,
- density estimation,
- anomaly detection,
- calibrated uncertainty estimation.
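
For example, anomaly detection reduces to thresholding the exact log-likelihood (a sketch, assuming a trained `flow` and base distribution as in the training example below, and a `threshold` chosen on validation data):

```python
# Score each input by its exact log-likelihood under the flow.
z, log_det = flow(x)
log_likelihood = standard_normal.log_prob(z).sum(dim=1) + log_det

# Inputs with unusually low likelihood are flagged as anomalies.
is_anomaly = log_likelihood < threshold
```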

### Latent Space Structure

The latent space often captures semantic structure.

For example:

- nearby latent vectors generate similar images,
- interpolation between latent vectors produces smooth transitions,
- arithmetic operations may encode semantic relationships.

Example:

$$
z_{\text{smiling}} -
z_{\text{neutral}}
+
z_{\text{female}}
$$

may generate semantically meaningful transformations.

This behavior resembles latent spaces in VAEs and diffusion models.
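
A latent interpolation sketch, assuming a trained flow whose forward pass returns `(z, log_det)` and which exposes a hypothetical `inverse` method:

```python
import torch

z_a = flow(x_a)[0]   # encode the first image into latent space
z_b = flow(x_b)[0]   # encode the second image

# Linearly interpolate in latent space and decode each intermediate point;
# the result is a smooth transition between the two images.
alphas = torch.linspace(0.0, 1.0, steps=8)
frames = [flow.inverse((1 - a) * z_a + a * z_b) for a in alphas]
```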

### Flow Models Versus VAEs

| Flow Models | Variational Autoencoders |
|---|---|
| Exact likelihood | Approximate likelihood |
| Invertible mapping | Stochastic encoder-decoder |
| Exact latent inference | Approximate posterior |
| No information bottleneck | Bottleneck through latent sampling |
| Higher memory cost | Often easier scaling |

Flow models prioritize exact probabilistic modeling. VAEs prioritize flexible approximate inference.

### Flow Models Versus GANs

| Flow Models | GANs |
|---|---|
| Exact likelihood | No explicit likelihood |
| Stable optimization | Adversarial instability |
| Bidirectional mapping | Generator-only mapping |
| Invertible architecture | Flexible architectures |
| Historically lower sample fidelity | Historically sharper images |

GANs historically produced sharper images, but flow models provided principled probabilistic learning.

### Flow Models Versus Diffusion Models

| Flow Models | Diffusion Models |
|---|---|
| Exact likelihood | Approximate likelihood |
| One-pass sampling | Iterative denoising |
| Invertible transforms | Stochastic reverse process |
| Fast generation | Slower sampling |
| Architectural constraints | Flexible denoising networks |

Diffusion models currently dominate high-quality image generation, but flows remain attractive for efficient density estimation and invertible representation learning.

### Applications of Flow-Based Models

Flow-based models are used in:

| Application | Purpose |
|---|---|
| Image generation | Exact generative modeling |
| Density estimation | Probabilistic modeling |
| Anomaly detection | Likelihood-based detection |
| Audio synthesis | Waveform generation |
| Scientific simulation | Physical system modeling |
| Bayesian inference | Flexible posterior distributions |

Flows are particularly useful when exact densities matter.

### PyTorch Example

A simple affine coupling layer:

```python id="n2f0sj"
import torch
from torch import nn

class AffineCoupling(nn.Module):
    """RealNVP-style affine coupling: y1 = x1, y2 = x2 * exp(s(x1)) + t(x1)."""

    def __init__(self, dim):
        super().__init__()

        # Networks that predict the scale and translation from the first half of the input.
        self.scale_net = nn.Sequential(
            nn.Linear(dim // 2, 128),
            nn.ReLU(),
            nn.Linear(128, dim // 2)
        )

        self.translate_net = nn.Sequential(
            nn.Linear(dim // 2, 128),
            nn.ReLU(),
            nn.Linear(128, dim // 2)
        )

    def forward(self, x):
        # Split the input into two halves along the feature dimension.
        x1, x2 = x.chunk(2, dim=1)

        s = self.scale_net(x1)
        t = self.translate_net(x1)

        # The first half passes through unchanged; the second half is scaled and shifted.
        y1 = x1
        y2 = x2 * torch.exp(s) + t

        # The Jacobian is triangular, so log|det J| is the sum of the scale outputs.
        log_det = s.sum(dim=1)

        return torch.cat([y1, y2], dim=1), log_det
```

The layer computes both:

- transformed outputs,
- Jacobian log-determinants.

Training uses maximum likelihood:

```python id="89p9vg"
# Assumes a flow (e.g. stacked AffineCoupling layers), a data batch x, and
# standard_normal = torch.distributions.Normal(0.0, 1.0) as the base distribution.
z, log_det = flow(x)

# Log-density of the latent code under the base distribution.
log_prob_z = standard_normal.log_prob(z).sum(dim=1)

# Negative log-likelihood from the change-of-variables formula.
loss = -(log_prob_z + log_det).mean()
```
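
For sampling, the coupling layer also needs an inverse pass; a minimal sketch that mirrors `forward` and could be added to `AffineCoupling`:

```python
    def inverse(self, y):
        y1, y2 = y.chunk(2, dim=1)

        s = self.scale_net(y1)
        t = self.translate_net(y1)

        # Undo the affine transformation: x2 = (y2 - t) * exp(-s).
        x2 = (y2 - t) * torch.exp(-s)

        return torch.cat([y1, x2], dim=1)
```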

### Limitations of Flow-Based Models

Flow models have several important limitations.

#### Invertibility Constraints

Architectures must remain reversible, limiting flexibility.

#### Jacobian Computation

Efficient determinants constrain layer design.

#### Memory Usage

Invertible operations may require large activations.

#### Image Quality

Historically, diffusion models surpassed flows in perceptual quality.

#### Scalability

Very large flow models can become computationally expensive.

### Relationship to Modern AI

Flow-based ideas continue influencing modern systems.

#### Diffusion Models

Some diffusion systems incorporate invertible transformations.

#### Scientific Machine Learning

Flows model physical and probabilistic systems with exact densities.

#### Bayesian Deep Learning

Flows approximate complex posterior distributions.

#### Representation Learning

Invertible architectures support structured latent spaces.

#### Continuous-Time Modeling

Continuous flows connect deep learning with dynamical systems.

### Summary

Flow-based models learn invertible transformations between simple latent distributions and complex data distributions. Using the change-of-variables formula, they compute exact likelihoods and support exact latent inference.

Normalizing flows construct expressive generative models by composing many invertible transformations. Architectures such as NICE, RealNVP, Glow, and continuous normalizing flows introduced efficient Jacobian computation and scalable probabilistic learning.

Flow models provide mathematically elegant generative modeling with exact probabilistic interpretation. Although diffusion models currently dominate many generative tasks, flow-based systems remain important for density estimation, invertible representation learning, Bayesian inference, and continuous probabilistic modeling.

