# Jacobian-Vector Products

The natural output of forward-mode automatic differentiation is a Jacobian-vector product. Instead of constructing the full Jacobian matrix explicitly, forward mode computes how a perturbation vector propagates through a function.

For a function

$$
f : \mathbb{R}^n \to \mathbb{R}^m,
$$

the Jacobian at $x$ is

$$
J_f(x) =
\begin{bmatrix}
\frac{\partial f_1}{\partial x_1} & \cdots & \frac{\partial f_1}{\partial x_n} \\
\vdots & \ddots & \vdots \\
\frac{\partial f_m}{\partial x_1} & \cdots & \frac{\partial f_m}{\partial x_n}
\end{bmatrix}.
$$

Given a direction vector

$$
v \in \mathbb{R}^n,
$$

forward mode computes

$$
J_f(x)v.
$$

This product is called a Jacobian-vector product, usually abbreviated JVP.

### Geometric interpretation

A differentiable function locally behaves like a linear map. Around a point $x$,

$$
f(x + \Delta x)
\approx
f(x) + J_f(x)\Delta x.
$$

If we perturb the input in direction $v$,

$$
x \mapsto x + \epsilon v,
$$

then, to first order in $\epsilon$,

$$
f(x + \epsilon v) \approx
f(x) + \epsilon J_f(x)v.
$$

So the JVP tells us how infinitesimal motion in input space transforms into infinitesimal motion in output space.

Forward mode computes exactly this transformed direction.

### Tangent propagation produces JVPs

Suppose the inputs are seeded with tangents:

$$
x_i \mapsto (x_i, v_i).
$$

Forward propagation computes tangent values for all intermediate variables. The final output tangent is

$$
\dot{y} = J_f(x)v.
$$

The tangent vector is therefore the directional derivative of the function in direction $v$.

This is why forward mode is sometimes described as directional differentiation.
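Tangent propagation can be sketched with a minimal dual-number class (a hypothetical `Dual` type for illustration, not any particular library's API). Each value carries one tangent component, and every arithmetic operation applies its local derivative rule:

```python
import math

class Dual:
    """Minimal dual number: a primal value plus one tangent component."""
    def __init__(self, val, tan=0.0):
        self.val, self.tan = val, tan

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val + other.val, self.tan + other.tan)
    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # Product rule: (uv)' = u'v + uv'
        return Dual(self.val * other.val,
                    self.tan * other.val + self.val * other.tan)
    __rmul__ = __mul__

def sin(d):
    # Chain rule for the sine primitive
    return Dual(math.sin(d.val), math.cos(d.val) * d.tan)

def f(x, y):
    return x * y + sin(x)

# Seed the inputs with direction v = (vx, vy); out.tan is then J_f(x)v.
x, y, vx, vy = 1.5, 0.5, 1.0, -2.0
out = f(Dual(x, vx), Dual(y, vy))
analytic = (y + math.cos(x)) * vx + x * vy
assert abs(out.tan - analytic) < 1e-12
```

Reading `out.tan` after evaluation gives the directional derivative without ever forming a Jacobian.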

### Example: scalar output

Consider

$$
f(x,y) = x^2y + \sin y.
$$

The Jacobian is

$$
J_f(x,y) =
\begin{bmatrix}
2xy & x^2 + \cos y
\end{bmatrix}.
$$

Choose direction

$$
v =
\begin{bmatrix}
v_x \\
v_y
\end{bmatrix}.
$$

Then

$$
J_f(x,y)v =
2xyv_x + (x^2 + \cos y)v_y.
$$

Now compute the same result using forward mode.

Seed:

$$
\dot{x} = v_x,
\qquad
\dot{y} = v_y.
$$

Evaluate:

$$
a = x^2,
\qquad
\dot{a} = 2xv_x.
$$

$$
b = ay,
\qquad
\dot{b} = \dot{a}y + a\dot{y}.
$$

Substitute:

$$
\dot{b} = 2xyv_x + x^2v_y.
$$

Next:

$$
c = \sin y,
\qquad
\dot{c} = \cos y \, v_y.
$$

Finally:

$$
f = b + c,
\qquad
\dot{f} = \dot{b} + \dot{c}.
$$

So

$$
\dot{f} =
2xyv_x + (x^2 + \cos y)v_y.
$$

This equals the Jacobian-vector product.
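The same worked steps can be transcribed directly into code, propagating a `(value, tangent)` pair through each primitive. This is an illustrative sketch, not a library implementation:

```python
import math

def f_and_jvp(x, y, vx, vy):
    """Follow the worked example step by step: each line carries a
    primal value and its tangent."""
    a, da = x * x,       2 * x * vx          # a = x^2
    b, db = a * y,       da * y + a * vy     # b = a*y   (product rule)
    c, dc = math.sin(y), math.cos(y) * vy    # c = sin y
    return b + c, db + dc                    # f = b + c

x, y, vx, vy = 2.0, 0.3, 1.0, 0.5
val, jvp = f_and_jvp(x, y, vx, vy)
# jvp matches the analytic JVP: 2xy*vx + (x^2 + cos y)*vy
assert abs(jvp - (2*x*y*vx + (x*x + math.cos(y))*vy)) < 1e-12
```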

### Example: vector output

Now consider

$$
f(x,y) =
\begin{bmatrix}
xy \\
x+y \\
\sin x
\end{bmatrix}.
$$

The Jacobian is

$$
J_f(x,y) =
\begin{bmatrix}
y & x \\
1 & 1 \\
\cos x & 0
\end{bmatrix}.
$$

For direction

$$
v =
\begin{bmatrix}
v_x \\
v_y
\end{bmatrix},
$$

the JVP is

$$
J_f(x,y)v =
\begin{bmatrix}
yv_x + xv_y \\
v_x + v_y \\
\cos x \, v_x
\end{bmatrix}.
$$

Forward mode computes this directly.

Seed:

$$
\dot{x} = v_x,
\qquad
\dot{y} = v_y.
$$

Then:

$$
\dot{f}_1 = yv_x + xv_y,
$$

$$
\dot{f}_2 = v_x + v_y,
$$

$$
\dot{f}_3 = \cos x \, v_x.
$$

The output tangent vector is exactly the JVP.
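For the vector-valued case, the tangent of each output component is computed alongside its primal value. A minimal sketch for this example:

```python
import math

def f_and_jvp(x, y, vx, vy):
    """Primal outputs and tangents for f(x,y) = (xy, x+y, sin x)."""
    f = [x * y, x + y, math.sin(x)]
    df = [y * vx + x * vy,       # tangent of xy
          vx + vy,               # tangent of x+y
          math.cos(x) * vx]      # tangent of sin x
    return f, df

val, jvp = f_and_jvp(1.2, -0.7, 2.0, 3.0)
```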

### JVPs without explicit Jacobians

The important point is that forward mode never forms the Jacobian matrix explicitly.

For a large system, the Jacobian may be enormous. Suppose

$$
f : \mathbb{R}^{10^6} \to \mathbb{R}^{10^6}.
$$

The full Jacobian contains $10^{12}$ entries, roughly 8 terabytes in double precision. Explicit storage is usually impossible.

Forward mode avoids this cost. It computes

$$
J_f(x)v
$$

directly by propagating one tangent vector through the computation graph.

This is especially valuable when:

1. Only directional derivatives are needed.
2. The Jacobian is sparse or implicit.
3. Forming the full matrix would be too expensive.
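As an illustration, consider a hypothetical componentwise system with nearest-neighbor coupling, $f_i(x) = x_i^2 + x_{i-1}$. Its JVP can be evaluated in $O(n)$ time and memory even at $n = 10^6$, where the full $n \times n$ Jacobian would be far too large to store:

```python
def jvp(x, v):
    """JVP of f_i(x) = x_i**2 + x_{i-1} (with x_{-1} taken as 0),
    computed in O(n) without forming the n-by-n Jacobian."""
    n = len(x)
    return [2 * x[i] * v[i] + (v[i - 1] if i > 0 else 0.0)
            for i in range(n)]

n = 10**6
x = [1.0] * n
v = [0.0] * n
v[0] = 1.0          # perturb only the first coordinate
dy = jvp(x, v)
# Only the two components that depend on x_0 respond:
# dy[0] = 2, dy[1] = 1, all others 0.
```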

### Computational complexity

Suppose the primal function evaluation costs $C$.

A forward-mode JVP costs

$$
O(C),
$$

the same order as the primal evaluation; in practice the constant factor is small, typically around two to three.

The tangent computation follows the same graph as the primal computation. Each primitive performs some extra local derivative work, but the asymptotic complexity is usually unchanged.

Computing the full Jacobian is more expensive.

For

$$
f : \mathbb{R}^n \to \mathbb{R}^m,
$$

one forward pass computes one JVP. To recover the full Jacobian, we usually evaluate:

$$
J_f(x)e_1,
\quad
J_f(x)e_2,
\quad
\ldots,
\quad
J_f(x)e_n,
$$

where $e_i$ are standard basis vectors.

Thus full Jacobian construction requires $n$ forward passes, one per basis vector.

Forward mode is therefore efficient when:

$$
n \ll m
$$

or when only a few directional derivatives are required.
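The column-by-column recovery can be sketched as follows, reusing a hand-written JVP for the earlier vector example (the helper names are illustrative, not a library API):

```python
import math

def jvp(x, v):
    """Hand-written JVP for f(x0,x1) = (x0*x1, x0+x1, sin x0)."""
    x0, x1 = x
    v0, v1 = v
    return [x1*v0 + x0*v1, v0 + v1, math.cos(x0)*v0]

def jacobian(x, n=2):
    """Recover the full Jacobian with n forward passes,
    one per standard basis vector e_i."""
    cols = [jvp(x, [1.0 if j == i else 0.0 for j in range(n)])
            for i in range(n)]
    # cols[i] is the i-th Jacobian column; transpose into rows.
    return [list(row) for row in zip(*cols)]

J = jacobian([2.0, 3.0])
# J == [[3.0, 2.0], [1.0, 1.0], [cos(2.0), 0.0]]
```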

### Matrix view of tangent propagation

Each intermediate variable $u$ carries a tangent:

$$
\dot{u}.
$$
$$

If the primitive operation is

$$
z = \phi(x_1,\ldots,x_k),
$$

then

$$
\dot{z} =
\sum_i
\frac{\partial \phi}{\partial x_i}
\dot{x}_i.
$$

This is exactly multiplication by the local Jacobian of the primitive.

The entire computation graph therefore performs repeated local matrix-vector multiplications:

$$
v
\mapsto
J_1v
\mapsto
J_2J_1v
\mapsto
\cdots
\mapsto
J_L \cdots J_2 J_1 v
=
J_f(x)v,
$$

where $J_1, \ldots, J_L$ are the local Jacobians of the primitives in execution order.

Forward mode composes these local linear maps incrementally during execution.

### Relation to the chain rule

Suppose

$$
f(x) = h(g(x)).
$$

Then

$$
J_f(x) =
J_h(g(x))J_g(x).
$$

Apply this Jacobian to a vector $v$:

$$
J_f(x)v =
J_h(g(x))(J_g(x)v).
$$

Forward mode computes exactly this sequence:

1. Push $v$ through $g$.
2. Push the resulting tangent through $h$.

The tangent vector flows forward through the composed computation.

This is the operational form of the chain rule.
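The two-step push can be sketched with JVP rules for two hypothetical stages, $g(x) = (x^2, \sin x)$ and $h(u, w) = uw$:

```python
import math

def g_jvp(x, v):
    """Primal and tangent of g(x) = (x^2, sin x)."""
    return (x*x, math.sin(x)), (2*x*v, math.cos(x)*v)

def h_jvp(uw, duw):
    """Primal and tangent of h(u, w) = u*w."""
    (u, w), (du, dw) = uw, duw
    return u * w, du * w + u * dw

def f_jvp(x, v):
    """JVP of f = h . g: push v through g, then through h."""
    y, dy = g_jvp(x, v)
    return h_jvp(y, dy)

x, v = 0.7, 1.0
val, jvp = f_jvp(x, v)
# f(x) = x^2 sin x, so f'(x) = 2x sin x + x^2 cos x.
assert abs(jvp - (2*x*math.sin(x) + x*x*math.cos(x))) < 1e-12
```

Each stage only needs its own JVP rule; composition is just sequencing.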

### Basis seeding

To compute a specific partial derivative, choose a basis direction.

For

$$
f : \mathbb{R}^3 \to \mathbb{R},
$$

suppose we want

$$
\frac{\partial f}{\partial x_2}.
$$

Use seed:

$$
v =
\begin{bmatrix}
0 \\
1 \\
0
\end{bmatrix}.
$$

Then

$$
J_f(x)v =
\frac{\partial f}{\partial x_2}.
$$

More generally:

| Seed vector | Result |
|---|---|
| $e_1$ | First Jacobian column |
| $e_2$ | Second Jacobian column |
| $e_i$ | $i$-th Jacobian column |
| arbitrary $v$ | directional derivative |

Thus the seed determines the derivative query.
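A quick sketch of basis seeding, using a hypothetical sample function $f(x_1,x_2,x_3) = x_1 x_2 + e^{x_3}$:

```python
import math

def f_jvp(x, v):
    """Hand-written JVP for f(x1,x2,x3) = x1*x2 + exp(x3)."""
    return x[0]*v[1] + x[1]*v[0] + math.exp(x[2]) * v[2]

x = [2.0, 5.0, 0.0]
d_dx2 = f_jvp(x, [0.0, 1.0, 0.0])   # seed e_2: picks out df/dx2
# df/dx2 = x1 = 2.0
```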

### Multiple directions simultaneously

Forward mode can propagate several tangent directions at once.

Instead of scalar tangents,

$$
\dot{x}_i \in \mathbb{R},
$$

use $k$-component tangent vectors:

$$
\dot{x}_i \in \mathbb{R}^k.
$$
$$

Each variable now carries $k$ tangent components.

The output becomes

$$
J_f(x)V,
$$

where

$$
V \in \mathbb{R}^{n \times k}.
$$

This computes $k$ JVPs simultaneously.

If

$$
V = I_n,
$$

the identity matrix, then

$$
J_f(x)V = J_f(x),
$$

so the full Jacobian is recovered in one vectorized pass. However, this may require large tangent storage and substantial arithmetic overhead.
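A sketch of batched tangent propagation for $f(x_0,x_1) = (x_0 x_1, \sin x_0)$, with each input carrying $k$ tangent components (the function and helper names are illustrative):

```python
import math

def jvp_batch(x, V):
    """Propagate k tangent directions at once for
    f(x0,x1) = (x0*x1, sin x0).  V has one row per input, k columns."""
    x0, x1 = x
    k = len(V[0])
    row1 = [x1 * V[0][j] + x0 * V[1][j] for j in range(k)]  # d(x0*x1)
    row2 = [math.cos(x0) * V[0][j] for j in range(k)]       # d(sin x0)
    return [row1, row2]   # this is J @ V, shape (2, k)

# Seeding with the identity recovers the full Jacobian in one pass.
x = [2.0, 3.0]
J = jvp_batch(x, [[1.0, 0.0], [0.0, 1.0]])
# J == [[3.0, 2.0], [cos(2.0), 0.0]]
```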

### JVPs in machine learning

Modern machine learning systems frequently use JVPs.

Applications include:

| Application | Use of JVP |
|---|---|
| Sensitivity analysis | perturbation propagation |
| Meta-learning | differentiating parameter updates |
| Implicit layers | linearized solver differentiation |
| Neural ODEs | tangent dynamics |
| Hessian-vector products | nested differentiation |
| Second-order optimization | curvature approximations |
| Physics simulation | variational equations |

Many algorithms only require products with derivatives, not explicit derivative matrices.

This distinction is fundamental in large-scale systems.

### JVP versus VJP

Forward mode computes

$$
Jv,
\qquad
v \in \mathbb{R}^n.
$$

Reverse mode computes

$$
J^\top v,
\qquad
v \in \mathbb{R}^m.
$$

The reverse-mode product is called a vector-Jacobian product (VJP) or adjoint product.

The two have complementary complexity profiles:

| Mode | Natural product | Efficient when |
|---|---|---|
| Forward mode | $Jv$ | few inputs |
| Reverse mode | $J^\top v$ | few outputs |

For scalar-output functions,

$$
f : \mathbb{R}^n \to \mathbb{R},
$$

reverse mode computes the full gradient in one pass, while forward mode needs $n$ passes.

For scalar-input functions,

$$
f : \mathbb{R} \to \mathbb{R}^m,
$$

forward mode computes the full derivative vector in one pass.

### Linearization viewpoint

A JVP can also be viewed as evaluation of the linearized function.

Define the linearization of $f$ at $x$:

$$
L_x(v) = J_f(x)v.
$$

Forward mode computes

$$
L_x(v)
$$

without materializing $L_x$ as a matrix.

In many systems, the linearized operator is more important than the Jacobian itself. Optimization methods, Krylov solvers, Newton methods, and sensitivity analysis often only require repeated applications of the linearized operator.

Forward mode naturally exposes this operator form.
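The operator form can be sketched as a closure: `linearize` (a hypothetical helper) returns the map $v \mapsto J_f(x)v$ for a fixed $x$, without ever building a matrix:

```python
import math

def linearize(x):
    """Return the linear operator v -> J_f(x)v for
    f(x0,x1) = (x0*x1, x0 + sin x1), matrix-free."""
    x0, x1 = x
    def L(v):
        v0, v1 = v
        return [x1*v0 + x0*v1, v0 + math.cos(x1)*v1]
    return L

L = linearize([2.0, 0.0])

# The returned operator is linear: L(a*u + b*w) == a*L(u) + b*L(w).
u, w, a, b = [1.0, 2.0], [3.0, -1.0], 0.5, 2.0
lhs = L([a*u[i] + b*w[i] for i in range(2)])
rhs = [a*L(u)[i] + b*L(w)[i] for i in range(2)]
assert all(abs(l - r) < 1e-12 for l, r in zip(lhs, rhs))
```

An iterative solver can call `L` repeatedly, which is exactly the access pattern Krylov methods need.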

### Sparse directional propagation

If the seed vector $v$ is sparse, tangent propagation only activates dependent computations.

For example, if

$$
v_i = 0
$$

for most components, many tangent computations remain zero.

This property is useful for:

- sparse Jacobian estimation,
- localized sensitivity analysis,
- block-structured systems,
- PDE discretizations,
- graph-based models.

Efficient sparse forward-mode systems exploit this structure to reduce arithmetic and memory cost.

### Summary

Forward-mode automatic differentiation naturally computes Jacobian-vector products:

$$
J_f(x)v.
$$

A tangent seed vector defines an infinitesimal perturbation direction. Tangent propagation pushes this perturbation through the computation graph using local derivative rules. The resulting output tangent is the directional derivative of the function.

The key property is that forward mode computes JVPs directly, without explicitly forming Jacobian matrices. This makes it effective for directional sensitivity analysis, sparse systems, higher-order methods, and problems where the number of input directions is small.

