
Chapter 17. Numerical and Systems Concerns

Automatic differentiation computes derivatives by executing arithmetic. On a real machine, arithmetic uses finite precision. This means AD gives the derivative of the...

Section  Title
1        Chapter 17. Numerical and Systems Concerns
2        Stability of Reverse Mode
3        Overflow and Underflow
4        Memory Explosion
5        Gradient Vanishing and Explosion
6        Determinism and Reproducibility
7        Parallelism
8        GPU and TPU Execution
9        Distributed Gradient Computation

Chapter 17. Numerical and Systems Concerns
Automatic differentiation computes derivatives by executing arithmetic. On a real machine, arithmetic uses finite precision. This means AD gives the derivative of the...
8 min

Stability of Reverse Mode
Reverse mode automatic differentiation computes gradients by propagating adjoint values backward through a computational graph. In exact arithmetic, the reverse accumulation...
7 min

Overflow and Underflow
Floating point systems represent numbers within a finite range. When a computed value exceeds the largest representable magnitude, overflow occurs. When a value becomes too...
7 min

Memory Explosion
Reverse-mode automatic differentiation trades computation for memory. To compute gradients efficiently, the backward pass requires access to intermediate values produced...
8 min

Gradient Vanishing and Explosion
Gradient-based optimization relies on propagating derivative information through many layers, time steps, or computational transformations. In deep systems, these gradients...
7 min

Determinism and Reproducibility
Automatic differentiation systems are often assumed to be deterministic. Given identical inputs, identical parameters, and identical code, many users expect identical...
7 min

Parallelism
Automatic differentiation is usually described as a transformation of programs or computational graphs. In real systems, it is also a parallel execution problem. Large...
7 min

GPU and TPU Execution
Modern automatic differentiation systems are built around accelerator hardware. GPUs and TPUs provide enormous throughput for tensor operations, making large-scale...
7 min

Distributed Gradient Computation
Distributed gradient computation appears when a differentiable program no longer fits comfortably on one device or one machine. The reason may be model size, data volume,...
8 min