Automatic differentiation computes derivatives by executing arithmetic. On a real machine, arithmetic uses finite precision. This means AD gives the derivative of the...
| Section | Title |
|---|---|
| 1 | Chapter 17. Numerical and Systems Concerns |
| 2 | Stability of Reverse Mode |
| 3 | Overflow and Underflow |
| 4 | Memory Explosion |
| 5 | Gradient Vanishing and Explosion |
| 6 | Determinism and Reproducibility |
| 7 | Parallelism |
| 8 | GPU and TPU Execution |
| 9 | Distributed Gradient Computation |
Chapter 17. Numerical and Systems Concerns: Automatic differentiation computes derivatives by executing arithmetic. On a real machine, arithmetic uses finite precision. This means AD gives the derivative of the...

Stability of Reverse Mode: Reverse-mode automatic differentiation computes gradients by propagating adjoint values backward through a computational graph. In exact arithmetic, the reverse accumulation...

Overflow and Underflow: Floating-point systems represent numbers within a finite range. When a computed value exceeds the largest representable magnitude, overflow occurs. When a value becomes too...

Memory Explosion: Reverse-mode automatic differentiation trades computation for memory. To compute gradients efficiently, the backward pass requires access to intermediate values produced...

Gradient Vanishing and Explosion: Gradient-based optimization relies on propagating derivative information through many layers, time steps, or computational transformations. In deep systems, these gradients...

Determinism and Reproducibility: Automatic differentiation systems are often assumed to be deterministic. Given identical inputs, identical parameters, and identical code, many users expect identical...

Parallelism: Automatic differentiation is usually described as a transformation of programs or computational graphs. In real systems, it is also a parallel execution problem. Large...

GPU and TPU Execution: Modern automatic differentiation systems are built around accelerator hardware. GPUs and TPUs provide enormous throughput for tensor operations, making large-scale...

Distributed Gradient Computation: Distributed gradient computation appears when a differentiable program no longer fits comfortably on one device or one machine. The reason may be model size, data volume,...
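
As a small illustration of the finite-precision and overflow themes listed above, the following sketch differentiates softplus, log(1 + exp(x)), in two algebraically equivalent ways. It is not taken from the chapter: the function names `softplus_grad_naive` and `softplus_grad_stable` are hypothetical, the derivatives are written by hand to mirror the arithmetic a chain-rule evaluation would perform, and plain Python floats (IEEE 754 double precision) are assumed.

```python
import math


def softplus_grad_naive(x: float) -> float:
    """Derivative of log(1 + exp(x)) written exactly as the chain rule gives it."""
    # exp(x) exceeds the largest float64 once x is a little above 709,
    # and math.exp then raises OverflowError.
    e = math.exp(x)
    return e / (1.0 + e)


def softplus_grad_stable(x: float) -> float:
    """Same derivative, rearranged so exp() only sees non-positive arguments."""
    if x >= 0.0:
        # exp(-x) may underflow to 0.0 for large x, which is harmless here.
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


if __name__ == "__main__":
    for x in (0.0, 30.0, 1000.0, -1000.0):
        try:
            naive = softplus_grad_naive(x)
        except OverflowError:
            naive = None  # the naive formula failed on this input
        print(x, naive, softplus_grad_stable(x))
```

Both functions describe the same mathematical derivative, yet only the rearranged form returns a value for large positive inputs. The derivative a program yields depends on the arithmetic it actually executes, not only on the function it represents.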