Chapter 13. Optimization and Machine Learning

Gradient descent is the basic optimization procedure behind much of modern machine learning. It is simple enough to state in one line, but rich enough to expose many of the...

Section  Title
1        Chapter 13. Optimization and Machine Learning
2        Stochastic Optimization
3        Backpropagation
4        Neural Network Training
5        Sequence Models
6        Attention Mechanisms
7        Implicit Layers
8        Meta-Learning
9        Reinforcement Learning
10       Physics-Informed Models

Chapter 13. Optimization and Machine Learning (8 min)
Gradient descent is the basic optimization procedure behind much of modern machine learning. It is simple enough to state in one line, but rich enough to expose many of the...

Stochastic Optimization (7 min)
Stochastic optimization studies optimization when the objective is accessed through samples, noisy estimates, or partial observations. In machine learning, this is the normal...

Backpropagation (7 min)
Backpropagation is reverse mode automatic differentiation applied to neural networks. In most machine learning writing, the term refers to the whole training procedure: run a...

Neural Network Training (8 min)
Neural network training is the repeated application of three operations: evaluate a model, differentiate a scalar loss, and update parameters. Automatic differentiation...

Sequence Models (5 min)
Sequence models process ordered data. The input is not one independent vector, but a series:

Attention Mechanisms (8 min)
Attention is a sequence operation that lets each position read information from other positions. Instead of compressing the whole past into one recurrent hidden state,...

Implicit Layers (7 min)
An implicit layer defines its output as the solution of an equation, not as a fixed sequence of explicit operations. Instead of computing

Meta-Learning (7 min)
Meta-learning studies systems that improve how they learn. Instead of only optimizing model parameters for one task, a meta-learning method optimizes some part of the learning...

Reinforcement Learning (7 min)
Reinforcement learning studies learning systems that act in an environment. Unlike supervised learning, the training signal is not a target label for each input. The model...

Physics-Informed Models (7 min)
Physics-informed models combine data fitting with equations from physics or applied mathematics. The model is trained not only to match observed samples, but also to satisfy...