Parameter Initialization

A neural network begins training with parameters that have not yet been learned from data.
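A common way to choose those starting values is to draw them randomly with a scale tied to layer width. As an illustrative sketch (the function name and the choice of Glorot/Xavier scaling are assumptions, not something the text specifies), here is a minimal initializer in pure Python:

```python
import math
import random

def glorot_normal(fan_in, fan_out, seed=0):
    """Sample a fan_in x fan_out weight matrix with Glorot/Xavier scaling.

    Standard deviation sqrt(2 / (fan_in + fan_out)) keeps the variance of
    activations and gradients roughly constant from layer to layer.
    """
    rng = random.Random(seed)
    std = math.sqrt(2.0 / (fan_in + fan_out))
    return [[rng.gauss(0.0, std) for _ in range(fan_out)] for _ in range(fan_in)]

W = glorot_normal(256, 128)
```

Random draws also break symmetry: if every weight started at the same value, all units in a layer would compute and update identically.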
Vanishing and Exploding Gradients

Deep networks train by sending information in two directions: activations flow forward through the layers, and gradients flow backward.
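The backward pass multiplies one factor per layer, so a per-layer factor consistently below 1 shrinks the gradient toward zero while a factor above 1 blows it up. A toy numeric sketch (the factors 0.5 and 1.5 are illustrative assumptions):

```python
def gradient_magnitude(per_layer_factor, depth):
    """Multiply one Jacobian-magnitude factor per layer, as backprop does."""
    grad = 1.0
    for _ in range(depth):
        grad *= per_layer_factor
    return grad

vanishing = gradient_magnitude(0.5, 50)  # shrinks toward zero
exploding = gradient_magnitude(1.5, 50)  # grows without bound
```

Fifty layers are enough to turn a modest per-layer factor into a sixteen-order-of-magnitude difference in gradient scale.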
Batch Normalization

Batch normalization is a layer that normalizes activations using statistics computed from a mini-batch.
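Concretely, each feature is standardized using the mean and variance of that feature across the examples in the batch. A minimal sketch in pure Python (the learned scale and shift parameters of a full batch-norm layer are omitted here):

```python
import math

def batch_norm(batch, eps=1e-5):
    """Normalize each feature using mean/variance computed across the batch.

    `batch` is a list of examples, each a list of features; statistics are
    per feature, shared across examples.
    """
    n, d = len(batch), len(batch[0])
    out = [[0.0] * d for _ in range(n)]
    for j in range(d):
        col = [row[j] for row in batch]
        mean = sum(col) / n
        var = sum((x - mean) ** 2 for x in col) / n
        for i in range(n):
            out[i][j] = (batch[i][j] - mean) / math.sqrt(var + eps)
    return out

normalized = batch_norm([[1.0, 10.0], [3.0, 30.0], [5.0, 50.0]])
```

After normalization each feature column has roughly zero mean and unit variance, which is what makes the layer's behavior depend on the composition of the mini-batch.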
Layer Normalization

Layer normalization is a normalization method that normalizes features within each individual example.
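In contrast to batch normalization, the mean and variance are computed over the features of one example, so no other examples are involved. A minimal sketch (again omitting the learned scale and shift):

```python
import math

def layer_norm(x, eps=1e-5):
    """Normalize one example using the mean/variance of its own features."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

y = layer_norm([2.0, 4.0, 6.0, 8.0])
```

Because the statistics come from a single example, the result is identical whatever batch the example appears in, which makes layer normalization convenient at batch size one and at inference time.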
Group and Instance Normalization

Batch normalization and layer normalization are the two most common normalization layers, but they do not cover every setting well.
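Group normalization sits between the two: it splits the channels of one example into groups and normalizes within each group, independent of the batch. A simplified 1-D sketch (real group norm operates on channel-by-spatial tensors; this flat-vector version is an illustrative assumption):

```python
import math

def group_norm(x, num_groups, eps=1e-5):
    """Split one example's features into groups; normalize within each group.

    With num_groups == 1 this reduces to layer normalization over x.
    """
    size = len(x) // num_groups
    out = []
    for g in range(num_groups):
        chunk = x[g * size:(g + 1) * size]
        mean = sum(chunk) / size
        var = sum((v - mean) ** 2 for v in chunk) / size
        out.extend((v - mean) / math.sqrt(var + eps) for v in chunk)
    return out

y = group_norm([1.0, 3.0, 10.0, 30.0], num_groups=2)
```

Because the statistics never cross example boundaries, group and instance normalization behave the same at any batch size, which is why they are popular when batches are small.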
Residual Connections

Residual connections allow a layer or block to add its input directly to its output. Instead of forcing a block to learn a complete transformation from scratch, the block learns a correction to the input.
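The "add the input back" structure is a one-liner. A minimal sketch, with `block` standing in for any function that maps a vector to a vector of the same shape:

```python
def residual_block(x, block):
    """Compute x + block(x): the block learns a correction to the input."""
    fx = block(x)
    return [xi + fi for xi, fi in zip(x, fx)]

# A block that outputs zeros leaves the input unchanged, so a residual
# block can represent the identity without the block learning anything.
out = residual_block([1.0, 2.0, 3.0], lambda x: [0.0 for _ in x])
```

The identity path also gives gradients a direct route backward through many layers, which is part of why residual networks train stably at depth.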
Stable Training in Deep Networks

Stable training means that a model can make steady progress without numerical collapse, uncontrolled gradients, or large oscillations in the loss.