Transformer Encoders

A transformer encoder is a neural network block that maps a sequence of input vectors to a sequence of contextualized output vectors.
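As a concrete illustration, here is a minimal encoder block sketched in PyTorch. The dimensions (`d_model`, `n_heads`, `d_ff`) and the post-LayerNorm arrangement are illustrative assumptions, not prescribed by the text:

```python
import torch
import torch.nn as nn

class EncoderBlock(nn.Module):
    """One encoder block: self-attention + feedforward, each wrapped
    in a residual connection and LayerNorm (illustrative sizes)."""
    def __init__(self, d_model: int = 512, n_heads: int = 8, d_ff: int = 2048):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ff = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Every position attends to every other position (bidirectional).
        attn_out, _ = self.attn(x, x, x)
        x = self.norm1(x + attn_out)   # residual + norm (post-LN variant)
        x = self.norm2(x + self.ff(x))
        return x

x = torch.randn(2, 10, 512)    # (batch, seq_len, d_model)
y = EncoderBlock()(x)          # same shape, now contextualized
print(y.shape)                 # torch.Size([2, 10, 512])
```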
Transformer Decoders

A transformer decoder is a neural network block that maps a prefix sequence to a sequence of next-token representations. It is used when the model must generate output one step at a time.
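The distinguishing ingredient is the causal mask. A minimal decoder-only sketch, again with assumed dimensions, showing how position $t$ is prevented from attending to later positions:

```python
import torch
import torch.nn as nn

class DecoderBlock(nn.Module):
    """Decoder-only block: causally masked self-attention + feedforward
    (no cross-attention; sizes are illustrative)."""
    def __init__(self, d_model: int = 512, n_heads: int = 8, d_ff: int = 2048):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ff = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        T = x.size(1)
        # Causal mask: True = blocked. Position t may attend only to
        # positions <= t, so each output is a next-token representation.
        mask = torch.triu(torch.ones(T, T, dtype=torch.bool), diagonal=1)
        attn_out, _ = self.attn(x, x, x, attn_mask=mask)
        x = self.norm1(x + attn_out)
        x = self.norm2(x + self.ff(x))
        return x
```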
Positional Encoding

Self-attention compares tokens to other tokens, but by itself it has no built-in notion of order. Positional encodings inject order information so the model can distinguish different orderings of the same tokens.
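One standard remedy is the fixed sinusoidal encoding of Vaswani et al. (2017), added to the token embeddings. A small sketch (the function name is ours, and it assumes an even `d_model`):

```python
import torch

def sinusoidal_positions(seq_len: int, d_model: int) -> torch.Tensor:
    """Fixed sinusoidal encoding:
    PE[t, 2i]   = sin(t / 10000^(2i / d_model))
    PE[t, 2i+1] = cos(t / 10000^(2i / d_model))
    Assumes d_model is even."""
    pos = torch.arange(seq_len).unsqueeze(1).float()   # (T, 1)
    i = torch.arange(0, d_model, 2).float()            # even dimensions
    freq = pos / (10000 ** (i / d_model))              # (T, d_model / 2)
    pe = torch.zeros(seq_len, d_model)
    pe[:, 0::2] = torch.sin(freq)
    pe[:, 1::2] = torch.cos(freq)
    return pe

# Added to token embeddings so attention can tell positions apart:
# x = token_embeddings + sinusoidal_positions(T, d_model)
```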
Residual and Normalization Layers

Transformer layers are deep stacks of attention and feedforward blocks. Residual connections and normalization layers are what keep such deep stacks stable and trainable.
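A common arrangement is the pre-LayerNorm residual, $x + f(\mathrm{LayerNorm}(x))$, which preserves an identity path for gradients through the whole stack. A sketch with assumed sizes:

```python
import torch
import torch.nn as nn

class PreNormResidual(nn.Module):
    """Pre-LN residual: x + f(LayerNorm(x)). The identity path keeps
    gradients flowing even through very deep stacks."""
    def __init__(self, d_model: int, fn: nn.Module):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.fn = fn

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.fn(self.norm(x))

# A deep stack is then just repeated residual blocks:
d = 256
ff = lambda: nn.Sequential(nn.Linear(d, 4 * d), nn.GELU(), nn.Linear(4 * d, d))
stack = nn.Sequential(*[PreNormResidual(d, ff()) for _ in range(24)])
out = stack(torch.randn(2, 10, d))   # stable forward pass through 24 blocks
```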
Scaling Transformers

Scaling a transformer means increasing its capacity, data exposure, context length, training compute, or serving throughput.
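For the capacity axis, a common back-of-envelope estimate (assuming $d_{ff} = 4 \cdot d_{model}$ and ignoring biases and LayerNorm parameters) puts a decoder-style transformer at roughly $12 \cdot L \cdot d^2$ non-embedding parameters:

```python
def approx_params(n_layers: int, d_model: int, vocab: int = 50_000) -> int:
    """Rough parameter count, assuming d_ff = 4 * d_model:
    per layer ~ 4*d^2 (attention projections) + 8*d^2 (feedforward),
    plus the token embedding matrix."""
    return 12 * n_layers * d_model**2 + vocab * d_model

# e.g. a GPT-2-small-like shape: 12 layers, d_model = 768
print(f"{approx_params(12, 768):,}")   # ~85M non-embedding + ~38M embedding
```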
Efficient Transformers

Standard transformer attention scales quadratically with sequence length. For a sequence of length $T$, self-attention constructs a score matrix of size $T \times T$, so time and memory grow as $O(T^2)$.
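One family of remedies restricts each query to a local window of keys, as in sliding-window (Longformer-style) attention, cutting the cost to $O(T \cdot w)$. A toy sketch that still materializes the full matrix purely for demonstration:

```python
import torch

def sliding_window_mask(T: int, window: int) -> torch.Tensor:
    """Boolean mask (True = blocked): each query attends only to keys
    within `window` positions, so work drops from O(T^2) to O(T * window)."""
    idx = torch.arange(T)
    dist = (idx.unsqueeze(1) - idx.unsqueeze(0)).abs()
    return dist > window

T = 8
scores = torch.randn(T, T)   # the dense T x T score matrix
scores = scores.masked_fill(sliding_window_mask(T, window=2), float("-inf"))
weights = scores.softmax(dim=-1)   # each row now mixes at most 5 keys
# (A real implementation avoids materializing the full T x T matrix.)
```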
Sparse Expert Architectures

Dense transformers activate every parameter for every token. Sparse expert architectures instead activate only a subset of parameters per token, so capacity can grow without a proportional increase in per-token compute.
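A sparse mixture-of-experts layer realizes this by routing each token to a small number of expert feedforward networks. A simplified top-k routing sketch (real systems add load balancing and capacity limits, omitted here):

```python
import torch
import torch.nn as nn

class TopKMoE(nn.Module):
    """Sparse mixture-of-experts layer: a router picks k experts per token,
    so only a fraction of the layer's parameters is active per token."""
    def __init__(self, d_model: int = 256, n_experts: int = 8, k: int = 2):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts))
        self.k = k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, d_model). Route each token to its top-k experts.
        gates = self.router(x).softmax(dim=-1)      # (tokens, n_experts)
        top_g, top_i = gates.topk(self.k, dim=-1)   # (tokens, k)
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            for slot in range(self.k):
                hit = top_i[:, slot] == e           # tokens routed to expert e
                if hit.any():
                    out[hit] += top_g[hit, slot].unsqueeze(-1) * expert(x[hit])
        return out

y = TopKMoE()(torch.randn(32, 256))   # 32 tokens, only 2 of 8 experts each
```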