Attention Mechanisms

Attention is a method for letting a model choose which parts of an input are most relevant when producing an output.
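This selection can be sketched as scaled dot-product attention: a query is compared against a set of keys, the comparison scores are turned into weights with a softmax, and the output is the weighted sum of the values. A minimal NumPy sketch (the specific vectors below are illustrative, not from the text):

```python
import numpy as np

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) @ V."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                  # (num_queries, num_keys) relevance scores
    scores -= scores.max(axis=-1, keepdims=True)   # subtract row max for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True) # rows sum to 1
    return weights @ V                             # weighted sum of values

# One query attending over three key/value pairs.
Q = np.array([[1.0, 0.0]])
K = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
V = np.array([[1.0], [2.0], [3.0]])
out = attention(Q, K, V)
```

The query matches the first and third keys equally well, so the output blends their values more heavily than the second.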
Self-Attention

Self-attention is attention applied within a single sequence. The same input supplies the queries, keys, and values. Each position builds a new representation by reading from other positions in the same sequence.
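Concretely, the queries, keys, and values are all projections of the same input matrix. A sketch with random stand-ins for the learned projection matrices (the shapes and seed below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
T, d = 4, 8                        # sequence length, model dimension
X = rng.normal(size=(T, d))        # one input sequence

# Queries, keys, and values all come from the same X,
# via separate projection matrices (random stand-ins for learned weights).
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
Q, K, V = X @ Wq, X @ Wk, X @ Wv

scores = Q @ K.T / np.sqrt(d)
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)
out = weights @ V                  # each row mixes information from all positions
```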
Multi-Head Attention

Multi-head attention runs several attention operations in parallel. Each head can attend to a different aspect of the input; the head outputs are concatenated to form the final representation.
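One way to sketch this is to split the model dimension into equal slices, attend within each slice independently, and concatenate. This simplified version omits the per-head learned projections a real implementation would use:

```python
import numpy as np

def multi_head_attention(X, num_heads):
    """Split features into heads, attend per head, concatenate the results.
    Simplified: per-head query/key/value projections are omitted."""
    T, d = X.shape
    assert d % num_heads == 0
    dh = d // num_heads
    # (num_heads, T, dh): each head sees its own slice of the features.
    heads = X.reshape(T, num_heads, dh).transpose(1, 0, 2)
    scores = heads @ heads.transpose(0, 2, 1) / np.sqrt(dh)
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    out = w @ heads                                # (num_heads, T, dh)
    return out.transpose(1, 0, 2).reshape(T, d)    # concatenate heads
```

Because each head works in a lower-dimensional slice, the total cost is close to that of one full-width attention operation.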
Positional Encoding

Self-attention compares tokens by content. By itself, it has no built-in notion of token order. Positional encodings address this by adding position-dependent signals to the token embeddings, so that the model can distinguish where each token sits in the sequence.
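A common choice is the sinusoidal encoding, where each position is mapped to sines and cosines of geometrically spaced frequencies. A sketch:

```python
import numpy as np

def sinusoidal_encoding(T, d):
    """Sinusoidal positional encoding:
    PE[pos, 2i]   = sin(pos / 10000^(2i/d))
    PE[pos, 2i+1] = cos(pos / 10000^(2i/d))"""
    pos = np.arange(T)[:, None]                 # (T, 1) positions
    i = np.arange(d // 2)[None, :]              # (1, d/2) frequency indices
    angles = pos / (10000 ** (2 * i / d))
    pe = np.zeros((T, d))
    pe[:, 0::2] = np.sin(angles)                # even feature dims
    pe[:, 1::2] = np.cos(angles)                # odd feature dims
    return pe
```

The encoding is simply added to the token embeddings before the first attention layer, giving otherwise identical tokens distinct, position-dependent representations.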
Transformer Encoders

A transformer encoder is a stack of layers that maps an input sequence to a contextual sequence representation. Each layer combines self-attention with a position-wise feed-forward network, wrapping both in residual connections and layer normalization.
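A single encoder layer can be sketched as follows. This is a simplification: the attention projections are omitted, and the layer norm has no learned scale or shift:

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    """Normalize each row to zero mean, unit variance (no learned scale/shift)."""
    return (x - x.mean(-1, keepdims=True)) / np.sqrt(x.var(-1, keepdims=True) + eps)

def encoder_layer(X, W1, W2):
    """One encoder layer: self-attention, then a position-wise feed-forward
    network, each wrapped in a residual connection and layer norm."""
    d = X.shape[-1]
    scores = X @ X.T / np.sqrt(d)              # self-attention (projections omitted)
    w = np.exp(scores - scores.max(-1, keepdims=True))
    w /= w.sum(-1, keepdims=True)
    X = layer_norm(X + w @ X)                  # residual + norm around attention
    ffn = np.maximum(X @ W1, 0) @ W2           # two-layer ReLU feed-forward network
    return layer_norm(X + ffn)                 # residual + norm around the FFN
```

Stacking several such layers, each reading the previous layer's output, yields the full encoder.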
Transformer Decoders

A transformer decoder maps a partial output sequence to predictions for the next token or next output step. Its self-attention is masked so that each position can only attend to earlier positions, which keeps predictions from depending on tokens that have not been produced yet.
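The mask can be implemented by setting the scores for future positions to negative infinity before the softmax, so their weights become zero. A sketch (again with the learned projections omitted):

```python
import numpy as np

def causal_self_attention(X):
    """Masked self-attention: position t attends only to positions <= t."""
    T, d = X.shape
    scores = X @ X.T / np.sqrt(d)
    mask = np.triu(np.ones((T, T), dtype=bool), k=1)  # True above the diagonal
    scores = np.where(mask, -np.inf, scores)          # block future positions
    w = np.exp(scores - scores.max(-1, keepdims=True))
    w /= w.sum(-1, keepdims=True)                     # future weights are exactly 0
    return w @ X
```

The first position can attend only to itself, so its output is unchanged; later positions mix in progressively more context.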
Efficient Attention Methods

Standard self-attention compares every token with every other token. For a sequence of length $T$, this produces a $T \times T$ attention matrix, so the cost grows quadratically with sequence length. Efficient attention methods reduce this cost, for example by restricting which token pairs are compared or by approximating the softmax.
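One family of approximations replaces the softmax with a positive feature map, so the $T \times T$ matrix is never formed. A sketch of this "linear attention" idea, using an elu-plus-one feature map as an illustrative choice:

```python
import numpy as np

def linear_attention(Q, K, V):
    """Kernelized attention sketch: approximate softmax(Q K^T) V with
    phi(Q) (phi(K)^T V), where phi is a positive feature map.
    Computing phi(K)^T V first costs O(T d^2) instead of O(T^2 d)."""
    phi = lambda x: np.where(x > 0, x + 1.0, np.exp(x))  # elu(x) + 1, always positive
    Qf, Kf = phi(Q), phi(K)
    KV = Kf.T @ V                    # (d, d_v) summary, size independent of T
    Z = Qf @ Kf.sum(axis=0)          # per-query normalizer
    return (Qf @ KV) / Z[:, None]
```

Because the feature map is positive, the implied attention weights are nonnegative and normalized, but they only approximate the softmax pattern; the trade-off is linear cost in $T$ against some loss of fidelity.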