# Instruction Tuning

Pretraining teaches a language model to predict text. It does not directly teach the model to follow user instructions, answer safely, maintain dialogue structure, or format outputs in a useful way.

A pretrained model may continue text well but still behave poorly in interactive settings. For example, it may ignore instructions, generate irrelevant continuations, produce unsafe content, or imitate undesirable patterns from the training corpus.

Instruction tuning adapts a pretrained language model into a system that responds to tasks expressed in natural language. The core idea is simple: instead of training on generic text continuation, we train the model on pairs of instructions and desired responses.

Typical examples look like:

| Instruction | Response |
|---|---|
| “Translate this sentence into French.” | Correct French translation |
| “Summarize the following article.” | Summary |
| “Write a Python function for binary search.” | Python code |
| “Explain gradient descent.” | Educational explanation |

Instruction tuning changes the model’s behavior from generic next-token continuation toward task-oriented response generation.

### From Language Modeling to Task Following

A pretrained autoregressive model learns

$$
p_\theta(x_t \mid x_{<t}).
$$

Instruction tuning keeps this next-token objective but applies it to conversations formatted with role markers, for example:

```text
<|system|>
You are a helpful assistant.
<|user|>
Explain backpropagation in simple terms.
<|assistant|>
Backpropagation computes gradients by applying the chain rule ...
```

The model is trained to predict the assistant tokens conditioned on all previous tokens. The supervised loss is standard cross-entropy:

$$
\mathcal{L} = -\sum_{t} \log p_\theta(y_t \mid x, y_{<t}),
$$

where $x$ is the prompt (system and user tokens) and $y_t$ is the $t$-th assistant token.

At inference time, the prompt ends with the assistant marker:

```text
<|system|>
You are a concise assistant.
<|user|>
What is overfitting?
<|assistant|>
```

The model generates the assistant continuation.

Different model families use different formatting conventions:

| Model family | Example format |
|---|---|
| ChatML-style | Special tokens wrapping each turn, tagged `system`, `user`, or `assistant` |
| Instruction-style | `### Instruction:` |
| Llama-style chat | `[INST] ... [/INST]` |
| XML-style | Paired tags, e.g. `<instruction>` ... `</instruction>` |
| JSON-style | Structured objects |

The formatting matters because the model learns statistical associations between role markers and behavior. Changing the template at inference time, relative to the one used in training, can affect performance substantially.

### Multi-Task Instruction Tuning

Instruction datasets often combine many tasks:

| Task type | Example |
|---|---|
| Question answering | Factual responses |
| Summarization | Compress documents |
| Translation | Convert languages |
| Coding | Generate programs |
| Classification | Assign labels |
| Dialogue | Multi-turn interaction |
| Reasoning | Solve structured problems |
| Tool use | Call APIs or functions |

The model learns a unified interface: natural language instructions. Instead of separate models for each task, one instruction-tuned model learns many conditional behaviors.

This unification is one reason large language models are flexible. The instruction itself acts as part of the program specification.

### Zero-Shot and Few-Shot Generalization

Instruction tuning improves zero-shot generalization. A zero-shot task is one where the model receives only the instruction, without examples.

Example:

```text
Classify this review as positive or negative:

"The battery life is excellent."
```

The model may perform the task correctly even without task-specific training examples in the prompt.

Few-shot prompting provides demonstrations inside the prompt itself:

```text
Input: "Amazing product."
Label: Positive

Input: "Very disappointing."
Label: Negative

Input: "Battery life is excellent."
Label:
```

Instruction tuning improves the model’s ability to interpret such prompts consistently.

Pretraining alone may give weak task following. Instruction tuning calibrates the model toward cooperative interaction.

### Chain-of-Thought Supervision

Some instruction datasets include intermediate reasoning steps rather than only final answers.

Example:

```text
Question: If a train travels 60 km in 2 hours, what is its average speed?

Reasoning:
Speed = distance / time = 60 / 2 = 30 km/h

Answer: 30 km/h
```

Training on reasoning traces can improve performance on multi-step reasoning tasks. The model learns statistical patterns associated with decomposition, intermediate computation, verification, and explanation.

This is called chain-of-thought supervision.

However, chain-of-thought introduces several concerns:

| Concern | Description |
|---|---|
| Verbosity | Longer outputs increase cost |
| Faithfulness | Reasoning text may not reflect internal computation |
| Data contamination | Public reasoning datasets may leak benchmarks |
| Safety | Hidden reasoning may expose unsafe internal content |

Some systems therefore separate visible reasoning from internal latent reasoning.

### Instruction Diversity

An instruction-tuned model must generalize across many instruction styles. If the dataset is too narrow, the model may overfit to specific phrasing.

High-quality instruction tuning datasets therefore vary:

| Variation | Example |
|---|---|
| Wording | “Summarize” versus “Give a short overview” |
| Tone | Formal versus conversational |
| Format | JSON, markdown, prose |
| Difficulty | Simple and complex tasks |
| Domain | Science, law, code, dialogue |
| Language | Multilingual prompts |

Diversity improves robustness. The model learns abstract task semantics rather than memorizing exact templates.

### Synthetic Instruction Data

Human-written instruction datasets are expensive. Many modern systems therefore generate synthetic instruction data.

A strong model can generate:

| Synthetic component | Example |
|---|---|
| Instructions | “Write a SQL query for…” |
| Responses | High-quality completions |
| Reasoning traces | Stepwise derivations |
| Critiques | Error analysis |
| Preference labels | Ranking candidate answers |

Synthetic data generation creates a recursive training loop:

1. Train a strong model.
2. Use the model to generate instruction data.
3. Filter or rank the outputs.
4. Train a new model on the expanded dataset.

This process scales data generation beyond purely human annotation.

However, synthetic data can amplify errors, stylistic artifacts, and model biases. Filtering and evaluation become increasingly important.

### Catastrophic Forgetting

Instruction tuning changes the model distribution. If done poorly, it can damage capabilities learned during pretraining.

This is called catastrophic forgetting.

Possible symptoms include:

| Problem | Example |
|---|---|
| Reduced factual recall | Worse knowledge retrieval |
| Lower language diversity | Repetitive responses |
| Reduced multilingual ability | Strong English bias |
| Style collapse | Overly uniform outputs |
| Short-answer bias | Failure on long reasoning tasks |

Instruction tuning datasets are much smaller than pretraining corpora. Aggressive fine-tuning can therefore distort the pretrained representation space.

Several techniques reduce forgetting:

| Technique | Purpose |
|---|---|
| Small learning rates | Preserve pretrained features |
| Mixed training data | Blend instruction and pretraining text |
| Parameter-efficient tuning | Update fewer parameters |
| Regularization | Prevent large parameter drift |
| Replay buffers | Reintroduce older data |

Balancing specialization and preservation is a major practical challenge.

### Parameter-Efficient Instruction Tuning

Full fine-tuning updates all parameters. For large models, this is expensive.

Parameter-efficient fine-tuning updates only small subsets of parameters.

Common approaches include:

| Method | Idea |
|---|---|
| LoRA | Low-rank weight updates |
| Adapters | Small trainable modules inserted into layers |
| Prefix tuning | Train virtual prompt vectors |
| Prompt tuning | Learn soft prompts |
| BitFit | Train only bias terms |

For example, LoRA approximates weight updates using low-rank matrices:

$$
\Delta W = AB,
$$

where $W \in \mathbb{R}^{d \times k}$ is a pretrained weight matrix, $A \in \mathbb{R}^{d \times r}$, $B \in \mathbb{R}^{r \times k}$, and $r \ll \min(d, k)$, so $\Delta W$ has rank at most $r$. The pretrained $W$ stays frozen and only $A$ and $B$ are trained.

This greatly reduces the number of trainable parameters, and the optimizer memory they require, while preserving much of the model’s performance.

Parameter-efficient tuning is widely used for domain adaptation and open-source fine-tuning.

### Instruction Tuning and Alignment

Instruction tuning improves usability, but it does not fully solve alignment.

A model may still:

| Failure mode | Example |
|---|---|
| Hallucinate | Invent facts |
| Follow harmful requests | Unsafe outputs |
| Over-refuse | Reject harmless queries |
| Manipulate users | Social persuasion |
| Leak training data | Memorized content |
| Produce biased responses | Social stereotypes |

Instruction tuning mainly teaches behavioral imitation from demonstrations.

More advanced alignment methods, such as reinforcement learning from human feedback, constitutional training, and preference optimization, further shape the model’s behavior.

### PyTorch View of Supervised Fine-Tuning

Suppose a tokenized batch has shape:

```python
[B, T]
```

where:

| Symbol | Meaning |
|---|---|
| `B` | Batch size |
| `T` | Sequence length |

The model produces logits:

```python
[B, T, V]
```

where $V$ is the vocabulary size.

Instruction tuning usually masks non-assistant tokens from the loss.
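
Before the loss can be masked, each sequence needs a label list in which non-assistant positions are replaced by an ignore value. The following is a minimal sketch in plain Python, not a definitive implementation. The helper `build_labels`, the per-token `roles` list, and the token ids are hypothetical and chosen only for illustration; the labels stay aligned with the input tokens, so any shift for next-token prediction is assumed to happen where the loss is computed.

```python
IGNORE_INDEX = -100  # label value that the cross-entropy loss is told to skip


def build_labels(token_ids, roles):
    """Copy token ids into labels, masking every non-assistant position.

    token_ids: list of int token ids for one sequence.
    roles: list of role names ("system", "user", "assistant"), one per token.
           This per-token role list is a hypothetical input format.
    """
    if len(token_ids) != len(roles):
        raise ValueError("token_ids and roles must have the same length")
    return [
        tok if role == "assistant" else IGNORE_INDEX
        for tok, role in zip(token_ids, roles)
    ]


# Made-up token ids for a five-token sequence.
token_ids = [101, 7, 8, 42, 43]
roles = ["system", "user", "user", "assistant", "assistant"]
labels = build_labels(token_ids, roles)
print(labels)  # [-100, -100, -100, 42, 43]
```

The resulting list can be converted to a tensor (for example with `torch.tensor(labels)`) and batched with padding positions also set to `-100`.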

Example:

```python
import torch
import torch.nn.functional as F

# input_ids: [B, T] token ids
# labels:    [B, T] copy of input_ids with non-assistant positions set to -100
# Assumes `model` returns raw logits of shape [B, T, V].
logits = model(input_ids)

# The logit at position t predicts token t + 1,
# so drop the last logit and the first label before comparing.
shift_logits = logits[:, :-1, :]
shift_labels = labels[:, 1:]

loss = F.cross_entropy(
    shift_logits.reshape(-1, shift_logits.size(-1)),
    shift_labels.reshape(-1),
    ignore_index=-100,  # masked positions add no loss and no gradient
)
```

The label tensor may look like:

```text
Input tokens: [SYSTEM  USER  USER  ASSISTANT  ASSISTANT]
Loss mask:    [ -100   -100  -100     y1         y2    ]
```

Only assistant outputs contribute gradients.

### Data Mixture and Curriculum

Instruction tuning datasets are often mixtures of many sources:

| Source | Example |
|---|---|
| Human annotation | Expert-written prompts |
| Public QA datasets | Reading comprehension |
| Code datasets | Programming tasks |
| Synthetic conversations | Generated dialogues |
| Tool traces | API interaction examples |
| Reasoning datasets | Math and logic problems |

The mixture ratio matters.

Too much conversational data may weaken reasoning. Too much code data may distort natural language style. Too much synthetic data may create repetitive outputs.

Some training pipelines also use curricula:

1. Easier tasks first.
2. More complex reasoning later.
3. Specialized tasks near the end.

Curriculum design can improve stability and convergence.

### Why Instruction Tuning Changed Language Models

Early large language models were often difficult to control. Users needed carefully engineered prompts to obtain reliable behavior.

Instruction tuning changed the interaction model.

Instead of treating the model as a generic text completer, users could treat it as a cooperative assistant.

This shift enabled:

| Capability | Impact |
|---|---|
| Conversational systems | Multi-turn dialogue |
| General-purpose assistants | Broad task coverage |
| Tool integration | API and retrieval systems |
| Coding assistants | Natural language programming |
| Educational tutors | Explanatory interaction |
| Agent systems | Planning and execution loops |

Instruction tuning therefore transformed pretrained language models into usable interactive systems.

### Summary

Instruction tuning adapts pretrained language models for task-following behavior using supervised examples of instructions and desired responses.

The model learns conditional generation:

$$
p_\theta(\text{response} \mid \text{instruction}).
$$

Instruction tuning improves usability, formatting, dialogue structure, reasoning style, and zero-shot task generalization.

Modern instruction-tuned systems rely on structured prompts, diverse datasets, chain-of-thought supervision, synthetic data generation, and parameter-efficient adaptation methods.

Instruction tuning greatly improves interaction quality, but it does not fully solve factuality, robustness, or safety. Later alignment stages further shape model behavior beyond supervised imitation.
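
As a worked illustration of the low-rank update $\Delta W = AB$ from the parameter-efficient tuning section, the sketch below counts trainable parameters for a full update versus a low-rank one and builds a tiny $\Delta W$ by hand. The function names, the layer size of 4096, and the rank of 8 are assumptions chosen for illustration, not values from any particular model, and the code uses plain Python rather than a real LoRA library.

```python
def lora_param_counts(d: int, k: int, r: int) -> tuple[int, int]:
    """Trainable parameters for a full d x k update vs. a low-rank update AB."""
    full = d * k            # every entry of Delta W is trainable
    low_rank = r * (d + k)  # A is d x r, B is r x k
    return full, low_rank


def matmul(a, b):
    """Plain-Python matrix product, used only to show the shape of Delta W = AB."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


# Hypothetical layer size and rank, chosen only for illustration.
full, low_rank = lora_param_counts(d=4096, k=4096, r=8)
print(full, low_rank, full // low_rank)  # 16777216 65536 256

# Tiny example: A is 3 x 1 and B is 1 x 2, so Delta W is 3 x 2 with rank at most 1.
A = [[1.0], [2.0], [3.0]]
B = [[0.5, -1.0]]
delta_W = matmul(A, B)
print(delta_W)  # [[0.5, -1.0], [1.0, -2.0], [1.5, -3.0]]
```

Under these assumed sizes the low-rank factors hold 256 times fewer trainable values than a full update of the same matrix, which is where the savings in optimizer state come from.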