# Instruction Tuning

Pretraining teaches a language model to predict text. It does not directly teach the model to follow user instructions, answer safely, maintain dialogue structure, or format outputs in a useful way.

A pretrained model may continue text well but still behave poorly in interactive settings. For example, it may ignore instructions, generate irrelevant continuations, produce unsafe content, or imitate undesirable patterns from the training corpus.

Instruction tuning adapts a pretrained language model into a system that responds to tasks expressed in natural language.

The core idea is simple: instead of training on generic text continuation, we train the model on pairs of instructions and desired responses.

Typical instruction-response pairs look like:

| Instruction | Response |
|---|---|
| “Translate this sentence into French.” | Correct French translation |
| “Summarize the following article.” | Summary |
| “Write a Python function for binary search.” | Python code |
| “Explain gradient descent.” | Educational explanation |

Instruction tuning changes the model’s behavior from generic next-token continuation toward task-oriented response generation.

### From Language Modeling to Task Following

A pretrained autoregressive model learns

$$
p_\theta(x_t \mid x_{<t}).
$$

The model predicts the next token from previous tokens. During pretraining, the corpus may contain instructions, answers, conversations, code, essays, and many unrelated text types mixed together.

Instruction tuning reorganizes the training distribution. Instead of arbitrary web text, the model receives structured examples:

$$
(\text{instruction}, \text{response}).
$$

The model then learns the conditional distribution

$$
p_\theta(\text{response} \mid \text{instruction}).
$$

This appears superficially similar to pretraining, since the model still predicts tokens autoregressively. The difference is the structure of the data distribution.
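Written out, with $x$ for the instruction tokens and $y = (y_1, \dots, y_{|y|})$ for the response tokens (notation assumed here for illustration), the conditional distribution factorizes token by token:

$$
p_\theta(y \mid x) = \prod_{t=1}^{|y|} p_\theta(y_t \mid x, y_{<t}).
$$

Each factor has the same form as the pretraining objective. What changes is the conditioning context and where the target tokens come from.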

Pretraining teaches language structure broadly. Instruction tuning teaches cooperative task behavior.

### Supervised Fine-Tuning

Instruction tuning is usually implemented as supervised fine-tuning, often abbreviated SFT.

The dataset contains demonstrations written by humans, synthetic systems, or mixtures of both. Each example includes:

| Field | Purpose |
|---|---|
| System prompt | Defines global behavior |
| User prompt | Contains the instruction |
| Assistant response | Desired output |

A training sample may look like:

```text
<system>
You are a helpful assistant.

<user>
Explain backpropagation in simple terms.

<assistant>
Backpropagation computes gradients by applying the chain rule ...
```

The model is trained to predict the assistant tokens conditioned on all previous tokens.

The supervised loss is standard cross-entropy:

$$
\mathcal{L} =
-\sum_{t}
\log p_\theta(y_t \mid x, y_{<t}),
$$

where:

| Symbol | Meaning |
|---|---|
| $x$ | Prompt or instruction |
| $y_t$ | Target response token |
| $y_{<t}$ | Previous response tokens |

Usually, only the assistant tokens contribute to the loss. User and system tokens provide conditioning context but are not prediction targets.
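A minimal sketch of how this masking might be set up when a training example is built. The whitespace `toy_tokenize` function and the `(role, text)` segment layout are assumptions for illustration, not a real tokenizer or dataset schema, and role markers are left out to keep the sketch short. The value `-100` is the default `ignore_index` of PyTorch's cross-entropy loss.

```python
IGNORE_INDEX = -100  # default ignore_index of torch.nn.functional.cross_entropy

def toy_tokenize(text, vocab):
    """Hypothetical whitespace tokenizer: each new word gets the next free id."""
    return [vocab.setdefault(word, len(vocab)) for word in text.split()]

def build_example(segments, vocab):
    """Concatenate (role, text) segments into input_ids and masked labels."""
    input_ids, labels = [], []
    for role, text in segments:
        ids = toy_tokenize(text, vocab)
        input_ids.extend(ids)
        if role == "assistant":
            labels.extend(ids)  # assistant tokens are prediction targets
        else:
            labels.extend([IGNORE_INDEX] * len(ids))  # context only
    return input_ids, labels

vocab = {}
segments = [
    ("system", "You are a helpful assistant."),
    ("user", "Explain backpropagation in simple terms."),
    ("assistant", "Backpropagation applies the chain rule."),
]
input_ids, labels = build_example(segments, vocab)
print(input_ids)
print(labels)
```

Here the first ten label positions (system and user words) are masked, and the last five carry the assistant token ids.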

### Why Instruction Tuning Works

Instruction tuning works because pretrained models already contain broad latent capabilities. Pretraining exposes the model to many tasks indirectly through text. The model may already contain useful representations for translation, reasoning, summarization, coding, and dialogue.

Instruction tuning teaches the model when and how to use those capabilities.

This is often described as eliciting latent knowledge rather than creating entirely new knowledge.

The model learns patterns such as:

| Behavior | Example |
|---|---|
| Obeying instructions | Following formatting requests |
| Maintaining dialogue roles | Responding as assistant rather than continuing user text |
| Producing concise answers | Avoiding irrelevant continuation |
| Refusing unsafe requests | Safety alignment |
| Using chain-of-thought style reasoning | Stepwise solutions |
| Formatting outputs | Markdown, JSON, code blocks |

A relatively small instruction dataset can significantly change model behavior because the pretrained model already contains strong language representations.

### Prompt Formatting and Chat Templates

Modern instruction-tuned models usually rely on structured prompt templates.

A dialogue is converted into a token sequence with role markers:

```text
<system>
You are a concise assistant.

<user>
What is overfitting?

<assistant>
```

The model generates the assistant continuation.
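As a rough sketch, a dialogue could be rendered into the tag-style layout above as follows. The function name `render_chat`, the message dictionaries, and the `add_generation_prompt` flag are illustrative assumptions, not the API of any particular library.

```python
def render_chat(messages, add_generation_prompt=True):
    """Render a list of {"role", "content"} dicts with tag-style role markers."""
    parts = [f"<{m['role']}>\n{m['content']}\n" for m in messages]
    if add_generation_prompt:
        # Open the assistant turn so the model continues from here.
        parts.append("<assistant>\n")
    return "\n".join(parts)

messages = [
    {"role": "system", "content": "You are a concise assistant."},
    {"role": "user", "content": "What is overfitting?"},
]
prompt = render_chat(messages)
print(prompt)
```

During training, the same renderer would be called with the assistant turn included in `messages` and `add_generation_prompt=False`, so that training and inference share one template.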

Different model families use different formatting conventions:

| Model family | Example format |
|---|---|
| ChatML-style | `<\|im_start\|>role ... <\|im_end\|>` |
| Instruction-style | `### Instruction:` |
| Llama-style chat | `[INST] ... [/INST]` |
| XML-style | `<instruction>` tags |
| JSON-style | Structured objects |

The formatting matters because the model learns statistical associations between role markers and behavior.

Using a template at inference time that differs from the one seen during training can degrade performance substantially.

### Multi-Task Instruction Tuning

Instruction datasets often combine many tasks:

| Task type | Example |
|---|---|
| Question answering | Factual responses |
| Summarization | Compress documents |
| Translation | Convert languages |
| Coding | Generate programs |
| Classification | Assign labels |
| Dialogue | Multi-turn interaction |
| Reasoning | Solve structured problems |
| Tool use | Call APIs or functions |

The model learns a unified interface: natural language instructions.

Instead of separate models for each task, one instruction-tuned model learns many conditional behaviors.

This unification is one reason large language models are flexible. The instruction itself acts as part of the program specification.

### Zero-Shot and Few-Shot Generalization

Instruction tuning improves zero-shot generalization. A zero-shot task is one where the model receives only the instruction, without examples.

Example:

```text
Classify this review as positive or negative:
"The battery life is excellent."
```

The model may perform the task correctly even without task-specific training examples in the prompt.

Few-shot prompting provides demonstrations inside the prompt itself:

```text
Input: "Amazing product."
Label: Positive

Input: "Very disappointing."
Label: Negative

Input: "Battery life is excellent."
Label:
```
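A prompt like the one above can be assembled programmatically. The helper below is a small sketch; the name `build_few_shot_prompt` and the `Input:` / `Label:` layout are taken from the example, not from any standard.

```python
def build_few_shot_prompt(demonstrations, query):
    """Format (text, label) demonstrations followed by an unlabeled query."""
    blocks = [f'Input: "{text}"\nLabel: {label}' for text, label in demonstrations]
    blocks.append(f'Input: "{query}"\nLabel:')  # left open for the model to complete
    return "\n\n".join(blocks)

demos = [("Amazing product.", "Positive"), ("Very disappointing.", "Negative")]
prompt = build_few_shot_prompt(demos, "Battery life is excellent.")
print(prompt)
```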

Instruction tuning improves the model’s ability to interpret such prompts consistently.

Pretraining alone often yields weak instruction following. Instruction tuning calibrates the model toward cooperative interaction.

### Chain-of-Thought Supervision

Some instruction datasets include intermediate reasoning steps rather than only final answers.

Example:

```text
Question: If a train travels 60 km in 2 hours, what is its average speed?

Reasoning:
Speed = distance / time
= 60 / 2
= 30 km/h

Answer: 30 km/h
```

Training on reasoning traces can improve performance on multi-step reasoning tasks.

The model learns statistical patterns associated with decomposition, intermediate computation, verification, and explanation.

This is called chain-of-thought supervision.

However, chain-of-thought introduces several concerns:

| Concern | Description |
|---|---|
| Verbosity | Longer outputs increase cost |
| Faithfulness | Reasoning text may not reflect internal computation |
| Data contamination | Public reasoning datasets may leak benchmarks |
| Safety | Reasoning traces may surface unsafe content that the final answer omits |

Some systems therefore separate visible reasoning from internal latent reasoning.
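One simple way to handle the reasoning and the final answer separately, assuming traces follow the `Reasoning:` / `Answer:` layout of the train example above. The function `split_trace` is a hypothetical helper, and real systems may use dedicated delimiter tokens instead of plain-text markers.

```python
def split_trace(text):
    """Split a 'Reasoning: ... Answer: ...' trace into (reasoning, answer)."""
    reasoning, marker, answer = text.rpartition("Answer:")
    if not marker:
        # No marker found: treat the whole text as the answer.
        return "", text.strip()
    reasoning = reasoning.replace("Reasoning:", "", 1).strip()
    return reasoning, answer.strip()

trace = """Reasoning:
Speed = distance / time
= 60 / 2
= 30 km/h

Answer: 30 km/h"""
reasoning, answer = split_trace(trace)
print(answer)  # 30 km/h
```

With this split, a pipeline could log or train on `reasoning` while showing only `answer` to the user.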

### Instruction Diversity

An instruction-tuned model must generalize across many instruction styles.

If the dataset is too narrow, the model may overfit to specific phrasing. High-quality instruction tuning datasets therefore vary:

| Variation | Example |
|---|---|
| Wording | “Summarize” versus “Give a short overview” |
| Tone | Formal versus conversational |
| Format | JSON, markdown, prose |
| Difficulty | Simple and complex tasks |
| Domain | Science, law, code, dialogue |
| Language | Multilingual prompts |

Diversity improves robustness.

The model learns abstract task semantics rather than memorizing exact templates.
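As an illustration of wording variation, the sketch below renders one summarization task under several phrasings. The templates and the helper `vary_instruction` are made up for this example; a real dataset would cover many tasks and draw on human-written paraphrases.

```python
import random

# Hypothetical paraphrase templates for a single task.
TEMPLATES = [
    "Summarize the following text:\n{text}",
    "Give a short overview of this text:\n{text}",
    "In two or three sentences, what is the text below about?\n{text}",
]

def vary_instruction(text, rng):
    """Pick one phrasing at random and fill in the source text."""
    return rng.choice(TEMPLATES).format(text=text)

rng = random.Random(0)  # fixed seed so the augmentation is reproducible
article = "Instruction tuning adapts a pretrained model to follow instructions."
variants = {vary_instruction(article, rng) for _ in range(50)}
print(len(variants))
```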

### Synthetic Instruction Data

Human-written instruction datasets are expensive. Many modern systems therefore generate synthetic instruction data.

A strong model can generate:

| Synthetic component | Example |
|---|---|
| Instructions | “Write a SQL query for…” |
| Responses | High-quality completions |
| Reasoning traces | Stepwise derivations |
| Critiques | Error analysis |
| Preference labels | Ranking candidate answers |

Synthetic data generation creates a recursive training loop:

1. Train a strong model.
2. Use the model to generate instruction data.
3. Filter or rank the outputs.
4. Train a new model on the expanded dataset.

This process scales data generation beyond purely human annotation.

However, synthetic data can amplify errors, stylistic artifacts, and model biases. Filtering and evaluation become increasingly important.
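The generate-then-filter part of this loop can be sketched as follows. The `generate` and `score` callables stand in for a strong generator model and a quality filter; both are placeholders here, and the threshold value is arbitrary.

```python
def build_synthetic_dataset(seed_instructions, generate, score, threshold):
    """Generate one response per instruction and keep those that pass the filter."""
    kept, seen = [], set()
    for instruction in seed_instructions:
        response = generate(instruction)
        key = (instruction.strip().lower(), response.strip().lower())
        if key in seen:
            continue  # drop exact duplicates
        seen.add(key)
        if score(instruction, response) >= threshold:
            kept.append({"instruction": instruction, "response": response})
    return kept

# Stand-ins for a real generator model and quality scorer.
def fake_generate(instruction):
    return "" if "empty" in instruction else f"Response to: {instruction}"

def fake_score(instruction, response):
    return 1.0 if response else 0.0

seeds = ["Write a SQL query.", "Write a SQL query.", "Return empty output."]
dataset = build_synthetic_dataset(seeds, fake_generate, fake_score, threshold=0.5)
print(len(dataset))  # 1
```

Of the three seeds, one is dropped as a duplicate and one is rejected by the scorer, which is the kind of attrition a filtering stage is meant to produce.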

### Catastrophic Forgetting

Instruction tuning changes the model distribution. If done poorly, it can damage capabilities learned during pretraining.

This is called catastrophic forgetting.

Possible symptoms include:

| Problem | Example |
|---|---|
| Reduced factual recall | Worse knowledge retrieval |
| Lower language diversity | Repetitive responses |
| Reduced multilingual ability | Strong English bias |
| Style collapse | Overly uniform outputs |
| Short-answer bias | Failure on long reasoning tasks |

Instruction tuning datasets are much smaller than pretraining corpora. Aggressive fine-tuning can therefore distort the pretrained representation space.

Several techniques reduce forgetting:

| Technique | Purpose |
|---|---|
| Small learning rates | Preserve pretrained features |
| Mixed training data | Blend instruction and pretraining text |
| Parameter-efficient tuning | Update fewer parameters |
| Regularization | Prevent large parameter drift |
| Replay buffers | Reintroduce older data |

Balancing specialization and preservation is a major practical challenge.
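As one concrete instance of the regularization row above, a penalty on the squared distance from the pretrained weights can be added to the fine-tuning loss. This is a sketch of that single idea under simple assumptions: `drift_penalty` and `strength` are illustrative names, and the tiny linear layer stands in for a pretrained model.

```python
import torch
from torch import nn

def drift_penalty(model, reference, strength):
    """Scaled sum of squared differences between current and reference parameters."""
    total = torch.zeros(())
    for name, param in model.named_parameters():
        total = total + (param - reference[name]).pow(2).sum()
    return strength * total

model = nn.Linear(2, 2)  # stand-in for a pretrained model
# Snapshot of the "pretrained" weights, detached so it receives no gradients.
reference = {name: p.detach().clone() for name, p in model.named_parameters()}

print(drift_penalty(model, reference, strength=0.1).item())  # 0.0 before any update

with torch.no_grad():
    for p in model.parameters():
        p.add_(1.0)  # simulate a large fine-tuning update
penalty = drift_penalty(model, reference, strength=0.1)
print(penalty.item())
```

In training, the penalty would be added to the supervised loss before calling `backward()`, so that updates which move far from the pretrained weights are discouraged.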

### Parameter-Efficient Instruction Tuning

Full fine-tuning updates all parameters. For large models, this is expensive.

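Parameter-efficient fine-tuning updates only a small number of parameters, either a subset of the existing weights or a small set of newly added ones, while the rest of the model stays frozen.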

Common approaches include:

| Method | Idea |
|---|---|
| LoRA | Low-rank weight updates |
| Adapters | Small trainable modules inserted into layers |
| Prefix tuning | Train prefix vectors prepended to the attention keys and values in each layer |
| Prompt tuning | Learn soft prompt embeddings prepended to the input sequence |
| BitFit | Train only bias terms |

For example, LoRA approximates weight updates using low-rank matrices:

$$
\Delta W = AB,
$$

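where $A \in \mathbb{R}^{d \times r}$ and $B \in \mathbb{R}^{r \times k}$ for a weight matrix $W \in \mathbb{R}^{d \times k}$, with rank $r \ll \min(d, k)$. The pretrained $W$ stays frozen and only $A$ and $B$ are trained.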

This greatly reduces memory and compute requirements while preserving much of the model’s performance.
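A minimal LoRA-style layer, written to match the $\Delta W = AB$ notation above. The class name `LoRALinear` and the `rank` and `alpha` defaults are assumptions for illustration; this is a sketch, not the implementation of any particular library. The left factor `A` starts at zero so the wrapped layer initially behaves exactly like the frozen base layer.

```python
import torch
from torch import nn

class LoRALinear(nn.Module):
    """Frozen linear layer plus a trainable low-rank update delta_W = A @ B."""

    def __init__(self, base: nn.Linear, rank: int = 4, alpha: float = 8.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)  # pretrained weights stay frozen
        d_out, d_in = base.weight.shape
        self.A = nn.Parameter(torch.zeros(d_out, rank))        # zero init: delta_W = 0 at start
        self.B = nn.Parameter(torch.randn(rank, d_in) * 0.01)  # small random init
        self.scale = alpha / rank

    def forward(self, x):
        # x @ (A @ B).T computed as (x @ B.T) @ A.T, without forming the full matrix.
        delta = (x @ self.B.T) @ self.A.T
        return self.base(x) + self.scale * delta

layer = LoRALinear(nn.Linear(512, 512), rank=4)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
total = sum(p.numel() for p in layer.parameters())
print(trainable, total)  # 4096 266752
```

For this 512 by 512 layer, only 4,096 of 266,752 parameters receive gradients, which is where the memory savings for optimizer state come from.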

Parameter-efficient tuning is widely used for domain adaptation and open-source fine-tuning.

### Instruction Tuning and Alignment

Instruction tuning improves usability, but it does not fully solve alignment.

A model may still:

| Failure mode | Example |
|---|---|
| Hallucinate | Invent facts |
| Follow harmful requests | Unsafe outputs |
| Over-refuse | Reject harmless queries |
| Manipulate users | Social persuasion |
| Leak training data | Memorized content |
| Produce biased responses | Social stereotypes |

Instruction tuning mainly teaches behavioral imitation from demonstrations.

More advanced alignment methods, such as reinforcement learning from human feedback, constitutional training, and preference optimization, further shape the model’s behavior.

### PyTorch View of Supervised Fine-Tuning

Suppose a tokenized batch has shape:

```python
[B, T]
```

where:

| Symbol | Meaning |
|---|---|
| `B` | Batch size |
| `T` | Sequence length |

The model produces logits:

```python
[B, T, V]
```

where $V$ is the vocabulary size.

Instruction tuning usually masks non-assistant tokens from the loss.

Example:

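```python
import torch
import torch.nn.functional as F

# input_ids: [B, T] token ids
# labels:    [B, T] copy of input_ids with non-assistant positions set to -100
# model:     a causal LM, defined elsewhere, mapping input_ids to logits [B, T, V]

logits = model(input_ids)

# The logit at position t predicts the token at position t + 1,
# so drop the last logit and the first label before comparing.
shift_logits = logits[:, :-1, :]
shift_labels = labels[:, 1:]

loss = F.cross_entropy(
    shift_logits.reshape(-1, shift_logits.size(-1)),
    shift_labels.reshape(-1),
    ignore_index=-100,  # masked positions contribute no loss or gradient
)
```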

The label tensor may look like:

```text
Input tokens:   [SYSTEM USER USER ASSISTANT ASSISTANT]
Labels:         [  -100  -100  -100     y1        y2 ]
```

Only assistant outputs contribute gradients.
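The following self-contained toy run checks this claim numerically. The two-layer `model` is only a stand-in that maps each token to vocabulary logits (it has no attention and is not a real language model), and all sizes are arbitrary. Logits and labels are shifted by one position so that each logit is scored against the next token.

```python
import torch
import torch.nn.functional as F
from torch import nn

torch.manual_seed(0)
V, D = 10, 8  # toy vocabulary size and hidden size

# Stand-in for a causal LM: embeds each token and projects to vocabulary logits.
model = nn.Sequential(nn.Embedding(V, D), nn.Linear(D, V))

input_ids = torch.tensor([[1, 2, 3, 4, 5]])        # [B=1, T=5]
labels = torch.tensor([[-100, -100, -100, 4, 5]])  # last two tokens are assistant tokens

logits = model(input_ids)                          # [1, 5, 10]
loss = F.cross_entropy(
    logits[:, :-1].reshape(-1, V),  # predictions made at positions 0..T-2
    labels[:, 1:].reshape(-1),      # targets are the following tokens
    ignore_index=-100,
)
loss.backward()

# Embedding rows with nonzero gradient: inputs whose next token is an assistant token.
grad_rows = model[0].weight.grad.abs().sum(dim=1).nonzero().flatten().tolist()
print(grad_rows)
```

Only the embeddings of tokens 3 and 4 receive gradient, because those are the positions whose next-token targets are assistant tokens. The masked positions and the final position contribute nothing.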

### Data Mixture and Curriculum

Instruction tuning datasets are often mixtures of many sources:

| Source | Example |
|---|---|
| Human annotation | Expert-written prompts |
| Public QA datasets | Reading comprehension |
| Code datasets | Programming tasks |
| Synthetic conversations | Generated dialogues |
| Tool traces | API interaction examples |
| Reasoning datasets | Math and logic problems |

The mixture ratio matters.

Too much conversational data may weaken reasoning. Too much code data may distort natural language style. Too much synthetic data may create repetitive outputs.
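A mixture ratio can be implemented as weighted sampling over sources. The weights and source names below are invented for illustration; in practice the ratios are tuned empirically.

```python
import random
from collections import Counter

# Hypothetical mixture weights over data sources.
MIXTURE = {"human": 0.5, "code": 0.3, "synthetic": 0.2}

def sample_sources(mixture, n, rng):
    """Draw n source names with probability proportional to their weights."""
    names = list(mixture)
    weights = [mixture[name] for name in names]
    return rng.choices(names, weights=weights, k=n)

rng = random.Random(0)  # fixed seed for reproducibility
counts = Counter(sample_sources(MIXTURE, 10_000, rng))
print(counts)
```

Each drawn source name would then select which dataset the next training example is read from, so the empirical batch composition tracks the chosen weights.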

Some training pipelines also use curricula:

1. Easier tasks first.
2. More complex reasoning later.
3. Specialized tasks near the end.

Curriculum design can improve stability and convergence.
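The ordering above can be sketched as a stable sort on a stage label. The `stage` field and its three values are assumptions made for this example; real pipelines may score difficulty continuously or interleave stages.

```python
def order_by_stage(examples, stage_order=("easy", "complex", "specialized")):
    """Sort examples by curriculum stage, keeping original order within a stage."""
    rank = {stage: i for i, stage in enumerate(stage_order)}
    return sorted(examples, key=lambda ex: rank[ex["stage"]])

examples = [
    {"id": 1, "stage": "specialized"},
    {"id": 2, "stage": "easy"},
    {"id": 3, "stage": "complex"},
    {"id": 4, "stage": "easy"},
]
ordered = order_by_stage(examples)
print([ex["id"] for ex in ordered])  # [2, 4, 3, 1]
```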

### Why Instruction Tuning Changed Language Models

Early large language models were often difficult to control. Users needed carefully engineered prompts to obtain reliable behavior.

Instruction tuning changed the interaction model. Instead of treating the model as a generic text completer, users could treat it as a cooperative assistant.

This shift enabled:

| Capability | Impact |
|---|---|
| Conversational systems | Multi-turn dialogue |
| General-purpose assistants | Broad task coverage |
| Tool integration | API and retrieval systems |
| Coding assistants | Natural language programming |
| Educational tutors | Explanatory interaction |
| Agent systems | Planning and execution loops |

Instruction tuning therefore transformed pretrained language models into usable interactive systems.

### Summary

Instruction tuning adapts pretrained language models for task-following behavior using supervised examples of instructions and desired responses.

The model learns conditional generation:

$$
p_\theta(\text{response} \mid \text{instruction}).
$$

Instruction tuning improves usability, formatting, dialogue structure, reasoning style, and zero-shot task generalization.

Modern instruction-tuned systems rely on structured prompts, diverse datasets, chain-of-thought supervision, synthetic data generation, and parameter-efficient adaptation methods.

Instruction tuning greatly improves interaction quality, but it does not fully solve factuality, robustness, or safety. Later alignment stages further shape model behavior beyond supervised imitation.

