# Understanding Stage2

When people talk about Zig compiler internals, they often mention `stage2`.

The name can be confusing at first because it sounds like a single feature. It is better to understand it as part of Zig’s compiler history.

Zig started with an older compiler implementation written in C++. Over time, the project moved toward a newer compiler written mostly in Zig itself. That newer self-hosted compiler work was commonly called `stage2`.

So, in simple terms:

```text
stage1 = older compiler path
stage2 = newer self-hosted compiler path
```

The goal of `stage2` was not just to rewrite the old compiler line by line. The goal was to build the compiler architecture Zig needed for the long term.

#### Why Stage2 Exists

A programming language compiler is one of the hardest programs to write.

It must parse code, check types, report errors, run compile-time code, generate machine code, link programs, support many platforms, and stay fast enough for daily use.

The early Zig compiler was good enough to grow the language, but Zig needed a stronger foundation.

Stage2 exists because Zig needed:

```text
a compiler written in Zig
better compile-time execution
better error messages
better incremental compilation support
better cross-compilation support
better control over code generation
less dependence on older compiler architecture
```

The most important idea is self-hosting.

A self-hosted compiler is a compiler for a language that is written in that same language.

For Zig, this means:

```text
Zig compiler written in Zig
```

This matters because the compiler itself becomes a large real-world test of the language.

If Zig can implement Zig, then Zig is capable of building large systems software.

#### What “Stage” Means

The word `stage` comes from compiler bootstrapping.

Bootstrapping means building a compiler using an existing compiler.

Imagine you are creating a new language called X.

At first, there is no X compiler written in X. So you might write the first compiler in C.

Then, after the language is strong enough, you write a new compiler in X itself.

The process looks like this:

```text
old compiler builds new compiler
new compiler builds user programs
new compiler eventually builds itself
```

That is why people use words like `stage1`, `stage2`, and sometimes `stage3`.

A simplified view:

```text
stage1 compiler
    ↓ builds
stage2 compiler
    ↓ builds
Zig programs
```

Later, when the new compiler can compile itself reliably, the project can depend less on the older stage.

#### Stage2 and Self-Hosting

Self-hosting is not only symbolic. It has practical value.

When the compiler is written in Zig, compiler developers use Zig every day to build Zig itself.

That creates pressure to improve the language in real ways:

```text
better compile times
better standard library APIs
better memory management patterns
better debugging tools
better build system behavior
better error reporting
```

A language improves differently when its own compiler depends on it.

Tiny annoyances become obvious. Missing features become painful. Slow paths become expensive.

Self-hosting forces the language to face its own design.

#### The Main Compiler Pipeline

Stage2 follows the same broad compiler pipeline you saw earlier:

```text
source code
    ↓
tokenizer
    ↓
parser
    ↓
AST
    ↓
ZIR
    ↓
semantic analysis
    ↓
AIR
    ↓
code generation
    ↓
linking
```

Each stage changes the program into a form that is easier for the compiler to work with.

The source code is for humans.

The AST describes the syntax.

ZIR is a lowered representation of Zig code.

Semantic analysis checks meaning.

AIR represents analyzed code.

Code generation turns the analyzed program into target-specific output.

Linking produces the final artifact.
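The front half of this pipeline can be explored from ordinary Zig code, because the standard library exposes the tokenizer and parser. A minimal sketch, assuming the `std.zig.Ast` API present in recent Zig standard libraries:

```zig
const std = @import("std");

pub fn main() !void {
    const allocator = std.heap.page_allocator;

    // Source code: the human-readable form.
    const source: [:0]const u8 = "const x = if (flag) 10 else 20;";

    // Tokenizer + parser: source code -> AST.
    var tree = try std.zig.Ast.parse(allocator, source, .zig);
    defer tree.deinit(allocator);

    // The AST keeps source-level detail, such as every token's position.
    std.debug.print("tokens: {d}, parse errors: {d}\n", .{
        tree.tokens.len, tree.errors.len,
    });
}
```

The later stages (ZIR, Sema, AIR, codegen) live inside the compiler itself and are not part of the public standard library API.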

#### ZIR in Stage2

ZIR means Zig Intermediate Representation.

You can think of ZIR as a simplified internal version of Zig code.

The parser produces an AST. The AST is still shaped like the source file. It remembers many source-level details.

ZIR is lower-level. It is easier for the compiler to analyze.

For example, source code may contain convenient syntax:

```zig
const x = if (flag) 10 else 20;
```

The AST records that this came from an `if` expression.

ZIR represents it in a form that the compiler can process more systematically.

You do not need to read ZIR manually as a beginner. But you should know why it exists.

The compiler does not want to repeatedly reason about every surface syntax detail. It lowers code into a simpler representation, then analyzes that.
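As an analogy only (real ZIR is an instruction-based internal format, not Zig source), the lowering idea can be sketched by rewriting the `if` expression as explicit control flow:

```zig
const flag = true;

// Surface form: an `if` expression.
const x = if (flag) 10 else 20;

// Roughly what "lowering" means: the same decision expressed as
// explicit blocks and breaks, which are easier to analyze uniformly.
const y = blk: {
    if (flag) break :blk 10;
    break :blk 20;
};
```

Both initializers produce the same value; the second form just makes the control flow explicit, which is the spirit of what lowering does.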

#### Semantic Analysis in Stage2

Semantic analysis is one of the largest and most complex parts of the compiler.

It answers questions such as:

```text
What type is this expression?
Does this function call match the function type?
Can this integer fit into the destination type?
Is this value known at compile time?
Does this branch return correctly?
Is this pointer alignment valid?
Can this code be evaluated at comptime?
```

Example:

```zig
const x: u8 = 300;
```

The parser can parse this. The AST is valid.

But semantic analysis rejects it because `300` does not fit in `u8`.
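The fix is to give the value a destination type that can represent it; `u16` is chosen here just for illustration:

```zig
// const x: u8 = 300;  // rejected: 300 does not fit in u8 (max 255)
const x: u16 = 300;    // accepted: u16 can represent 0..65535
```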

Another example:

```zig
fn add(a: i32, b: i32) i32 {
    return a + b;
}

const x = add("hello", "world");
```

Again, the parser can parse this. The syntax has the right shape.

Semantic analysis rejects it because strings are not `i32` values.
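A version that passes semantic analysis supplies arguments of the declared parameter type:

```zig
fn add(a: i32, b: i32) i32 {
    return a + b;
}

const x = add(2, 3); // both integer literals coerce to i32, so analysis succeeds
```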

This is why semantic analysis is where many useful compiler errors come from.

#### AIR in Stage2

AIR means Analyzed Intermediate Representation.

By the time code reaches AIR, the compiler knows much more about it.

Types have been resolved. Many compile-time decisions have already happened. Invalid code has been rejected.

A simple mental model:

```text
ZIR = code before full meaning is known
AIR = code after semantic meaning is known
```

AIR is useful because code generation should not need to solve all language-level questions again.

The backend wants a cleaner form:

```text
operations
types
control flow
memory behavior
target requirements
```

AIR helps provide that.

#### Compile-Time Execution

Stage2 must support Zig’s compile-time execution model.

This is a major reason the compiler architecture is complex.

In Zig, the compiler may need to execute real Zig code while compiling.

Example:

```zig
fn makeValue(comptime n: usize) usize {
    return n * 2;
}

const x = makeValue(21);
```

The compiler can compute `x` while compiling the program.
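Because the result is comptime-known, a `comptime` block can verify it during compilation. A small sketch:

```zig
fn makeValue(comptime n: usize) usize {
    return n * 2;
}

const x = makeValue(21);

// x is computed during compilation, so it can be checked there too.
comptime {
    if (x != 42) @compileError("expected x to be 42");
}
```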

This gets more powerful with types:

```zig
fn Pair(comptime T: type) type {
    return struct {
        first: T,
        second: T,
    };
}

const IntPair = Pair(i32);
```

Here, a function returns a type.

That means the compiler must execute `Pair(i32)` during compilation and create the resulting struct type.

So Stage2 needs more than parsing and type checking. It needs an interpreter for compile-time Zig.
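The type returned by `Pair` behaves like any hand-written struct. A sketch using Zig's built-in test runner:

```zig
const std = @import("std");

fn Pair(comptime T: type) type {
    return struct {
        first: T,
        second: T,
    };
}

test "a comptime-created type is an ordinary type" {
    const IntPair = Pair(i32);
    const p = IntPair{ .first = 20, .second = 22 };
    try std.testing.expectEqual(@as(i32, 42), p.first + p.second);
}
```

Run with `zig test`; the struct created at compile time is initialized and used exactly like one declared by hand.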

#### Code Generation Backends

After semantic analysis, the compiler must produce target code.

Zig has long used LLVM as its primary backend. LLVM can optimize code and emit machine code for many architectures.

But Zig has also worked on its own backends.

Why?

Because depending entirely on LLVM has tradeoffs:

```text
LLVM is powerful but large
LLVM increases compiler build complexity
LLVM can slow down simple debug builds
LLVM behavior is not always easy to control
LLVM does not cover every desired compiler use case
```

Zig’s own backend work can help with faster compilation, better control, and simpler bootstrap paths.

A practical view:

```text
LLVM backend = strong optimization and broad target support
Zig native backends = more control and potential speed for some workflows
```

You do not need to choose between them as a beginner. Just know that Stage2 was designed to support a cleaner path from Zig source code to different backend strategies.

#### Incremental Compilation

One long-term goal of the newer compiler architecture is better incremental compilation.

Incremental compilation means the compiler should avoid rebuilding everything when only a small part of the program changes.

For example, if you edit one function, the compiler should ideally reuse previous work for the rest of the program.

That requires careful tracking:

```text
which files changed
which declarations changed
which types depend on which declarations
which compile-time values must be recomputed
which generated code is still valid
```

This is hard in any language.

It is especially hard in Zig because compile-time code can inspect and generate types.

Stage2’s internal design is intended to make this kind of tracking more manageable.

#### Error Messages

A compiler is also a user interface.

When the compiler rejects a program, it must explain why.

Bad error messages make a language painful.

Stage2 work also matters because better internal representations can support better diagnostics.

For example, when the compiler understands the path from source code to semantic failure, it can show:

```text
where the error happened
what type was expected
what type was found
which call caused the problem
which compile-time branch led here
```

Good diagnostics are not added at the end. They depend on architecture.

The compiler must keep enough source location and context information through each internal step.

#### Why Beginners Should Care

You do not need to understand Stage2 deeply to write Zig programs.

But you should know what it means because you will see it in discussions, issues, release notes, and compiler internals.

When someone says:

```text
this changed in stage2
```

they usually mean:

```text
this behavior belongs to the newer self-hosted compiler architecture
```

When someone says:

```text
stage2 caught this differently
```

they may be talking about improved semantic analysis, different diagnostics, or changed compiler behavior.

When someone says:

```text
stage2 backend
```

they may be talking about Zig’s newer code generation path rather than the older LLVM-centered path.

#### A Safe Mental Model

Use this model:

```text
Stage2 is the newer self-hosted Zig compiler architecture.

It parses Zig, lowers it into internal representations, analyzes it, runs compile-time code, generates target code, and links the final output.

Its purpose is to make Zig’s compiler faster, cleaner, more self-reliant, and easier to evolve.
```

That is enough for now.

Later, when you read the compiler source, you can connect the names to files and systems:

```text
AST
ZIR
Sema
AIR
codegen
linker
```

Stage2 is where these pieces come together.

