# Self-Hosted Zig Compiler

A self-hosted compiler is a compiler written in the language it compiles.

For Zig, this means the Zig compiler is written largely in Zig and can compile Zig programs, including parts of itself.

This matters because a compiler is not only a tool for users. It is also one of the largest tests of the language. If Zig can implement its own compiler, then Zig is strong enough to build large, complex, performance-sensitive software.

#### What Self-Hosted Means

A compiler involves an implementation language (what the compiler is written in) and a source language (what it compiles):

```text
implementation language -> source language
```

The output, such as machine code, is a third language, usually called the target.

For example:

```text
C++ compiler written in C++
Go compiler written in Go
Rust compiler written in Rust
Zig compiler written in Zig
```

When the implementation language and the source language are the same, the compiler is self-hosted.

This creates a cycle:

```text
old compiler builds new compiler
new compiler builds user programs
new compiler can later build the next compiler
```

This cycle is called bootstrapping.

#### Why Self-Hosting Matters

Self-hosting gives a language several benefits.

It proves the language can handle real systems programming. A compiler needs parsing, type checking, memory management, data structures, error reporting, code generation, caching, file handling, and careful performance work.

It also improves the language ecosystem. Compiler developers use the same language features as normal users, so weak parts of the language become visible quickly.

A self-hosted compiler also reduces dependence on another language. If the compiler is mostly written in Zig, then Zig developers can work on Zig using Zig.

#### The Bootstrap Problem

There is one obvious problem:

```text
How do you compile the Zig compiler before you already have a Zig compiler?
```

The answer is bootstrapping.

At first, a language usually needs a compiler written in another language. Later, once the language is strong enough, developers write a new compiler in the language itself.

The old compiler builds the new compiler. Then the new compiler can build future versions.

Conceptually:

```text
stage 0 compiler -> builds stage 1 compiler
stage 1 compiler -> builds stage 2 compiler
stage 2 compiler -> builds user programs
```

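A compiler project may compile the same input with the stage 1 and stage 2 compilers and compare the results. Both stages come from the same compiler source, so they should behave identically even though different compilers built them.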

#### Frontend and Backend

A compiler usually has a frontend and a backend.

The frontend understands the source language.

It handles:

```text
tokenizing
parsing
name lookup
type checking
semantic analysis
compile-time execution
error messages
```

The backend produces lower-level output.

It handles:

```text
intermediate representation
optimization
machine code generation
object files
debug information
linking support
```

In Zig, the self-hosted compiler frontend is especially important because Zig has strong compile-time execution. The compiler must be able to evaluate Zig code while compiling Zig code.

#### Parsing Zig

The first stage is parsing.

The compiler reads source text:

```zig
const x: u32 = 42;
```

and turns it into syntax structure.

The parser does not fully understand the program yet. It mostly understands grammar.

It knows that this is a constant declaration. It knows the name is `x`. It knows the type annotation is `u32`. It knows the initializer is `42`.
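As a rough illustration, the result of parsing that line could be recorded in a small struct. The sketch below is hypothetical: `VarDecl` and `parseConstDecl` are names invented for this example, and the function handles only the exact shape `const NAME: TYPE = INIT;`. The real Zig parser is far more general and is organized differently.

```zig
const std = @import("std");

/// Hypothetical syntax node for `const NAME: TYPE = INIT;`.
/// It stores source text only; nothing here knows what `u32` means.
const VarDecl = struct {
    name: []const u8,
    type_text: []const u8,
    init_text: []const u8,
};

const ParseError = error{UnexpectedSyntax};

/// Illustrative parser for one declaration shape. Not the real Zig parser.
fn parseConstDecl(source: []const u8) ParseError!VarDecl {
    const trimmed = std.mem.trim(u8, source, " \t\r\n");
    if (!std.mem.startsWith(u8, trimmed, "const ")) return error.UnexpectedSyntax;
    if (!std.mem.endsWith(u8, trimmed, ";")) return error.UnexpectedSyntax;

    // Strip the leading keyword and the trailing semicolon.
    const body = trimmed["const ".len .. trimmed.len - 1];
    const colon = std.mem.indexOfScalar(u8, body, ':') orelse return error.UnexpectedSyntax;
    const equals = std.mem.indexOfScalar(u8, body, '=') orelse return error.UnexpectedSyntax;
    if (equals < colon) return error.UnexpectedSyntax;

    return VarDecl{
        .name = std.mem.trim(u8, body[0..colon], " "),
        .type_text = std.mem.trim(u8, body[colon + 1 .. equals], " "),
        .init_text = std.mem.trim(u8, body[equals + 1 ..], " "),
    };
}

test "parser records structure, not meaning" {
    const decl = try parseConstDecl("const x: u32 = 42;");
    try std.testing.expectEqualStrings("x", decl.name);
    try std.testing.expectEqualStrings("u32", decl.type_text);
    try std.testing.expectEqualStrings("42", decl.init_text);
}
```

Note that the node keeps `u32` and `42` as plain text. Deciding whether `42` fits in a `u32` is left to a later stage.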

Parsing answers:

```text
What is written here?
```

It does not fully answer:

```text
Is this program valid?
What does this type mean?
Can this value be known at compile time?
```

Those questions belong to semantic analysis.

#### Semantic Analysis

Semantic analysis gives meaning to syntax.

It checks things like:

```text
Does this name exist?
Is this assignment allowed?
Does this expression have the expected type?
Can this function return this value?
Is this comptime expression valid?
```

Example:

```zig
const x: u32 = "hello";
```

The parser can parse this. The syntax is valid.

Semantic analysis rejects it because `"hello"` is not a `u32`.

This is where the compiler becomes a language implementation, not just a syntax reader.
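
The check described above can be sketched in a few lines. This is a minimal model, not the Zig compiler's actual analysis: `Value`, `SemaError`, and `coerceToU32` are hypothetical names, and the only rule modeled is "an initializer for a `u32` must be an integer that fits".

```zig
const std = @import("std");

/// Hypothetical value produced for an initializer expression.
const Value = union(enum) {
    int: u64,
    string: []const u8,
};

const SemaError = error{ TypeMismatch, IntegerOverflow };

/// Decide whether `value` may initialize a `u32` declaration.
fn coerceToU32(value: Value) SemaError!u32 {
    switch (value) {
        .int => |n| {
            // std.math.cast returns null when the value does not fit.
            const narrowed = std.math.cast(u32, n) orelse return error.IntegerOverflow;
            return narrowed;
        },
        .string => return error.TypeMismatch,
    }
}

test "42 is accepted, a string is rejected" {
    try std.testing.expectEqual(@as(u32, 42), try coerceToU32(.{ .int = 42 }));
    try std.testing.expectError(error.TypeMismatch, coerceToU32(.{ .string = "hello" }));
}
```

Both inputs are syntactically fine. Only the rule encoded in `coerceToU32` separates the valid program from the invalid one.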

#### Compile-Time Execution

Zig has `comptime`, so the compiler must execute some Zig code during compilation.

Example:

```zig
fn add(comptime T: type, a: T, b: T) T {
    return a + b;
}
```

The compiler must understand `T` as a compile-time value. It must instantiate the function for the requested type. It must evaluate compile-time branches and loops.

This makes the compiler more powerful, but also more difficult to implement.

A Zig compiler must be both:

```text
a translator
a compile-time interpreter
```

That is one reason self-hosting is a serious test of the language.
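
To make the two roles concrete, the sketch below reuses the `add` function from this section and adds a hypothetical helper, `factorial`, that is an ordinary function but gets evaluated by the compiler when its result is needed at compile time. The helper and the constants are illustrative additions, not part of any Zig API.

```zig
const std = @import("std");

fn add(comptime T: type, a: T, b: T) T {
    return a + b;
}

/// Ordinary function; nothing about it is comptime-specific.
fn factorial(n: u64) u64 {
    var result: u64 = 1;
    var i: u64 = 2;
    while (i <= n) : (i += 1) {
        result *= i;
    }
    return result;
}

// A container-level constant is evaluated by the compiler, so `factorial`
// is interpreted during compilation and only the result is kept.
const fact10 = factorial(10);

test "one generic function, several instantiations" {
    try std.testing.expectEqual(@as(u32, 5), add(u32, 2, 3));
    try std.testing.expectEqual(@as(f64, 3.75), add(f64, 1.5, 2.25));
}

test "array lengths must be known at compile time" {
    const Buffer = [factorial(4)]u8; // length 24, computed while compiling
    try std.testing.expectEqual(@as(usize, 24), @sizeOf(Buffer));
    try std.testing.expectEqual(@as(u64, 3628800), fact10);
}
```

The translator role shows up in `add`, which becomes separate code for `u32` and `f64`. The interpreter role shows up in `factorial(4)`, which has to be run before the type `[24]u8` can even exist.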

#### Intermediate Representation

After semantic analysis, the compiler usually lowers the program into an intermediate representation.

An intermediate representation, or IR, is a simpler internal language used by the compiler.

Source Zig may be rich:

```zig
if (x > 10) {
    return x + 1;
} else {
    return 0;
}
```

The compiler may lower it into simpler blocks, branches, and operations.

IR helps the compiler reason about code, optimize it, and send it to different backends.

A good IR is easier for the compiler to analyze than raw source syntax.
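
As a sketch of what "simpler blocks, branches, and operations" can look like, the example below hand-lowers the `if`/`else` above into a three-instruction program. The instruction set (`Inst`) and the interpreter (`run`) are invented for this illustration and do not correspond to the Zig compiler's real IRs.

```zig
const std = @import("std");

/// Hypothetical mini-IR. Each instruction does one simple thing.
const Inst = union(enum) {
    /// Continue with the next instruction if x > limit, else jump to `target`.
    branch_unless_gt: struct { limit: i64, target: usize },
    /// Return x + amount.
    ret_add: i64,
    /// Return a constant.
    ret_const: i64,
};

/// `if (x > 10) { return x + 1; } else { return 0; }` lowered by hand.
const lowered = [_]Inst{
    .{ .branch_unless_gt = .{ .limit = 10, .target = 2 } },
    .{ .ret_add = 1 }, // then-branch
    .{ .ret_const = 0 }, // else-branch
};

/// Minimal interpreter, used only to show the lowered form keeps its meaning.
/// Returns null if execution runs past the end without a return.
fn run(code: []const Inst, x: i64) ?i64 {
    var pc: usize = 0;
    while (pc < code.len) {
        switch (code[pc]) {
            .branch_unless_gt => |b| pc = if (x > b.limit) pc + 1 else b.target,
            .ret_add => |amount| return x + amount,
            .ret_const => |value| return value,
        }
    }
    return null;
}

test "lowered code behaves like the source" {
    try std.testing.expectEqual(@as(?i64, 12), run(&lowered, 11));
    try std.testing.expectEqual(@as(?i64, 0), run(&lowered, 10));
}
```

Because every instruction has a fixed, small shape, a pass over `lowered` is a plain loop with a `switch`, which is much easier to write than a pass over nested source syntax.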

#### Code Generation

Code generation turns the compiler’s internal representation into lower-level output that can be executed directly or processed further by other tools.

Possible outputs include:

```text
machine code
object files
assembly
LLVM IR
WebAssembly
C source
```

A compiler can support multiple backends.

One backend might use LLVM. Another might generate machine code directly. Another might target WebAssembly.

The frontend asks:

```text
What does the Zig program mean?
```

The backend asks:

```text
How do we produce runnable code for this target?
```
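
The split can be sketched with one frontend result feeding two output formats. Everything here is hypothetical: the `Backend` enum, the `emitConstFn` function, and the choice of C source and WebAssembly text as the two formats are assumptions made for the example, and a real backend emits far more than a single function.

```zig
const std = @import("std");

/// Hypothetical set of output formats for this sketch.
const Backend = enum { c_source, wasm_text };

/// Emit a function that returns the constant `value`, written in the format
/// of the chosen backend. The meaning (`value`) is the same for both.
fn emitConstFn(backend: Backend, value: i32, buf: []u8) ![]u8 {
    return switch (backend) {
        .c_source => std.fmt.bufPrint(buf, "int f(void) {{ return {d}; }}", .{value}),
        .wasm_text => std.fmt.bufPrint(buf, "(func (result i32) i32.const {d})", .{value}),
    };
}

test "same meaning, different output" {
    var buf: [64]u8 = undefined;
    try std.testing.expectEqualStrings(
        "int f(void) { return 42; }",
        try emitConstFn(.c_source, 42, &buf),
    );
    try std.testing.expectEqualStrings(
        "(func (result i32) i32.const 42)",
        try emitConstFn(.wasm_text, 42, &buf),
    );
}
```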

#### Why Zig Cares About Cross Compilation

Zig is designed as a cross-compilation toolchain.

That means the compiler should be able to build programs for many targets from one host machine.

For example:

```text
host: x86_64 Linux
target: aarch64 macOS
target: x86_64 Windows
target: riscv64 freestanding
target: wasm32
```

This matters deeply for the compiler architecture.

The compiler needs target descriptions:

```text
pointer size
integer ABI rules
calling conventions
object file format
linking rules
CPU features
operating system ABI
```

A self-hosted compiler must make these rules explicit in its own code.
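
A heavily reduced version of such a target description might look like the sketch below. The `Target` struct and its enums are hypothetical and cover only the four example targets listed earlier; the real description in the Zig standard library tracks far more detail.

```zig
const std = @import("std");

// Hypothetical, heavily simplified target description.
const Arch = enum { x86_64, aarch64, riscv64, wasm32 };
const Os = enum { linux, macos, windows, freestanding };
const ObjectFormat = enum { elf, macho, coff, wasm };

const Target = struct {
    arch: Arch,
    os: Os,

    /// Pointer width in bits, decided by the architecture alone.
    fn pointerBits(self: Target) u16 {
        return switch (self.arch) {
            .x86_64, .aarch64, .riscv64 => 64,
            .wasm32 => 32,
        };
    }

    /// Usual object file format for this combination (simplified rule).
    fn objectFormat(self: Target) ObjectFormat {
        if (self.arch == .wasm32) return .wasm;
        return switch (self.os) {
            .macos => .macho,
            .windows => .coff,
            .linux, .freestanding => .elf,
        };
    }
};

test "rules are looked up from the target, not from the host" {
    const mac = Target{ .arch = .aarch64, .os = .macos };
    try std.testing.expect(mac.pointerBits() == 64);
    try std.testing.expect(mac.objectFormat() == .macho);

    const wasm = Target{ .arch = .wasm32, .os = .freestanding };
    try std.testing.expect(wasm.pointerBits() == 32);
    try std.testing.expect(wasm.objectFormat() == .wasm);
}
```

The point of the sketch is that nothing in it asks what machine the compiler is running on. Every answer comes from the `Target` value.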

#### Error Messages

Compiler quality is not only about accepting correct programs.

It is also about rejecting wrong programs clearly.

A good compiler error should explain:

```text
where the problem is
what rule was broken
what type was expected
what type was found
what the programmer can check next
```

For example:

```text
expected type 'u32', found '*const [5:0]u8'
```

A compiler needs source locations, spans, notes, and sometimes related locations.

This requires careful data design. Error reporting is not an afterthought. It is part of the compiler core.
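
One possible shape for that data is sketched below. The `Diagnostic` struct and its field names are assumptions made for the example; the file name and position in the test are made up as well. Real compilers typically store compact source offsets and compute line and column only when printing.

```zig
const std = @import("std");

/// Hypothetical diagnostic record; field names are illustrative.
const Diagnostic = struct {
    file: []const u8,
    line: u32,
    column: u32,
    expected: []const u8,
    found: []const u8,

    /// Render as `file:line:column: error: ...` into `buf`.
    fn render(self: Diagnostic, buf: []u8) ![]u8 {
        return std.fmt.bufPrint(
            buf,
            "{s}:{d}:{d}: error: expected type '{s}', found '{s}'",
            .{ self.file, self.line, self.column, self.expected, self.found },
        );
    }
};

test "a diagnostic carries a location and both types" {
    // Position of the string literal in: const x: u32 = "hello";
    const diag = Diagnostic{
        .file = "main.zig",
        .line = 1,
        .column = 16,
        .expected = "u32",
        .found = "*const [5:0]u8",
    };
    var buf: [128]u8 = undefined;
    try std.testing.expectEqualStrings(
        "main.zig:1:16: error: expected type 'u32', found '*const [5:0]u8'",
        try diag.render(&buf),
    );
}
```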

#### Incremental Work

A self-hosted compiler is too large to write in one step.

Compiler developers usually build it piece by piece:

```text
parse small programs
analyze declarations
support basic expressions
support functions
support control flow
support structs and enums
support pointers and slices
support comptime
support standard library
support code generation
support linking
```

Each feature depends on earlier features.

A compiler grows like a city: many small systems must connect cleanly.

#### Testing a Compiler

A compiler needs many kinds of tests.

Parser tests check syntax.

Semantic tests check type rules.

Compile-error tests check that invalid programs fail correctly.

Runtime tests compile and run programs.

Code generation tests inspect output.

Cross-target tests check different platforms.

Self-hosting adds another major test:

```text
Can the compiler build itself?
```

That test is expensive, but powerful. It exercises huge parts of the language and compiler.
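
Compile-error tests in particular are often written as a table of inputs and expected outcomes. The sketch below shows that pattern with a stand-in stage: `checkBraces` is a hypothetical function that only verifies brace balance, chosen because it is small enough to fit here. It is not how the Zig test suite is organized.

```zig
const std = @import("std");

const CheckError = error{ UnclosedBrace, UnexpectedCloseBrace };

/// Stand-in for a compiler stage: verifies that `{` and `}` balance.
fn checkBraces(source: []const u8) CheckError!void {
    var depth: usize = 0;
    for (source) |byte| {
        switch (byte) {
            '{' => depth += 1,
            '}' => {
                if (depth == 0) return error.UnexpectedCloseBrace;
                depth -= 1;
            },
            else => {},
        }
    }
    if (depth != 0) return error.UnclosedBrace;
}

/// One table row: an input and the error it must produce (null = must pass).
const Case = struct {
    source: []const u8,
    expected: ?CheckError,
};

const cases = [_]Case{
    .{ .source = "fn main() void {}", .expected = null },
    .{ .source = "fn main() void {", .expected = error.UnclosedBrace },
    .{ .source = "}", .expected = error.UnexpectedCloseBrace },
};

test "valid programs pass and invalid programs fail with the right error" {
    for (cases) |case| {
        if (case.expected) |err| {
            try std.testing.expectError(err, checkBraces(case.source));
        } else {
            try checkBraces(case.source);
        }
    }
}
```

Checking for the specific error, not just for failure, is what keeps a compile-error test from passing for the wrong reason.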

#### Stage Comparison

A self-hosted compiler may be built in stages.

```text
stage1 compiler builds stage2 compiler
stage2 compiler builds stage3 compiler
```

Stage2 and stage3 are built from the same source. If stage1 and stage2 implement the same compiler logic, then stage2 and stage3 should be identical, or at least behave the same.

This is a way to detect compiler bugs.

The exact process differs between compiler projects, but the principle is stable: the compiler must prove that it can reproduce itself reliably.
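
At its simplest, the comparison is a byte-for-byte check of two outputs. The sketch below assumes both outputs have already been read into memory and that the build is deterministic; `firstDifference` is a hypothetical helper name.

```zig
const std = @import("std");

/// Returns the offset of the first differing byte, or null if the two
/// outputs are identical. If one output is a prefix of the other, the
/// offset is the length of the shorter one.
fn firstDifference(stage2_output: []const u8, stage3_output: []const u8) ?usize {
    return std.mem.indexOfDiff(u8, stage2_output, stage3_output);
}

test "identical outputs compare equal" {
    try std.testing.expectEqual(@as(?usize, null), firstDifference("abcd", "abcd"));
}

test "a mismatch reports where the outputs diverge" {
    try std.testing.expectEqual(@as(?usize, 2), firstDifference("abcd", "abXd"));
}
```

Reporting the offset instead of a plain yes or no gives a starting point for finding which part of the output went wrong.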

#### Memory Management Inside the Compiler

A compiler allocates many short-lived objects:

```text
tokens
AST nodes
type objects
IR instructions
symbol table entries
error messages
temporary analysis data
```

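Zig’s allocator model is useful here. Zig code passes allocators explicitly, so each kind of data can be given an allocator that matches its lifetime.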

Different compiler data has different lifetimes:

```text
per file
per module
per declaration
per function body
per compilation
temporary scratch
```

A compiler can use arenas for data that dies together.

For example, AST nodes for one parsed file can live in a file arena. Temporary analysis buffers can live in a scratch allocator and be reset often.

Clear allocation strategy keeps the compiler faster and easier to debug.
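
The two patterns from the previous paragraph can be sketched with `std.heap.ArenaAllocator`. The `Node` type is a hypothetical stand-in for an AST node, and the sizes and loop count are arbitrary; the example shows the lifetime pattern only, not how the Zig compiler lays out its own data.

```zig
const std = @import("std");

/// Hypothetical AST node, for illustration only.
const Node = struct {
    tag: enum { var_decl, int_literal },
    token_index: u32,
};

test "nodes for one file share one arena" {
    var file_arena = std.heap.ArenaAllocator.init(std.testing.allocator);
    defer file_arena.deinit(); // frees every node at once
    const allocator = file_arena.allocator();

    const decl = try allocator.create(Node);
    decl.* = .{ .tag = .var_decl, .token_index = 0 };
    const literal = try allocator.create(Node);
    literal.* = .{ .tag = .int_literal, .token_index = 5 };

    try std.testing.expect(decl.tag == .var_decl);
    try std.testing.expect(literal.token_index == 5);
    // No per-node `destroy` calls. std.testing.allocator would report a
    // leak if `deinit` failed to release everything.
}

test "scratch memory is reset between passes" {
    var scratch = std.heap.ArenaAllocator.init(std.testing.allocator);
    defer scratch.deinit();

    var pass: usize = 0;
    while (pass < 3) : (pass += 1) {
        const buffer = try scratch.allocator().alloc(u8, 256);
        @memset(buffer, 0);
        // Drop everything allocated in this pass but keep the memory
        // around so the next pass can reuse it.
        _ = scratch.reset(.retain_capacity);
    }
}
```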

#### The Compiler as a Zig Program

A self-hosted compiler is also a large Zig codebase.

That means it benefits from Zig’s normal strengths:

```text
explicit allocation
strong type checking
comptime
tagged unions
error unions
slices
packed data structures
cross compilation
```

But it also exposes Zig’s weaknesses.

If a language feature makes the compiler hard to write, compiler developers feel that pain directly. That feedback can improve the language.

#### Why This Matters to Learners

You do not need to understand the whole Zig compiler to learn Zig.

But studying the compiler teaches important habits:

```text
separate syntax from meaning
keep data ownership clear
test every stage
make errors precise
avoid hidden runtime behavior
design around targets explicitly
```

These habits apply to many Zig programs, not just compilers.

#### The Main Idea

A self-hosted compiler is a compiler written in the language it compiles.

For Zig, self-hosting is more than a milestone. It is a stress test of the language, the standard library, the build system, and the compiler architecture.

The compiler must parse Zig, analyze Zig, execute comptime Zig, generate code for many targets, report precise errors, and eventually build itself again.

