# Understanding Zig IR

IR means intermediate representation.

An intermediate representation is the compiler’s internal form of a program. It sits between source code and final machine code.

A compiler does not usually translate source text directly into CPU instructions in one step. It moves the program through a series of stages, and each stage produces a form that is closer to the machine.

```text
source code
tokens
AST
semantic analysis
IR
machine code
```

Each form has a different job.

Source code is for humans.

The AST keeps the structure of the written program.

IR is for the compiler.

Machine code is for the processor.

#### Why IR Exists

Source code carries details that later compiler stages do not need: formatting, comments, syntactic sugar, and several ways to write the same thing.

Consider this Zig code:

```zig
pub fn add(a: i32, b: i32) i32 {
    return a + b;
}
```

Humans see a function with two parameters and one return statement.

A compiler needs a simpler form:

```text
function add
parameter a: i32
parameter b: i32
tmp0 = add a, b
return tmp0
```

That simpler form is easier to analyze, optimize, and lower to machine code.

IR removes surface syntax and keeps meaning.

#### AST vs IR

The AST follows the shape of the source code.

For this expression:

```zig
1 + 2 * 3
```

the AST may look like:

```text
+
  1
  *
    2
    3
```

That tree preserves the expression structure.

IR may look flatter:

```text
tmp0 = 2 * 3
tmp1 = 1 + tmp0
return tmp1
```

Both represent the same meaning, but they are useful for different jobs.

The AST is good for syntax-level work, such as formatting code and reporting syntax errors.

IR is good for analysis and code generation.

#### Lowering

Lowering means converting a high-level form into a simpler lower-level form.

A compiler may lower:

```text
source syntax -> AST
AST -> semantic IR
semantic IR -> backend IR
backend IR -> machine code
```

At each step, the program becomes less like what the user wrote and more like what the machine can execute.

For example:

```zig
if (x > 0) {
    return x;
} else {
    return -x;
}
```

may lower into basic blocks:

```text
entry:
    tmp0 = x > 0
    branch tmp0, then_block, else_block

then_block:
    return x

else_block:
    tmp1 = negate x
    return tmp1
```

This form makes control flow explicit.
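
The block form above can be sketched as Zig data and run by a tiny interpreter. This is a minimal sketch, not how the Zig compiler stores its IR: the names `Inst`, `Terminator`, `Block`, and `run` are invented for illustration, and the instruction set only covers this one example.

```zig
const std = @import("std");

// Hypothetical instruction set, just large enough for the example above.
const Inst = union(enum) {
    greater_than_zero: usize, // tmp[dst] = 1 if x > 0, else 0
    negate_x: usize, // tmp[dst] = -x
};

// A terminator ends a block and picks the next block, or leaves the function.
const Terminator = union(enum) {
    branch: struct { cond: usize, then_block: usize, else_block: usize },
    return_x,
    return_tmp: usize,
};

const Block = struct {
    name: []const u8,
    insts: []const Inst,
    term: Terminator,
};

const entry_insts = [_]Inst{.{ .greater_than_zero = 0 }};
const then_insts = [_]Inst{};
const else_insts = [_]Inst{.{ .negate_x = 1 }};

// The three blocks from the listing above; block 0 is the entry block.
const abs_blocks = [_]Block{
    .{ .name = "entry", .insts = &entry_insts, .term = .{ .branch = .{ .cond = 0, .then_block = 1, .else_block = 2 } } },
    .{ .name = "then_block", .insts = &then_insts, .term = .return_x },
    .{ .name = "else_block", .insts = &else_insts, .term = .{ .return_tmp = 1 } },
};

/// Executes the blocks for one input `x`, starting at block 0.
fn run(blocks: []const Block, x: i64) i64 {
    var tmp = [_]i64{0} ** 4;
    var current: usize = 0;
    while (true) {
        const block = blocks[current];
        // Straight-line part: no jumps in or out of the middle.
        for (block.insts) |inst| {
            switch (inst) {
                .greater_than_zero => |dst| tmp[dst] = if (x > 0) 1 else 0,
                .negate_x => |dst| tmp[dst] = -x,
            }
        }
        // Only the terminator changes which block runs next.
        switch (block.term) {
            .branch => |b| current = if (tmp[b.cond] != 0) b.then_block else b.else_block,
            .return_x => return x,
            .return_tmp => |t| return tmp[t],
        }
    }
}
```

Calling `run(&abs_blocks, -5)` walks `entry`, then `else_block`, and returns the negated input.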

#### Basic Blocks

A basic block is a straight-line sequence of instructions.

Inside one basic block, control enters at the top and leaves at the bottom.

There are no jumps into the middle.

There are no branches out of the middle.

Example:

```text
entry:
    tmp0 = load x
    tmp1 = tmp0 + 1
    store y, tmp1
    branch done
```

The final instruction, often called the terminator, decides where execution goes next:

```text
return
branch
conditional branch
unreachable
```

Basic blocks are useful because they turn control flow into a graph: blocks are the nodes and branches are the edges.

#### Control Flow Graph

A control flow graph shows how basic blocks connect.

Example:

```text
entry -> then_block
entry -> else_block
then_block -> done
else_block -> done
```

This matters for:

```text
type checking
definite assignment
liveness analysis
optimization
code generation
```

When a compiler understands control flow, it can answer questions like:

```text
Is this code reachable?
Is this variable initialized here?
Does every path return a value?
Can this branch be removed?
```
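
One of these questions, whether a block can ever run, is a plain graph search. The sketch below assumes a hypothetical `Block` type that stores successor indexes, plus one extra `orphan` block that nothing jumps to. It illustrates the idea only and is not taken from the Zig compiler.

```zig
const std = @import("std");

// Hypothetical block layout: each block lists the indexes of its successors.
const Block = struct {
    name: []const u8,
    successors: []const usize,
};

// The graph from the example above, plus an orphan block with no predecessor.
const cfg = [_]Block{
    .{ .name = "entry", .successors = &[_]usize{ 1, 2 } },
    .{ .name = "then_block", .successors = &[_]usize{3} },
    .{ .name = "else_block", .successors = &[_]usize{3} },
    .{ .name = "done", .successors = &[_]usize{} },
    .{ .name = "orphan", .successors = &[_]usize{3} },
};

/// Marks every block that can be reached from block 0, the entry block.
/// `reachable` must have one slot per block.
fn markReachable(blocks: []const Block, reachable: []bool) void {
    for (reachable) |*r| r.* = false;
    if (blocks.len == 0) return;

    // Worklist of blocks whose successors still need a visit.
    // Each block is pushed at most once, so 32 slots cover up to 32 blocks.
    var stack: [32]usize = undefined;
    std.debug.assert(blocks.len <= stack.len);
    var top: usize = 0;

    reachable[0] = true;
    stack[top] = 0;
    top += 1;

    while (top > 0) {
        top -= 1;
        const current = stack[top];
        for (blocks[current].successors) |next| {
            if (!reachable[next]) {
                reachable[next] = true;
                stack[top] = next;
                top += 1;
            }
        }
    }
}
```

After `markReachable(&cfg, &flags)`, the slot for `orphan` stays `false`, so a compiler could delete that block without changing behavior.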

#### Values and Instructions

In an IR, instructions often produce values.

Example:

```text
tmp0 = add a, b
tmp1 = mul tmp0, 10
return tmp1
```

Here:

```text
tmp0
tmp1
```

are temporary values created by IR instructions.

This makes dependencies visible.

The second instruction depends on the first one. The return depends on the second one.

That structure is easier for the compiler to reason about than raw source text.
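
Because every operand names a parameter, a constant, or an earlier temporary, one forward pass is enough to evaluate such a listing. The sketch below encodes the three instructions above with invented types (`Operand`, `Inst`) and a hypothetical `run` function. Instruction `N` writes temporary `tmpN`.

```zig
const std = @import("std");

// Hypothetical operand kinds: a function parameter, a literal, or an earlier result.
const Operand = union(enum) {
    param: usize,
    constant: i64,
    tmp: usize,
};

const Inst = union(enum) {
    add: Operands,
    mul: Operands,
    ret: Operand,

    const Operands = struct { left: Operand, right: Operand };
};

// tmp0 = add a, b
// tmp1 = mul tmp0, 10
// return tmp1
const program = [_]Inst{
    .{ .add = .{ .left = .{ .param = 0 }, .right = .{ .param = 1 } } },
    .{ .mul = .{ .left = .{ .tmp = 0 }, .right = .{ .constant = 10 } } },
    .{ .ret = .{ .tmp = 1 } },
};

/// Looks up the current value of one operand.
fn resolve(op: Operand, params: []const i64, tmps: []const i64) i64 {
    return switch (op) {
        .param => |i| params[i],
        .constant => |c| c,
        .tmp => |i| tmps[i],
    };
}

/// Runs the instructions in order. Returns null if no `ret` is reached.
/// The sketch assumes at most 16 instructions.
fn run(insts: []const Inst, params: []const i64) ?i64 {
    var tmps: [16]i64 = undefined;
    var i: usize = 0;
    while (i < insts.len) : (i += 1) {
        switch (insts[i]) {
            .add => |ops| tmps[i] = resolve(ops.left, params, &tmps) + resolve(ops.right, params, &tmps),
            .mul => |ops| tmps[i] = resolve(ops.left, params, &tmps) * resolve(ops.right, params, &tmps),
            .ret => |op| return resolve(op, params, &tmps),
        }
    }
    return null;
}
```

With `a = 2` and `b = 3`, `run(&program, &[_]i64{ 2, 3 })` computes `(2 + 3) * 10`.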

#### Types in IR

Zig is statically typed, so the IR that code generation works from must carry type information.

Example:

```text
tmp0: i32 = add a: i32, b: i32
```

The compiler needs to know:

```text
the type of each value
the type of each instruction result
the type expected by each operation
the type returned by each function
```

This is especially important in Zig because types can be compile-time values.

A Zig compiler does not only compile values. It also reasons about types as part of compilation.
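
A typed IR lets the compiler check each instruction in isolation. The sketch below shows the idea for a single `add` instruction. The `Type` enum and the `checkAdd` function are hypothetical and far simpler than Zig's real rules, which also cover coercion and peer type resolution.

```zig
const std = @import("std");

// Hypothetical type tags for this sketch.
const Type = enum { int32, int64, boolean };

const CheckError = error{ TypeMismatch, NotNumeric };

/// Returns the result type of `add lhs, rhs`,
/// or an error when the operand types do not fit the operation.
fn checkAdd(lhs: Type, rhs: Type) CheckError!Type {
    if (lhs != rhs) return error.TypeMismatch;
    if (lhs == .boolean) return error.NotNumeric;
    return lhs;
}
```

`checkAdd(.int32, .int32)` yields `.int32`, while mixing `.int32` with `.int64` is rejected before any code is generated.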

#### Compile-Time IR

Zig has `comptime`, so the compiler must represent code that may run during compilation.

Example:

```zig
fn makeArray(comptime n: usize) type {
    return [n]u8;
}
```

Here, `n` is known at compile time, and the function returns a type.

The compiler must evaluate this kind of code while compiling.

That means Zig IR must support compile-time execution as well as runtime code generation.

This is one of the reasons Zig’s compiler architecture is more subtle than a simple C-like compiler.
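
The effect is visible from ordinary Zig code. In the sketch below, the call to `makeArray` and both assertions are evaluated by the compiler, so a wrong assumption becomes a compile error. The names `Buffer` and `filled` are added here for illustration.

```zig
const std = @import("std");

fn makeArray(comptime n: usize) type {
    return [n]u8;
}

// Evaluated during compilation: Buffer is the type [4]u8.
const Buffer = makeArray(4);

comptime {
    // These checks run inside the compiler, not in the final program.
    std.debug.assert(Buffer == [4]u8);
    std.debug.assert(@sizeOf(Buffer) == 4);
}

/// Runtime code that uses the compile-time result as an ordinary type.
fn filled(byte: u8) Buffer {
    var buf: Buffer = undefined;
    var i: usize = 0;
    while (i < buf.len) : (i += 1) buf[i] = byte;
    return buf;
}
```

By the time `filled` is compiled, `Buffer` is already a concrete array type, so the generated code contains no trace of the call to `makeArray`.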

#### Semantic IR

A useful way to think about Zig IR is semantic IR.

Semantic IR represents the meaning of the program after name lookup and type analysis.

At this point, the compiler has answered questions such as:

```text
What declaration does this name refer to?
What type does this expression have?
Is this operation allowed?
Is this value comptime-known?
Does this function call need specialization?
```

The IR is no longer just syntax. It is typed, checked program meaning.

#### Backend IR

After semantic analysis, the compiler may lower the program further for code generation.

A backend IR is closer to machine code.

It cares about things like:

```text
registers
stack slots
calling conventions
memory layout
branches
loads and stores
target CPU features
```

The semantic IR says what the program means.

The backend IR says how to implement it on a target machine.

#### Why Multiple IRs Exist

One IR rarely serves every purpose well.

A high-level IR is good for language rules.

A low-level IR is good for machine code.

A compiler may use several internal forms because each one makes a certain task easier.

Think of it like maps.

A city subway map is useful for train routes. A street map is useful for walking. A topographic map is useful for terrain. They all describe the same city, but each one is shaped for a different job.

Compiler IR works the same way.

#### IR and Optimization

Optimization means rewriting the program representation so that it runs faster or takes less space, while preserving its behavior.

Example source:

```zig
const x = 1 + 2;
```

The compiler may replace this with:

```zig
const x = 3;
```

At the IR level:

```text
tmp0 = add 1, 2
```

can become:

```text
tmp0 = 3
```

This is called constant folding.

Other optimizations include:

```text
dead code elimination
inlining
common subexpression elimination
bounds check removal
loop optimization
branch simplification
```

The compiler performs these transformations more easily on IR than on source code.
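
Constant folding, for instance, becomes a short loop over the instruction list. The sketch below assumes a hypothetical `Inst` type in which operands are indexes of earlier instructions. It ignores integer overflow, which a real compiler would have to handle according to the language rules.

```zig
const std = @import("std");

// Hypothetical instruction set; operands are indexes of earlier instructions.
const Inst = union(enum) {
    constant: i64,
    param: usize, // a runtime value, unknown to the compiler
    add: Operands,
    mul: Operands,

    const Operands = struct { left: usize, right: usize };
};

/// Returns the value of an instruction if it is a constant.
fn constantValue(inst: Inst) ?i64 {
    return switch (inst) {
        .constant => |v| v,
        else => null,
    };
}

/// Replaces, in place, every add or mul whose operands are both constants.
/// Returns how many instructions were folded. Overflow is not handled.
fn foldConstants(insts: []Inst) usize {
    var folded: usize = 0;
    var i: usize = 0;
    while (i < insts.len) : (i += 1) {
        const inst = insts[i];
        switch (inst) {
            .add => |ops| {
                const l = constantValue(insts[ops.left]) orelse continue;
                const r = constantValue(insts[ops.right]) orelse continue;
                insts[i] = .{ .constant = l + r };
                folded += 1;
            },
            .mul => |ops| {
                const l = constantValue(insts[ops.left]) orelse continue;
                const r = constantValue(insts[ops.right]) orelse continue;
                insts[i] = .{ .constant = l * r };
                folded += 1;
            },
            else => {},
        }
    }
    return folded;
}
```

Because the pass moves forward, a folded result is already a constant when a later instruction uses it, so chains such as `(1 + 2) * 10` collapse in one pass. An instruction that depends on a `param` is left alone.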

#### IR and Error Messages

Even though IR is internal, it still affects error messages.

When the compiler reports an error, it should point back to the source code.

That means IR often carries source location information.

Example:

```text
instruction: add
source span: line 10, column 17 to line 10, column 22
```

Without source locations, the compiler may know something is wrong but fail to explain where it came from.

Good IR design preserves enough source context for useful diagnostics.

#### IR and Debugging the Compiler

If you work on a compiler, you need to inspect IR.

A compiler may provide commands or debug flags to dump internal representation.

A human-readable IR dump might look like:

```text
fn add(a: i32, b: i32) i32:
entry:
    %0 = add i32 %a, %b
    ret %0
```

This helps answer:

```text
Did parsing work?
Did semantic analysis choose the right types?
Did lowering preserve the program meaning?
Did optimization remove too much?
Did code generation receive valid input?
```

When a compiler bug happens, IR dumps are often the fastest way to locate the broken stage.

#### IR and Target Independence

One major benefit of IR is target independence.

The frontend can lower Zig source into IR once.

Then different backends can lower that IR to different targets.

```text
Zig source
    -> Zig IR
        -> x86_64 machine code
        -> ARM machine code
        -> WebAssembly
```

This separation keeps the compiler organized.

The frontend should not need to know every detail of every CPU.

The backend should not need to parse Zig syntax.

#### IR and ABI Rules

Eventually, IR must respect the target ABI.

ABI means application binary interface. It defines low-level rules such as:

```text
how function arguments are passed
which registers are used
how structs are returned
how stack alignment works
how symbols are named
how object files are linked
```

For example, passing a struct to a function may use different rules on different targets.

The high-level IR may simply say:

```text
call foo(value)
```

The backend must lower that call according to the target ABI.

This is one reason code generation is target-specific.

#### IR and Memory

Low-level IR must make memory operations explicit.

Source code:

```zig
x = x + 1;
```

may become:

```text
tmp0 = load x
tmp1 = add tmp0, 1
store x, tmp1
```

The compiler now sees:

```text
read memory
compute value
write memory
```

This helps with optimization and correctness.

Memory is one of the hardest parts of compiler design because pointers, aliasing, volatile access, and atomics all affect what transformations are legal.

#### Volatile and Atomic Operations

IR must preserve special memory rules.

A volatile access, such as a read or write of a hardware register, must not be removed, duplicated, or merged by the optimizer.

An atomic operation must keep its ordering guarantees.

Example Zig idea:

```zig
const ptr: *volatile u32 = ...;
ptr.* = 1;
```

The IR must remember that this store is volatile.

If it becomes an ordinary store, the compiler may generate incorrect embedded code.

This is why IR needs more than simple arithmetic instructions. It must encode language semantics precisely.

#### IR and Safety Checks

Zig has runtime safety checks in safe modes.

Examples:

```text
integer overflow checks
array bounds checks
null checks
invalid enum value checks
unreachable checks
```

The compiler may insert these checks during lowering.

Example:

```zig
const x = array[i];
```

may lower into:

```text
if i >= array.len:
    panic bounds error
tmp0 = load array[i]
```

ReleaseSafe keeps these checks. ReleaseFast and ReleaseSmall disable them by default, so an out-of-bounds access there is undefined behavior instead of a panic.

The IR must support both checked and unchecked forms.
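
Both forms can be written by hand in Zig, which makes the difference easy to see. In the sketch below, `checkedGet` spells out the comparison that a safe build inserts, returning `null` where the compiler would panic. `uncheckedGet` uses the `@setRuntimeSafety` builtin to turn checks off for one scope. Both function names are invented for this example.

```zig
const std = @import("std");

/// Hand-written version of the bounds check that safe build modes insert.
/// Returns null instead of panicking so the outcome is easy to observe.
fn checkedGet(items: []const u8, i: usize) ?u8 {
    if (i >= items.len) return null; // the explicit bounds check
    return items[i]; // known to be in range at this point
}

/// The same load with runtime safety disabled for this function body.
/// An out-of-range index here is undefined behavior in every build mode.
fn uncheckedGet(items: []const u8, i: usize) u8 {
    @setRuntimeSafety(false);
    return items[i];
}
```

`checkedGet` corresponds to the checked lowering shown above, and `uncheckedGet` to the bare `load`.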

#### Reading Zig IR as a Learner

You do not need to understand every compiler-internal detail to write Zig programs.

But understanding IR helps you reason about what the compiler does.

When you write:

```zig
const y = x + 1;
```

the compiler thinks in smaller operations:

```text
read x
add 1
produce y
```

When you write:

```zig
if (condition) a() else b()
```

the compiler thinks in blocks and branches.

When you write generic `comptime` code, the compiler may generate specialized IR for the types you pass.

This mental model helps you write clearer, faster Zig code.
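
A small generic function shows the specialization idea. In the sketch below, `maxOf` is a hypothetical helper. Each distinct `T` passed at a call site gives the compiler a separate instance of the function to analyze, with `T` replaced by a concrete type.

```zig
const std = @import("std");

/// Generic over T. The comparison is resolved separately for each concrete type.
fn maxOf(comptime T: type, a: T, b: T) T {
    return if (a > b) a else b;
}

/// Uses two instances of maxOf: one for i32 and one for f64.
fn demo() bool {
    const biggest_int = maxOf(i32, 3, 7); // integer comparison
    const biggest_float = maxOf(f64, 2.5, 1.0); // floating-point comparison
    return biggest_int == 7 and biggest_float == 2.5;
}
```

The source contains one `maxOf`, but the compiler reasons about an integer version and a floating-point version as if they had been written separately.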

#### A Simple Toy IR

Here is a small IR for arithmetic expressions:

```zig
const IrInst = union(enum) {
    constant_i64: i64,
    add: struct {
        left: usize,
        right: usize,
    },
    multiply: struct {
        left: usize,
        right: usize,
    },
    return_value: usize,
};
```

Each instruction is stored in an array.

The `usize` fields are indexes of earlier instructions in that array, so an operand always refers to a result that already exists.

Example:

```text
0: constant_i64 1
1: constant_i64 2
2: constant_i64 3
3: multiply 1, 2
4: add 0, 3
5: return_value 4
```

This represents:

```text
1 + 2 * 3
```

The AST was tree-shaped. This IR is list-shaped.

That is a common lowering step.
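
The lowering itself can be sketched as a post-order walk: lower both operands first, then emit the instruction that combines them. The `Expr` tree type and the `Builder` struct below are hypothetical helpers for this example. `IrInst` repeats the definition above so the block stands on its own.

```zig
const std = @import("std");

// Same instruction set as the toy IR above.
const IrInst = union(enum) {
    constant_i64: i64,
    add: struct { left: usize, right: usize },
    multiply: struct { left: usize, right: usize },
    return_value: usize,
};

// Hypothetical tree-shaped input, standing in for an AST.
const Expr = union(enum) {
    number: i64,
    add: Binary,
    multiply: Binary,

    const Binary = struct { left: *const Expr, right: *const Expr };
};

const Builder = struct {
    // Fixed capacity keeps the sketch allocation-free.
    // Emitting more than 32 instructions is out of bounds.
    insts: [32]IrInst = undefined,
    len: usize = 0,

    /// Appends one instruction and returns its index.
    fn emit(self: *Builder, inst: IrInst) usize {
        self.insts[self.len] = inst;
        self.len += 1;
        return self.len - 1;
    }

    /// Lowers operands before the operation that uses them (post-order).
    /// Returns the index of the instruction holding the expression's value.
    fn lower(self: *Builder, expr: *const Expr) usize {
        switch (expr.*) {
            .number => |n| return self.emit(.{ .constant_i64 = n }),
            .add => |b| {
                const left = self.lower(b.left);
                const right = self.lower(b.right);
                return self.emit(.{ .add = .{ .left = left, .right = right } });
            },
            .multiply => |b| {
                const left = self.lower(b.left);
                const right = self.lower(b.right);
                return self.emit(.{ .multiply = .{ .left = left, .right = right } });
            },
        }
    }

    /// Lowers a whole expression and ends the list with return_value.
    fn lowerProgram(self: *Builder, expr: *const Expr) void {
        const result = self.lower(expr);
        _ = self.emit(.{ .return_value = result });
    }
};

// The tree for 1 + 2 * 3.
const one = Expr{ .number = 1 };
const two = Expr{ .number = 2 };
const three = Expr{ .number = 3 };
const product = Expr{ .multiply = .{ .left = &two, .right = &three } };
const sum = Expr{ .add = .{ .left = &one, .right = &product } };
```

Calling `lowerProgram(&sum)` on an empty `Builder` emits six instructions in the same order as the listing above, because the left operand of each node is lowered before the right one.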

#### The Main Idea

IR is the compiler’s working form of a program.

It is lower-level than source code, richer than machine code, and shaped for analysis, optimization, and code generation.

For Zig, IR also has to support compile-time execution, strong typing, target-specific lowering, safety checks, volatile and atomic memory rules, and precise diagnostics.

To understand a compiler, learn its IR. The IR shows what the compiler thinks the program means.

