Code Generation

Code generation is the compiler stage that turns analyzed program meaning into target code.

At this point, the compiler has already parsed the source code and checked that the program is valid. It knows the types, declarations, function calls, control flow, error paths, comptime results, and target platform.

Now it must produce something the machine can use.

A rough pipeline looks like this:

Zig source code
    ↓
AST
    ↓
semantic analysis
    ↓
AIR
    ↓
code generation
    ↓
object file or machine code

Code generation is where Zig moves from language rules to real execution.

What the Backend Receives

The backend does not receive raw source code.

Source code is for humans. It contains names, formatting, high-level expressions, and convenient syntax.

The backend needs a lower-level representation.

For example, you may write:

fn add(a: i32, b: i32) i32 {
    return a + b;
}

After parsing and semantic analysis, the compiler knows:

add is a function
a is an i32
b is an i32
the result is an i32
+ means integer addition
return sends the result to the caller

The backend receives this resolved meaning and turns it into target instructions.

On one machine, the final instruction might use an x86-64 register.

On another machine, it might use an ARM register.

The Zig source code is the same, but the generated code depends on the target.

Targets

A target describes where the program will run.

A target usually includes:

CPU architecture
operating system
ABI
object format

Examples:

x86_64-linux-gnu
aarch64-macos
x86_64-windows
wasm32-wasi

Code generation must respect the target.

The same Zig code may need different output for Linux, Windows, macOS, embedded systems, or WebAssembly.

This affects many details:

register names
calling conventions
sizes of usize and C integer types such as c_long
pointer sizes
object file format
symbol names
linker behavior
system libraries

That is why code generation is target-specific.
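
The target details above are visible from inside a Zig program. The following is a minimal sketch, assuming a recent Zig toolchain in which the builtin module exposes cpu, os, abi, and object_format. It prints what the compiler believes about the target it is generating code for.

```zig
const std = @import("std");
const builtin = @import("builtin");

pub fn main() void {
    // These values are fixed at compile time by the chosen target.
    std.debug.print("arch:          {s}\n", .{@tagName(builtin.cpu.arch)});
    std.debug.print("os:            {s}\n", .{@tagName(builtin.os.tag)});
    std.debug.print("abi:           {s}\n", .{@tagName(builtin.abi)});
    std.debug.print("object format: {s}\n", .{@tagName(builtin.object_format)});

    // Sizes that vary between targets.
    std.debug.print("usize:  {d} bytes\n", .{@sizeOf(usize)});
    std.debug.print("c_long: {d} bytes\n", .{@sizeOf(c_long)});
}
```

Compiling the same file with a different -target value, such as aarch64-linux, should change the printed values without any change to the source.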

Machine Code

Machine code is code the CPU can execute directly.

A CPU does not understand Zig source code.

It understands instructions such as:

load this value
add these registers
compare these values
jump to this address
call this function
return to the caller

A very simplified function like this:

fn add(a: i32, b: i32) i32 {
    return a + b;
}

may become something like:

move first argument into a register
add second argument
place result in return register
return

The exact instructions depend on the target.

You do not need to read assembly to write Zig, but understanding that this conversion happens helps you understand why target information matters.

Object Files

The compiler often does not produce a complete executable immediately.

It may produce an object file.

An object file contains compiled code, symbols, relocation information, and metadata needed by the linker.

Example command:

zig build-obj main.zig

This produces an object file (for example, main.o on Linux) instead of a full executable.

A simplified view:

source file
    ↓
object file
    ↓
linker
    ↓
executable

Object files are useful because large programs are often compiled in pieces, then linked together.

Symbols

A symbol is a named item in compiled output.

Examples:

function name
global variable name
exported declaration
external C function

If you export a function from Zig:

export fn add(a: i32, b: i32) i32 {
    return a + b;
}

the generated object file needs a symbol for add.

That symbol lets other code find and call the function.

Symbols matter for linking, dynamic libraries, C interop, and debugging.

Calling Conventions

A calling convention defines how functions call each other at the machine level.

It answers questions like:

Where do arguments go?
Which registers are used?
Where does the return value go?
Who cleans up the stack?
Which registers must be preserved?

This matters when Zig calls C or C calls Zig.

Example:

extern fn puts(s: [*:0]const u8) c_int;

This declares a C function. Zig must call it using the correct C ABI for the target platform.

If the calling convention is wrong, the program may crash or silently corrupt data.

Code generation must follow these rules exactly.
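
The other direction, C calling Zig, can be sketched with an exported function. The name zig_scale is hypothetical and chosen only for illustration. An export fn uses the target's C calling convention by default, so a C caller and the Zig backend agree on where the arguments and the return value live.

```zig
const std = @import("std");

// `export` makes the symbol visible to the linker and gives the function
// the C calling convention of the target.
// A C caller could declare it as: int zig_scale(int value, int factor);
export fn zig_scale(value: c_int, factor: c_int) c_int {
    return value * factor;
}

pub fn main() void {
    // Zig code can call the same function directly.
    std.debug.print("zig_scale(6, 7) = {d}\n", .{zig_scale(6, 7)});
}
```

Using c_int rather than i32 in the signature keeps the parameter types tied to whatever int means in the target's C ABI.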

Layout of Data

Code generation also depends on how data is laid out in memory.

Consider:

const Point = struct {
    x: i32,
    y: i32,
};

The compiler must decide where each field lives in memory.

A simple layout might be:

offset 0: x
offset 4: y

But layout can be affected by alignment, packing, target rules, and ABI requirements.

For normal Zig structs, the compiler is free to choose field order and padding, so the layout is not guaranteed. For C-compatible layout, you use extern struct.

Example:

const CPoint = extern struct {
    x: i32,
    y: i32,
};

This tells Zig to use a layout compatible with C.

Code generation must use the correct offsets when reading or writing fields.
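
The offsets the backend uses can be inspected with builtins. This sketch assumes only the CPoint type from the example above and asks the compiler for the offsets and size it has chosen.

```zig
const std = @import("std");

const CPoint = extern struct {
    x: i32,
    y: i32,
};

pub fn main() void {
    // With extern struct, fields are laid out in declaration order
    // following the C ABI, so x comes first and y follows it.
    std.debug.print("offset of x: {d}\n", .{@offsetOf(CPoint, "x")});
    std.debug.print("offset of y: {d}\n", .{@offsetOf(CPoint, "y")});
    std.debug.print("size of CPoint: {d}\n", .{@sizeOf(CPoint)});
}
```

The same builtins work on a normal struct, but there the values are whatever the compiler chose and should not be relied on across compiler versions.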

Alignment

Alignment means certain values must be stored at memory addresses that are multiples of a given number.

For example, a 4-byte integer may need to be placed at an address divisible by 4.

Alignment helps the CPU access memory efficiently. Some targets may also require alignment for correctness.

Code generation must know alignment when emitting loads and stores.

Example:

const x: u32 = 123;

The compiler tracks the alignment of x. Later, the backend uses that information when generating memory access instructions.
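
Alignment also explains padding inside structs. The Padded type below is a hypothetical example, not something from the compiler. It places a u8 before a u32 so that the effect of alignment on field offsets becomes visible.

```zig
const std = @import("std");

// Hypothetical type for illustration: a 1-byte field followed by a 4-byte field.
const Padded = extern struct {
    tag: u8,
    value: u32,
};

pub fn main() void {
    std.debug.print("alignment of u32: {d}\n", .{@alignOf(u32)});
    // `value` cannot start right after `tag`; it is moved up to the next
    // address that satisfies the alignment of u32.
    std.debug.print("offset of value:  {d}\n", .{@offsetOf(Padded, "value")});
    // The struct size is rounded up to a multiple of its own alignment.
    std.debug.print("size of Padded:   {d}\n", .{@sizeOf(Padded)});
}
```

On typical targets where u32 has 4-byte alignment, three padding bytes sit between tag and value.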

Control Flow

Code generation must turn high-level control flow into jumps and branches.

Example:

fn choose(flag: bool) i32 {
    if (flag) {
        return 10;
    } else {
        return 20;
    }
}

The backend may generate logic like:

check flag
if false, jump to else block
return 10
else block:
return 20

A loop:

while (i < 10) : (i += 1) {
    sum += i;
}

becomes a structure with labels and jumps:

loop start:
check condition
if false, jump to loop end
run body
run continue expression
jump to loop start
loop end:

The source code looks structured. The machine code is closer to jumps between blocks.
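
The label-and-jump structure can be imitated in Zig itself. This is only a sketch of the idea, not what the backend literally emits. The function names are hypothetical. The second function spells out each step of the lowered loop by hand.

```zig
const std = @import("std");

// The structured form, as written in source code.
fn sumStructured() u32 {
    var sum: u32 = 0;
    var i: u32 = 0;
    while (i < 10) : (i += 1) {
        sum += i;
    }
    return sum;
}

// The same loop with each lowered step written out explicitly.
fn sumLowered() u32 {
    var sum: u32 = 0;
    var i: u32 = 0;
    while (true) { // loop start
        if (!(i < 10)) break; // check condition; if false, jump to loop end
        sum += i; // run body
        i += 1; // run continue expression
    } // jump to loop start
    return sum; // loop end
}

pub fn main() void {
    std.debug.print("{d} {d}\n", .{ sumStructured(), sumLowered() });
}
```

Both functions compute the same result, which is the point: lowering changes the shape of the code, not its meaning.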

Error Handling Code

Zig errors are part of normal control flow.

Example:

fn readNumber() !u32 {
    return error.NotFound;
}

fn main2() !void {
    const n = try readNumber();
    _ = n;
}

The try expression is lowered into branching logic.

A simplified view:

call readNumber
if result is error:
    return that error
else:
    unwrap success value

There are no hidden exceptions. Code generation emits normal control flow for errors.

This is one reason Zig error handling is predictable.
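
The branching behind try can be written out by hand. In this sketch the function names withTry and withExplicitBranch are hypothetical. The two functions are meant to behave the same way, with the second one showing the branch that try hides.

```zig
const std = @import("std");

fn readNumber() !u32 {
    return error.NotFound;
}

// Uses try: on error, return the error to the caller.
fn withTry() !u32 {
    const n = try readNumber();
    return n;
}

// The same logic with the branch made explicit.
fn withExplicitBranch() !u32 {
    if (readNumber()) |n| {
        return n; // unwrap success value
    } else |err| {
        return err; // return that error
    }
}

pub fn main() void {
    if (withExplicitBranch()) |n| {
        std.debug.print("value: {d}\n", .{n});
    } else |err| {
        std.debug.print("error: {s}\n", .{@errorName(err)});
    }
}
```

Nothing in either version unwinds the stack or jumps to a hidden handler. The error is an ordinary return value that the caller tests.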

Optional Values

Optionals also affect code generation.

Example:

const maybe_number: ?u32 = null;

The compiler must represent both states:

has value
has no value

For some types, the compiler can optimize the representation.

For example, an optional pointer can use the zero address as the “no value” state:

const ptr: ?*u8 = null;

An ordinary Zig pointer can never be zero, so no separate boolean is needed. The zero address itself represents absence, and ?*u8 is the same size as *u8.

Code generation uses type information to choose the correct representation.
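
The difference in representation shows up in type sizes. This short sketch compares an optional pointer, which needs no extra storage, with an optional integer, which needs room for the “has value” state.

```zig
const std = @import("std");

pub fn main() void {
    // The optional pointer reuses the zero address for null.
    std.debug.print("*u8:  {d} bytes, ?*u8: {d} bytes\n", .{ @sizeOf(*u8), @sizeOf(?*u8) });

    // Every bit pattern of u32 is a valid number, so ?u32 needs extra space.
    std.debug.print("u32:  {d} bytes, ?u32: {d} bytes\n", .{ @sizeOf(u32), @sizeOf(?u32) });
}
```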

Runtime Safety Checks

In safe build modes, Zig emits runtime safety checks.

Example:

const value = items[index];

If the compiler cannot prove that index is valid, safe modes may include a bounds check.

A simplified version:

if index >= items.len:
    panic
load items[index]

In ReleaseFast and ReleaseSmall, these checks are disabled by default. ReleaseSafe keeps them while still optimizing.

This means code generation also depends on optimization mode.

Zig has four build modes: Debug, ReleaseSafe, ReleaseFast, and ReleaseSmall. Debug favors safety and debuggability. ReleaseSafe favors speed while keeping safety checks. ReleaseFast favors speed, and ReleaseSmall favors small binaries.
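
The shape of a bounds check can be written by hand. In this sketch, getChecked is a hypothetical helper that returns null where the compiler-inserted check would panic. That difference is a choice made for illustration, so the example can run to completion.

```zig
const std = @import("std");
const builtin = @import("builtin");

// Hand-written version of the check. The compiler-inserted check panics
// instead of returning null.
fn getChecked(items: []const u32, index: usize) ?u32 {
    if (index >= items.len) return null;
    return items[index];
}

pub fn main() void {
    const items = [_]u32{ 10, 20, 30 };

    // The build mode is known at compile time and decides whether
    // the compiler emits its own safety checks.
    std.debug.print("build mode: {s}\n", .{@tagName(builtin.mode)});

    std.debug.print("items[1] = {d}\n", .{getChecked(&items, 1).?});
    std.debug.print("items[5] out of range: {}\n", .{getChecked(&items, 5) == null});
}
```

Building with -O ReleaseFast instead of the default Debug mode changes the printed build mode, and with it the checks the compiler adds around plain indexing.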

Optimization

Optimization improves generated code.

Examples of optimizations include:

remove unused code
inline small functions
simplify constant expressions
remove unnecessary loads
combine instructions
avoid repeated calculations

For example:

const x = 10 + 20;

can be computed at compile time as 30.

A function call may be inlined:

fn square(x: i32) i32 {
    return x * x;
}

fn f() i32 {
    return square(5);
}

The compiler may turn this into:

return 25

Optimization is powerful, but it must preserve the meaning of the program.
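
Whether an optimizer folds square(5) into 25 is its own decision. Zig also offers a guaranteed alternative through comptime, which is a language feature rather than an optimization. In this sketch the name fComptime is hypothetical. It shows the ordinary call next to a call that the compiler must evaluate itself.

```zig
const std = @import("std");

fn square(x: i32) i32 {
    return x * x;
}

// Ordinary call: the optimizer may or may not inline and fold it.
fn f() i32 {
    return square(5);
}

// `comptime` forces the call to be evaluated during compilation,
// so only the resulting constant reaches code generation.
fn fComptime() i32 {
    const result = comptime square(5);
    return result;
}

comptime {
    // Runs inside the compiler; a wrong value would fail the build.
    std.debug.assert(square(5) == 25);
}

pub fn main() void {
    std.debug.print("f() = {d}, fComptime() = {d}\n", .{ f(), fComptime() });
}
```

Both functions return the same value. The difference is only in who is responsible for computing it, and when.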

LLVM Backend

LLVM has long been one of Zig’s main backends, especially for optimized builds.

LLVM is a compiler infrastructure that can optimize and generate code for many targets.

A simplified path with LLVM looks like this:

Zig analyzed code
    ↓
LLVM IR
    ↓
LLVM optimization
    ↓
machine code

LLVM is useful because it supports many architectures and has strong optimization passes.

The tradeoff is that LLVM is large and complex. Depending on it makes the compiler itself larger and slower to build.

Zig Native Backends

Zig is also developing its own native backends.

A native backend is code generation implemented directly by Zig’s compiler, instead of relying fully on LLVM.

Native backends can help with:

faster debug compilation
smaller compiler bootstrap paths
better control over emitted code
less dependency on LLVM
experimentation with compiler internals

For highly optimized release builds, LLVM may still be important. For fast debug builds, native backends can be valuable.

The broad architecture allows more than one backend strategy.
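
A program can ask which backend compiled it. This is a small sketch, assuming a Zig version in which the builtin module provides zig_backend. The exact set of backend names depends on the compiler version.

```zig
const std = @import("std");
const builtin = @import("builtin");

pub fn main() void {
    // zig_backend is fixed at compile time by the backend that is
    // generating this very program.
    std.debug.print("backend: {s}\n", .{@tagName(builtin.zig_backend)});
    std.debug.print("using LLVM: {}\n", .{builtin.zig_backend == .stage2_llvm});
}
```

The compiler accepts -fllvm and -fno-llvm to request one strategy or the other. Whether a native backend is available depends on the target and the Zig version.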

Debug Information

Code generation can also emit debug information.

Debug information connects machine code back to source code.

This is what lets a debugger show:

current source line
function names
local variables
call stack
types

Without debug information, a debugger may only show raw addresses and assembly.

In debug builds, the compiler tries to preserve enough information for debugging. In optimized builds, some variables may be removed, inlined, or rearranged, so debugging can become harder.

Code Generation and Linking Are Connected

Code generation produces object code, but object code still needs to connect with other code.

For example:

const std = @import("std");

pub fn main() void {
    std.debug.print("hello\n", .{});
}

Your main function may need code from the standard library, operating system calls, formatting functions, and startup logic.

The backend emits references. The linker resolves them.

So code generation and linking are separate, but closely related.

Why Code Generation Is Hard

Code generation is difficult because it must be both correct and efficient.

It must handle:

many CPU architectures
many operating systems
different ABIs
different object formats
debug information
optimization modes
runtime safety checks
C interop
inline assembly
atomics
vectors
packed data
alignment
linker requirements

A small language feature can affect many backend details.

For example, pointers affect memory access, alignment, aliasing, calling conventions, optional representation, and debug info.

That is why compiler backends are large systems.

The Beginner Mental Model

Use this model:

Semantic analysis proves the program makes sense.
Code generation turns that meaning into target code.

When Zig compiles your program, it does not simply translate text line by line.

It resolves the program, lowers it into internal forms, chooses target-specific representations, emits instructions or object code, and prepares the result for linking.

Code generation is the bridge between Zig as a language and the machine that will run your program.