# LLVM Integration

LLVM is a compiler infrastructure project.

A compiler infrastructure is a collection of reusable compiler parts. Instead of every language building every backend from scratch, a language can use LLVM for optimization and machine code generation.

Many languages use LLVM because it already knows how to target many CPUs and operating systems.

Zig has long used LLVM as a major backend. This means the Zig compiler can lower analyzed Zig code into LLVM’s internal representation, let LLVM optimize it, and ask LLVM to produce machine code.

A simplified path looks like this:

```text
Zig source code
    ↓
Zig parser
    ↓
semantic analysis
    ↓
Zig internal representation
    ↓
LLVM IR
    ↓
LLVM optimization
    ↓
machine code
```

LLVM sits near the end of the compiler pipeline. It does not decide what Zig syntax means. Zig’s own compiler frontend does that.

#### What LLVM Does

LLVM helps with backend work.

It can:

```text
represent low-level program operations
optimize code
allocate registers
select machine instructions
emit object files
support many CPU architectures
support debugging metadata
```

For example, Zig may understand this function:

```zig
fn add(a: i32, b: i32) i32 {
    return a + b;
}
```

After Zig has checked the function, it can lower the operation into LLVM IR. LLVM then turns that lower-level form into target machine instructions.
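A simplified sketch of the LLVM IR for this function might look like the following (real compiler output carries extra attributes and different symbol names):

```llvm
define i32 @add(i32 %a, i32 %b) {
  %sum = add i32 %a, %b
  ret i32 %sum
}
```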

The final output depends on the selected target.

On x86-64, LLVM emits x86-64 instructions.

On AArch64, LLVM emits AArch64 instructions.

The Zig function is the same. The generated machine code is different.
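For the `add` function above, the selected instructions might look roughly like this (a sketch with calling-convention details simplified):

```text
x86-64:
    lea eax, [rdi + rsi]    ; return a + b
    ret

AArch64:
    add w0, w0, w1          ; return a + b
    ret
```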

#### What LLVM Does Not Do

LLVM does not understand Zig source code directly.

It does not parse Zig. It does not enforce Zig’s error handling rules. It does not decide how `comptime` works. It does not resolve Zig imports. It does not know Zig’s surface syntax.

Those are Zig compiler frontend responsibilities.

Use this division:

```text
Zig frontend:
    parsing
    name resolution
    type checking
    comptime evaluation
    semantic analysis
    Zig-specific diagnostics

LLVM backend:
    low-level optimization
    instruction selection
    register allocation
    machine code emission
```

This separation matters. If a Zig program has a type error, LLVM is usually not involved yet. The Zig compiler rejects the program before code generation reaches LLVM.

#### LLVM IR

LLVM IR means LLVM Intermediate Representation.

It is a low-level program representation used by LLVM.

It is higher-level than raw assembly, but lower-level than Zig source code.

For example, Zig source code may contain structs, slices, error unions, optionals, generic functions, and compile-time code. LLVM IR does not preserve all of that in the same form.

By the time code reaches LLVM IR, many Zig-level decisions have already been made.

A rough lowering path:

```text
Zig function
    ↓
Zig semantic analysis
    ↓
AIR (Zig's analyzed intermediate representation)
    ↓
LLVM IR
    ↓
machine code
```

LLVM IR is useful because LLVM optimization passes know how to work on it.

#### Optimization Passes

An optimization pass is a compiler step that improves code while preserving behavior.

Examples:

```text
remove unused calculations
inline functions
simplify constant expressions
combine instructions
remove unreachable blocks
move repeated work out of loops
improve memory access patterns
```

Suppose the source code contains:

```zig
fn f() i32 {
    return 10 + 20;
}
```

The compiler does not need to generate runtime instructions to add `10` and `20`. It can return `30`.
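After constant folding, the LLVM IR for `f` might be reduced to something like this sketch:

```llvm
define i32 @f() {
  ret i32 30
}
```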

Another example:

```zig
fn square(x: i32) i32 {
    return x * x;
}

fn g() i32 {
    return square(5);
}
```

An optimizer may inline `square(5)` and reduce the result to `25`.
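After inlining and constant folding, a sketch of the IR for `g` could collapse to:

```llvm
define i32 @g() {
  ret i32 25
}
```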

LLVM has many mature optimization passes. This is one of the main reasons languages use it.

#### Target Support

LLVM supports many architectures.

Examples include:

```text
x86-64
AArch64
ARM
RISC-V
WebAssembly
PowerPC
```

This is valuable for Zig because Zig treats cross-compilation as a normal workflow.

When you compile for a target, Zig can use LLVM’s knowledge of that target.

For example:

```bash
zig build-exe main.zig -target x86_64-linux
zig build-exe main.zig -target aarch64-macos
zig build-exe main.zig -target wasm32-wasi
```

The same Zig source can produce different output for different environments.

LLVM helps with the low-level target-specific parts.

#### Register Allocation

CPUs have a limited number of registers.

A register is a very fast storage location inside the CPU.

Code generation must decide which values live in registers and which values must be stored in memory.

This is called register allocation.

Example:

```zig
fn calc(a: i32, b: i32, c: i32) i32 {
    return (a + b) * c;
}
```

The compiler needs temporary storage for `a + b` before multiplying by `c`.
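On x86-64 with the System V calling convention (`a` in `edi`, `b` in `esi`, `c` in `edx`), a rough sketch of the allocated result:

```text
lea  eax, [rdi + rsi]   ; eax = a + b (the temporary lives in a register)
imul eax, edx           ; eax = (a + b) * c
ret                     ; result returned in eax
```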

LLVM can choose registers and instructions for the target CPU.

This is harder than it sounds because real functions may have many variables, branches, loops, calls, and temporaries.

#### Instruction Selection

Instruction selection means choosing actual CPU instructions for lower-level operations.

A generic operation like:

```text
integer addition
```

must become a real target instruction.

The same generic operation maps to differently named instructions on different CPUs, and some CPUs have special instructions for certain patterns.
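For example, AArch64 has a fused multiply-add instruction (`madd`), so a two-operation pattern can be selected as a single instruction (a simplified sketch):

```text
generic operations:
    t = mul a, b
    r = add t, c

AArch64 selection:
    madd w0, w0, w1, w2    ; w0 = (w0 * w1) + w2
```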

LLVM contains target descriptions and instruction selection logic for many CPUs.

This saves Zig from implementing every backend detail separately for every target.

#### Debug Information

LLVM can also help emit debug information.

Debug information connects machine code back to source code.

It lets debuggers show:

```text
file names
line numbers
function names
local variables
stack frames
types
```

When you build in debug mode, Zig can provide information to LLVM so the final object file contains useful debug metadata.
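In LLVM IR, this shows up as metadata attached to instructions; a simplified sketch, where `!7` is a placeholder for a source-location node:

```llvm
%sum = add i32 %a, %b, !dbg !7   ; !dbg links the instruction to a file and line
```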

This is what makes source-level debugging possible.

#### Why Zig Still Needs Its Own Backend Work

If LLVM is powerful, why does Zig also work on native backends?

Because LLVM has tradeoffs.

LLVM is large. It takes time to build. It adds complexity to bootstrapping. It may be slower than necessary for simple debug builds. It gives Zig less direct control over some backend behavior.

Native Zig backends can help with:

```text
faster debug compilation
simpler compiler bootstrapping
smaller dependency surface
more direct control over code generation
better integration with Zig internals
```

This does not make LLVM useless. LLVM remains valuable for optimized builds and broad target support.

A practical view:

```text
LLVM backend:
    mature optimization
    broad target support
    high-quality release code

native backends:
    faster feedback
    simpler paths for some targets
    more compiler control
```

Both approaches can coexist.

#### LLVM and Release Builds

LLVM is especially useful for optimized release builds.

When you build for performance, you want strong optimization.

Example:

```bash
zig build-exe main.zig -O ReleaseFast
```

For this kind of build, LLVM’s optimization pipeline can produce efficient machine code.

Release builds may spend more time compiling because the optimizer does more work. That tradeoff is acceptable when final runtime performance matters.

Debug builds have a different priority. They should compile quickly, preserve source-level debugging, and keep safety checks useful.
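Zig expresses these priorities as build modes:

```text
Debug         fast compilation, safety checks on, debug info
ReleaseSafe   optimized, safety checks kept
ReleaseFast   optimized for speed, safety checks off
ReleaseSmall  optimized for size, safety checks off
```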

#### LLVM and Compile Times

LLVM can make compilation slower, especially when heavy optimization is enabled.

This is not because LLVM is bad. It is because optimization is expensive.

The compiler must analyze control flow, data flow, memory operations, function calls, loops, and target-specific instruction choices.

For large programs, this work can take significant time.

That is one reason Zig cares about native backends and fast debug compilation.

A good toolchain should support both:

```text
fast edit-compile-run cycles
high-quality optimized final binaries
```

#### LLVM and C/C++ Support

Zig can act as a C and C++ compiler driver with:

```bash
zig cc
zig c++
```

This is closely related to Clang and LLVM.

Clang is a C-family frontend that uses LLVM. Zig can package and drive this toolchain in a way that makes cross-compilation easier.

This is useful for building C dependencies, compiling mixed Zig and C projects, and using Zig as a portable C compiler driver.

For example:

```bash
zig cc main.c -target x86_64-linux
```

This can be easier than manually installing a separate cross C toolchain.

#### The Boundary Between Zig and LLVM

The most important architectural point is the boundary.

Zig owns the language.

LLVM owns much of the low-level backend work.

That means Zig must lower its own concepts into forms LLVM understands.

Examples:

```text
Zig error unions become lower-level data and control flow.
Zig optionals become lower-level representations.
Zig structs become memory layouts.
Zig function calls become ABI-specific calls.
Zig comptime results become already-resolved code or data.
```

By the time LLVM sees the program, Zig-specific meaning has mostly been translated away.
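As a concrete sketch, a Zig optional integer (`?i32`) is typically lowered to a payload plus a presence flag before LLVM sees it (the type name below is illustrative):

```llvm
; a rough sketch of a lowered ?i32: { payload, is-non-null flag }
%optional.i32 = type { i32, i1 }
```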

#### When LLVM Errors Appear

Most normal Zig errors come from Zig itself.

But sometimes you may see errors related to LLVM, especially with backend bugs, unsupported targets, inline assembly, linker interactions, or unusual code generation cases.

As a beginner, treat LLVM errors differently from normal Zig errors.

A normal Zig error often means your program violates a language rule.

An LLVM-related failure may mean:

```text
compiler bug
unsupported target feature
backend limitation
invalid inline assembly
linking problem
toolchain configuration issue
```

The distinction matters when debugging.

#### A Safe Mental Model

Use this model:

```text
Zig checks the program.
LLVM helps generate optimized machine code.
```

Zig’s compiler frontend understands Zig. It parses source files, resolves names, checks types, evaluates compile-time code, and produces analyzed internal representations.

LLVM works later. It takes lower-level compiler output, optimizes it, and emits target-specific code.

This division lets Zig focus on language design and compiler semantics while using a mature backend for many low-level code generation tasks.

