How Parsing Works

Parsing is the part of the compiler that reads source code and turns it into structure.

When you write a Zig file, you write plain text:

pub fn main() void {
    const x = 10 + 20;
    _ = x; // Zig rejects unused local constants, so x is discarded explicitly
}

The compiler cannot understand this text directly as a program. First, it must break the text into smaller pieces. Then it must organize those pieces into a tree.

That process has two main steps:

source text
    ↓
tokens
    ↓
syntax tree

Source Text

A Zig file starts as characters.

For example:

const x = 42;

To a human, this is obviously a constant declaration.

To the compiler, it starts as a sequence of characters:

c o n s t   x   =   4 2 ;

The compiler must discover where each meaningful piece begins and ends.

Tokenization

Tokenization is the first step.

A tokenizer, also called a lexer, turns characters into tokens.

A token is a small meaningful unit of source code.

For this line:

const x = 42;

the tokens are roughly:

const      keyword
x          identifier
=          symbol
42         integer literal
;          symbol

Whitespace separates tokens, but apart from marking where one token ends and the next begins, it is mostly ignored.

These two lines have the same meaning:

const x = 42;

const    x    =    42;

The tokenizer does not decide whether the whole program is correct. It only recognizes pieces.

It knows that const is a keyword. It knows that 42 is an integer literal. It knows that x is an identifier.

It does not yet know whether x has a valid type or whether the declaration is allowed in this place.
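To make this concrete, here is a minimal tokenizer sketch for the tiny subset used above: `const`, identifiers, `=`, integer literals, and `;`. It is not the real Zig tokenizer, which handles far more cases. The `Tag` names, the `Token` struct, and `nextToken` are hypothetical. Note that it only classifies pieces and never checks whether they form a valid declaration.

```zig
const std = @import("std");

// Hypothetical token kinds for this one example; a real tokenizer has many more.
const Tag = enum { keyword_const, identifier, equal, int_literal, semicolon, invalid, eof };

const Token = struct {
    tag: Tag,
    start: usize, // byte offset of the token's first character
    end: usize, // byte offset just past its last character
};

fn isSpace(c: u8) bool {
    return c == ' ' or c == '\t' or c == '\r' or c == '\n';
}

fn isDigit(c: u8) bool {
    return c >= '0' and c <= '9';
}

fn isIdentChar(c: u8) bool {
    return c == '_' or (c >= 'a' and c <= 'z') or (c >= 'A' and c <= 'Z') or isDigit(c);
}

/// Returns the next token at or after `pos.*` and advances `pos.*` past it.
fn nextToken(src: []const u8, pos: *usize) Token {
    // Whitespace only separates tokens, so it is skipped without producing one.
    while (pos.* < src.len and isSpace(src[pos.*])) pos.* += 1;
    const start = pos.*;
    if (start == src.len) return .{ .tag = .eof, .start = start, .end = start };

    const c = src[start];
    if (isDigit(c)) {
        while (pos.* < src.len and isDigit(src[pos.*])) pos.* += 1;
        return .{ .tag = .int_literal, .start = start, .end = pos.* };
    }
    if (isIdentChar(c)) {
        while (pos.* < src.len and isIdentChar(src[pos.*])) pos.* += 1;
        // A word is a keyword only if its spelling is reserved.
        const word_tag: Tag = if (std.mem.eql(u8, src[start..pos.*], "const")) .keyword_const else .identifier;
        return .{ .tag = word_tag, .start = start, .end = pos.* };
    }
    pos.* += 1; // everything else in this toy is a one-character token
    const tag: Tag = switch (c) {
        '=' => .equal,
        ';' => .semicolon,
        else => .invalid,
    };
    return .{ .tag = tag, .start = start, .end = pos.* };
}

pub fn main() void {
    const src = "const x = 42;";
    var pos: usize = 0;
    while (true) {
        const tok = nextToken(src, &pos);
        std.debug.print("{s} \"{s}\"\n", .{ @tagName(tok.tag), src[tok.start..tok.end] });
        if (tok.tag == .eof) break;
    }
}
```

Running it prints one line per token, ending with `eof`. Because whitespace is skipped before each token, `const    x    =    42;` produces the same sequence of tags.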

Parsing

Parsing starts after tokenization.

The parser reads the token stream and builds a syntax tree.

A syntax tree records how the tokens fit together.

For example:

const x = 10 + 20;

A flat list of tokens is not enough. The compiler needs to know that:

const x = ...;

is a declaration, and:

10 + 20

is an expression inside that declaration.

A simplified tree might look like this:

constant declaration
├── name: x
└── value:
    └── binary expression +
        ├── integer literal 10
        └── integer literal 20

This tree is not exactly how the real Zig compiler stores it, but it shows the idea.

Parsing turns a line of code into a shape the compiler can inspect.
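The sketch below shows one way a parser can turn tokens into that tree. It assumes a tokenizer has already produced the tokens for `const x = 10 + 20;`, and it supports only integer literals joined by `+`. `Node`, `Parser`, `parseDecl`, and `dump` are hypothetical names. Nodes refer to their children by index into a fixed array, which avoids heap allocation in this small example.

```zig
const std = @import("std");

// Hypothetical token kinds; assume a tokenizer already produced them.
const Tag = enum { keyword_const, identifier, equal, int_literal, plus, semicolon, eof };
const Token = struct { tag: Tag, text: []const u8 };

// Toy tree nodes; children are referred to by index into `Parser.nodes`.
const Node = union(enum) {
    int_literal: []const u8,
    add: struct { lhs: usize, rhs: usize },
    const_decl: struct { name: []const u8, value: usize },
};

const ParseError = error{ UnexpectedToken, TooManyNodes };

const Parser = struct {
    tokens: []const Token, // must end with an .eof token
    pos: usize = 0,
    nodes: [16]Node = undefined,
    len: usize = 0,

    fn addNode(p: *Parser, node: Node) ParseError!usize {
        if (p.len == p.nodes.len) return error.TooManyNodes;
        p.nodes[p.len] = node;
        p.len += 1;
        return p.len - 1;
    }

    // Consume the current token only if it has the expected tag.
    fn expect(p: *Parser, tag: Tag) ParseError!Token {
        const tok = p.tokens[p.pos];
        if (tok.tag != tag) return error.UnexpectedToken;
        p.pos += 1;
        return tok;
    }

    // decl := "const" identifier "=" expr ";"
    fn parseDecl(p: *Parser) ParseError!usize {
        _ = try p.expect(.keyword_const);
        const name = try p.expect(.identifier);
        _ = try p.expect(.equal);
        const value = try p.parseExpr();
        _ = try p.expect(.semicolon);
        return p.addNode(.{ .const_decl = .{ .name = name.text, .value = value } });
    }

    // expr := int_literal ("+" int_literal)*
    fn parseExpr(p: *Parser) ParseError!usize {
        var lhs = try p.addNode(.{ .int_literal = (try p.expect(.int_literal)).text });
        while (p.tokens[p.pos].tag == .plus) {
            p.pos += 1;
            const rhs = try p.addNode(.{ .int_literal = (try p.expect(.int_literal)).text });
            lhs = try p.addNode(.{ .add = .{ .lhs = lhs, .rhs = rhs } });
        }
        return lhs;
    }
};

// Print the tree with two spaces of indentation per level.
fn dump(p: *const Parser, index: usize, depth: usize) void {
    var i: usize = 0;
    while (i < depth) : (i += 1) std.debug.print("  ", .{});
    switch (p.nodes[index]) {
        .int_literal => |text| std.debug.print("integer literal {s}\n", .{text}),
        .add => |bin| {
            std.debug.print("binary expression +\n", .{});
            dump(p, bin.lhs, depth + 1);
            dump(p, bin.rhs, depth + 1);
        },
        .const_decl => |decl| {
            std.debug.print("constant declaration {s}\n", .{decl.name});
            dump(p, decl.value, depth + 1);
        },
    }
}

// Tokens for: const x = 10 + 20;
const example_tokens = [_]Token{
    .{ .tag = .keyword_const, .text = "const" },
    .{ .tag = .identifier, .text = "x" },
    .{ .tag = .equal, .text = "=" },
    .{ .tag = .int_literal, .text = "10" },
    .{ .tag = .plus, .text = "+" },
    .{ .tag = .int_literal, .text = "20" },
    .{ .tag = .semicolon, .text = ";" },
    .{ .tag = .eof, .text = "" },
};

pub fn main() !void {
    var parser = Parser{ .tokens = &example_tokens };
    const root = try parser.parseDecl();
    dump(&parser, root, 0);
}
```

Running it prints the same shape as the simplified tree above: the declaration first, the `+` node indented one level, and the two integer literals indented two levels.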

Abstract Syntax Tree

The tree built by the parser is usually called an AST.

AST means abstract syntax tree.

The word “abstract” means the tree does not keep every tiny detail of the original text. It keeps the structure that matters for understanding the program.

For example, the source code may contain spaces:

const x     =      42;

The AST does not care how many spaces were used.

It cares that there is a constant declaration named x with the value 42.

A simplified AST node might contain:

kind: const declaration
name: x
initializer: integer literal 42

The AST is still close to the source code. It does not fully know the meaning yet.

Parsing Is Not Type Checking

Parsing and type checking are separate.

This line can be parsed:

const x: u8 = 300;

The parser can understand the structure:

constant declaration
name: x
type: u8
value: 300

But the program is still wrong, because 300 does not fit in a u8, which can only hold values from 0 to 255.

That error is not a parsing error. It is a semantic error.

Parsing asks:

Does this code have a valid grammatical shape?

Semantic analysis asks:

Does this code make sense according to Zig’s rules?

That distinction matters when reading compiler errors.

A parsing error often means the compiler could not understand the structure of the code.

Example:

const x = ;

The parser expects an expression after =. There is none.

A semantic error means the structure is valid, but the meaning is wrong.

Example:

const x: u8 = 300;

The structure is clear, but the value does not fit the type.
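The split between the two stages can be sketched in a few lines. `parse` only checks the shape `const NAME [: TYPE] = VALUE;` and records the pieces; `check` runs afterward and asks whether the value fits the type. This is a toy under stated assumptions: tokens are pre-split strings, only integer literals count as expressions, only `u8` is a known type, and `Decl`, `parse`, and `check` are hypothetical names. `std.fmt.parseInt` performs the range check because it reports an overflow error when the text does not fit the requested integer type.

```zig
const std = @import("std");

// Hypothetical parsed form of `const NAME [: TYPE] = VALUE;`. Like an AST node,
// it records the pieces but does not know whether VALUE fits TYPE.
const Decl = struct {
    name: []const u8, // not validated in this toy
    type_name: ?[]const u8,
    value_text: []const u8,
};

const ParseError = error{ UnexpectedToken, ExpectedExpression };
const SemaError = error{ UnknownType, ValueDoesNotFit };

/// Returns the token at `i`, or "" once the input is exhausted.
fn at(toks: []const []const u8, i: usize) []const u8 {
    return if (i < toks.len) toks[i] else "";
}

fn isIntLiteral(text: []const u8) bool {
    if (text.len == 0) return false;
    for (text) |c| {
        if (c < '0' or c > '9') return false;
    }
    return true;
}

/// Parsing: checks only the grammatical shape.
fn parse(toks: []const []const u8) ParseError!Decl {
    if (!std.mem.eql(u8, at(toks, 0), "const")) return error.UnexpectedToken;
    const name = at(toks, 1);
    var i: usize = 2;
    var type_name: ?[]const u8 = null;
    if (std.mem.eql(u8, at(toks, i), ":")) {
        type_name = at(toks, i + 1);
        i += 2;
    }
    if (!std.mem.eql(u8, at(toks, i), "=")) return error.UnexpectedToken;
    // The grammar requires an expression here; this toy accepts only integer literals.
    if (!isIntLiteral(at(toks, i + 1))) return error.ExpectedExpression;
    if (!std.mem.eql(u8, at(toks, i + 2), ";")) return error.UnexpectedToken;
    return .{ .name = name, .type_name = type_name, .value_text = at(toks, i + 1) };
}

/// Semantic analysis: checks whether a well-formed declaration makes sense.
fn check(decl: Decl) SemaError!void {
    const type_name = decl.type_name orelse return; // untyped: nothing to check in this toy
    if (!std.mem.eql(u8, type_name, "u8")) return error.UnknownType; // toy: only u8 is known
    // u8 holds 0...255, so parsing "300" as a u8 overflows.
    _ = std.fmt.parseInt(u8, decl.value_text, 10) catch return error.ValueDoesNotFit;
}

pub fn main() void {
    const missing_value = [_][]const u8{ "const", "x", "=", ";" };
    const too_big = [_][]const u8{ "const", "x", ":", "u8", "=", "300", ";" };

    if (parse(&missing_value)) |_| {
        std.debug.print("parsed\n", .{});
    } else |err| {
        std.debug.print("parse error: {s}\n", .{@errorName(err)});
    }

    // This one has a valid shape, so parsing succeeds and the error comes later.
    const decl = parse(&too_big) catch unreachable;
    check(decl) catch |err| std.debug.print("semantic error: {s}\n", .{@errorName(err)});
}
```

It reports `ExpectedExpression` from the parser for `const x = ;` and `ValueDoesNotFit` from the semantic check for `const x: u8 = 300;`.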

Expressions

A large part of parsing is expression parsing.

Expressions are pieces of code that produce values.

Examples:

42
x
x + y
foo()
if (ready) 1 else 0

Expressions can nest inside other expressions.

Example:

const result = (a + b) * c;

The compiler must understand that a + b happens before multiplication because of the parentheses.

Without parentheses:

const result = a + b * c;

the compiler must understand operator precedence. Multiplication binds more tightly than addition, so this means:

a + (b * c)

not:

(a + b) * c

A simplified tree:

binary expression +
├── identifier a
└── binary expression *
    ├── identifier b
    └── identifier c

This tree shape matters because later compiler stages use it to generate the correct code.
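Precedence climbing is one common way to get this shape. The sketch evaluates the expression while parsing instead of building nodes, so the result reveals which tree was chosen. Single digits stand in for `a`, `b`, and `c`, and only `+`, `*`, and parentheses are supported. The `Parser` type and `eval` function are hypothetical, not the Zig compiler's own.

```zig
const std = @import("std");

// Hypothetical toy parser: operands are single digits; operators are + and *.
const Parser = struct {
    src: []const u8,
    pos: usize = 0,

    // Skip spaces and return the next character, or 0 at the end of input.
    fn peek(p: *Parser) u8 {
        while (p.pos < p.src.len and p.src[p.pos] == ' ') p.pos += 1;
        return if (p.pos < p.src.len) p.src[p.pos] else 0;
    }

    // Higher numbers bind more tightly; 0 means "not a binary operator".
    fn precedence(op: u8) u8 {
        return switch (op) {
            '+' => 1,
            '*' => 2,
            else => 0,
        };
    }

    // primary := digit | "(" expr ")"
    fn parsePrimary(p: *Parser) error{SyntaxError}!i64 {
        const c = p.peek();
        p.pos += 1;
        if (c >= '0' and c <= '9') return c - '0';
        if (c == '(') {
            const inner = try p.parseExpr(1);
            if (p.peek() != ')') return error.SyntaxError;
            p.pos += 1;
            return inner;
        }
        return error.SyntaxError;
    }

    // Precedence climbing: keep consuming operators whose precedence is at
    // least `min_prec`; each right operand may only contain tighter operators.
    fn parseExpr(p: *Parser, min_prec: u8) error{SyntaxError}!i64 {
        var lhs = try p.parsePrimary();
        while (true) {
            const op = p.peek();
            const prec = precedence(op);
            if (prec == 0 or prec < min_prec) break;
            p.pos += 1;
            const rhs = try p.parseExpr(prec + 1);
            lhs = if (op == '+') lhs + rhs else lhs * rhs;
        }
        return lhs;
    }
};

fn eval(src: []const u8) error{SyntaxError}!i64 {
    var p = Parser{ .src = src };
    const value = try p.parseExpr(1);
    if (p.peek() != 0) return error.SyntaxError; // leftover input
    return value;
}

pub fn main() !void {
    std.debug.print("1 + 2 * 3 = {d}\n", .{try eval("1 + 2 * 3")});
    std.debug.print("(1 + 2) * 3 = {d}\n", .{try eval("(1 + 2) * 3")});
}
```

`eval("1 + 2 * 3")` returns 7, matching `a + (b * c)`, while `eval("(1 + 2) * 3")` returns 9. Passing `prec + 1` as the minimum for the right operand is what makes operators of equal precedence group to the left.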

Statements and Declarations

The parser also recognizes declarations and statements.

A declaration introduces something by name.

Examples:

const x = 10;
var count: usize = 0;

fn add(a: i32, b: i32) i32 {
    return a + b;
}

A statement performs an action inside a function or block.

Examples:

return x;
defer file.close();
while (i < 10) : (i += 1) {
    sum += i;
}

Zig has an important design trait: many constructs are expressions.

For example, if can produce a value:

const value = if (flag) 10 else 20;

The parser must support this expression-oriented style.
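The program below is ordinary Zig rather than compiler code. It shows constructs that appear where a value is expected, so the parser must accept them in expression position: an `if` expression, a `switch` expression, and a labeled block that yields a value through `break`. The function names `pick` and `describe` are illustrative.

```zig
const std = @import("std");

// Illustrative helper: the whole `if` evaluates to one branch's value.
fn pick(flag: bool) i32 {
    return if (flag) 10 else 20;
}

// Illustrative helper: `switch` is an expression too; each prong yields a value.
fn describe(n: i32) []const u8 {
    return switch (n) {
        0 => "zero",
        1...9 => "one digit",
        else => "something else",
    };
}

pub fn main() void {
    const value = pick(true);

    // A labeled block is also an expression; `break :blk` supplies its value.
    const doubled = blk: {
        const tmp = value * 2;
        break :blk tmp;
    };

    std.debug.print("{d} {d} {s}\n", .{ value, doubled, describe(doubled) });
}
```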

Blocks

A block is code inside braces:

{
    const x = 10;
    const y = 20;
}

Blocks are important because they create structure.

A function body is a block:

pub fn main() void {
    const x = 10;
    _ = x; // discard x so the unused constant is not reported as an error
}

An if branch often contains a block:

if (ready) {
    start();
} else {
    stop();
}

The parser uses braces to know where a block begins and ends.

If a closing brace is missing, the parser may report the error later than the place where the mistake actually is, because it keeps looking for the end of the block.

Example:

pub fn main() void {
    const x = 10;

The parser reaches the end of the file while still inside the function body.
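A brace-depth scan shows why the report lands at the end of the file. The sketch keeps a stack of the byte offsets of open `{` characters; a missing `}` only becomes detectable when the input runs out. `findUnclosedBrace` is a hypothetical helper that ignores braces inside strings and comments, and it is not how the Zig parser is written.

```zig
const std = @import("std");

/// Hypothetical helper. Returns null if braces balance; otherwise returns the
/// byte offset of the innermost `{` still open when the input ends.
/// Toy limitation: braces inside strings and comments are counted too.
fn findUnclosedBrace(src: []const u8) error{ UnexpectedCloseBrace, TooDeep }!?usize {
    var open: [64]usize = undefined; // offsets of currently open `{`
    var depth: usize = 0;
    for (src, 0..) |c, i| {
        switch (c) {
            '{' => {
                if (depth == open.len) return error.TooDeep;
                open[depth] = i;
                depth += 1;
            },
            '}' => {
                if (depth == 0) return error.UnexpectedCloseBrace;
                depth -= 1;
            },
            else => {},
        }
    }
    // Only here, at the end of input, does the missing `}` become visible.
    return if (depth == 0) null else open[depth - 1];
}

pub fn main() !void {
    const src = "pub fn main() void {\n    const x = 10;\n";
    if (try findUnclosedBrace(src)) |offset| {
        std.debug.print("end of input at byte {d}; block opened at byte {d} is still open\n", .{ src.len, offset });
    }
}
```

For the broken `main` above, it returns the offset of the function's opening brace, which a compiler could point to alongside the end-of-file error.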

Error Recovery

A parser should not stop completely at the first mistake if it can continue.

Good parsers try to recover from errors.

For example:

const x = ;
const y = 20;

The first declaration is broken, but the parser may still be able to continue and recognize the second declaration.

This is useful because one compile run can show several errors instead of only one.

Error recovery is difficult because the parser must guess where normal structure resumes.

Common recovery points include:

semicolon
closing brace
new declaration keyword
end of file

Compiler diagnostics depend heavily on parser quality. A good parser gives errors near the real mistake. A poor parser may produce confusing follow-up errors.
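A common recovery strategy is to skip tokens until one of the restart points listed above. The sketch parses a sequence of `const NAME = INT;` declarations from pre-split tokens and, after an error, skips to just past the next `;` or to the next `const` keyword. Closing braces are left out because the toy grammar has no blocks. `Parser`, `parseAll`, and `synchronize` are hypothetical names.

```zig
const std = @import("std");

fn isIntLiteral(text: []const u8) bool {
    if (text.len == 0) return false;
    for (text) |c| {
        if (c < '0' or c > '9') return false;
    }
    return true;
}

fn isIdentifier(text: []const u8) bool {
    if (text.len == 0 or std.mem.eql(u8, text, "const")) return false;
    if (text[0] >= '0' and text[0] <= '9') return false;
    for (text) |c| {
        const ok = c == '_' or (c >= 'a' and c <= 'z') or (c >= 'A' and c <= 'Z') or (c >= '0' and c <= '9');
        if (!ok) return false;
    }
    return true;
}

// Hypothetical toy parser over pre-split string tokens.
const Parser = struct {
    toks: []const []const u8,
    pos: usize = 0,
    decls: usize = 0,
    errors: usize = 0,

    fn peek(p: *const Parser) []const u8 {
        return if (p.pos < p.toks.len) p.toks[p.pos] else "";
    }

    fn peekIs(p: *const Parser, text: []const u8) bool {
        return std.mem.eql(u8, p.peek(), text);
    }

    // Consume one token if `ok` holds; otherwise report a syntax error.
    fn accept(p: *Parser, ok: bool) error{SyntaxError}!void {
        if (!ok) return error.SyntaxError;
        p.pos += 1;
    }

    // decl := "const" identifier "=" int_literal ";"
    fn parseDecl(p: *Parser) error{SyntaxError}!void {
        try p.accept(p.peekIs("const"));
        try p.accept(isIdentifier(p.peek()));
        try p.accept(p.peekIs("="));
        try p.accept(isIntLiteral(p.peek()));
        try p.accept(p.peekIs(";"));
    }

    // Skip to a likely restart point: just past `;`, at the next `const`,
    // or at the end of input.
    fn synchronize(p: *Parser) void {
        while (p.pos < p.toks.len) : (p.pos += 1) {
            if (p.peekIs(";")) {
                p.pos += 1;
                return;
            }
            if (p.peekIs("const")) return;
        }
    }

    fn parseAll(p: *Parser) void {
        while (p.pos < p.toks.len) {
            p.parseDecl() catch {
                p.errors += 1;
                p.synchronize();
                continue;
            };
            p.decls += 1;
        }
    }
};

pub fn main() void {
    // const x = ;  const y = 20;
    const toks = [_][]const u8{ "const", "x", "=", ";", "const", "y", "=", "20", ";" };
    var p = Parser{ .toks = &toks };
    p.parseAll();
    std.debug.print("declarations: {d}, errors: {d}\n", .{ p.decls, p.errors });
}
```

For `const x = ;` followed by `const y = 20;`, it records one error and still counts the second declaration.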

Source Locations

The parser also tracks source locations.

A source location tells the compiler where something came from:

file name
line number
column number
byte offset

This is how the compiler can print an error like:

main.zig:3:15: error: expected expression, found ';'

Without source locations, the compiler might know that something is wrong, but it could not show you where.

Source locations must travel through later compiler stages too. Semantic analysis and code generation need them for useful diagnostics.
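One way to keep per-token data small is to store only a byte offset and recompute the line and column when an error is printed. The sketch assumes that `\n` ends a line and that columns count bytes starting at 1; `Location` and `locate` are hypothetical. The file name `main.zig` is hardcoded only to mirror the message above.

```zig
const std = @import("std");

const Location = struct { line: usize, column: usize };

/// Hypothetical helper: converts a byte offset into a 1-based line and column.
/// Assumptions: '\n' ends a line, and columns count bytes rather than characters.
fn locate(src: []const u8, offset: usize) Location {
    var loc = Location{ .line = 1, .column = 1 };
    for (src[0..offset]) |c| {
        if (c == '\n') {
            loc.line += 1;
            loc.column = 1;
        } else {
            loc.column += 1;
        }
    }
    return loc;
}

pub fn main() void {
    const src = "pub fn main() void {\n    const a = 1;\n    const x = ;\n}\n";
    // Pretend the parser stored the byte offset of the unexpected ';'.
    const offset = std.mem.indexOf(u8, src, "= ;").? + 2;
    const loc = locate(src, offset);
    std.debug.print("main.zig:{d}:{d}: error: expected expression, found ';'\n", .{ loc.line, loc.column });
}
```

With the source shown in `main`, the `;` after `=` sits at line 3, column 15, so it prints the same location as the example message.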

From AST to Later Stages

After parsing, the compiler has an AST.

But the AST is not the final internal form.

The compiler still needs to:

resolve names
resolve imports
infer types
check declarations
evaluate comptime code
lower into intermediate representations
generate code
link output

So the AST is only the beginning.

A rough flow:

tokens
    ↓
AST
    ↓
ZIR (Zig Intermediate Representation)
    ↓
semantic analysis
    ↓
AIR (Analyzed Intermediate Representation)
    ↓
code generation

The parser’s job is to give the rest of the compiler a reliable structural map of the source file.
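To give a feel for what "lowering" means, the sketch below flattens a tree for `a + b * c` into a linear list of instructions for a hypothetical stack machine. This format is invented for illustration and is not ZIR or AIR; `Node`, `Inst`, and `Lowering` are hypothetical. The key idea is the post-order walk: operands are emitted before the operation that consumes them.

```zig
const std = @import("std");

// A toy expression tree. Children are pointers to other nodes.
const Node = union(enum) {
    int: i64,
    add: struct { lhs: *const Node, rhs: *const Node },
    mul: struct { lhs: *const Node, rhs: *const Node },
};

// A made-up linear instruction format for a stack machine (not ZIR or AIR).
const Inst = union(enum) { push: i64, add, mul };

const Lowering = struct {
    insts: [32]Inst = undefined,
    len: usize = 0,

    fn emit(l: *Lowering, inst: Inst) void {
        std.debug.assert(l.len < l.insts.len);
        l.insts[l.len] = inst;
        l.len += 1;
    }

    // Post-order walk: emit both operands, then the operation that uses them.
    fn lower(l: *Lowering, node: *const Node) void {
        switch (node.*) {
            .int => |value| l.emit(.{ .push = value }),
            .add => |bin| {
                l.lower(bin.lhs);
                l.lower(bin.rhs);
                l.emit(.add);
            },
            .mul => |bin| {
                l.lower(bin.lhs);
                l.lower(bin.rhs);
                l.emit(.mul);
            },
        }
    }
};

pub fn main() void {
    // Tree for a + b * c with a = 1, b = 2, c = 3.
    const a = Node{ .int = 1 };
    const b = Node{ .int = 2 };
    const c = Node{ .int = 3 };
    const product = Node{ .mul = .{ .lhs = &b, .rhs = &c } };
    const sum = Node{ .add = .{ .lhs = &a, .rhs = &product } };

    var lowering = Lowering{};
    lowering.lower(&sum);
    for (lowering.insts[0..lowering.len]) |inst| {
        switch (inst) {
            .push => |value| std.debug.print("push {d}\n", .{value}),
            .add => std.debug.print("add\n", .{}),
            .mul => std.debug.print("mul\n", .{}),
        }
    }
}
```

With `a = 1`, `b = 2`, and `c = 3`, it prints `push 1`, `push 2`, `push 3`, `mul`, `add`: the tree's shape from parsing decides the instruction order.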

Why Parsing Matters

Parsing may sound like a small step, but it shapes everything after it.

If the parser builds the wrong tree, the compiler will misunderstand the program.

If the parser loses source locations, error messages become poor.

If the parser handles edge cases badly, valid code may be rejected or invalid code may produce confusing errors.

Parsing is the compiler’s first real understanding of your code.

It does not yet know every meaning, but it knows the shape.