# How Parsing Works

Parsing is the part of the compiler that reads source code and turns it into structure.

When you write a Zig file, you write plain text:

```zig
pub fn main() void {
    const x = 10 + 20;
}
```

The compiler cannot understand this text directly as a program. First, it must break the text into smaller pieces. Then it must organize those pieces into a tree.

That process has two main steps:

```text
source text
    ↓
tokens
    ↓
syntax tree
```

#### Source Text

A Zig file starts as characters.

For example:

```zig
const x = 42;
```

To a human, this is obviously a constant declaration.

To the compiler, it starts as a sequence of characters:

```text
c o n s t   x   =   4 2 ;
```

The compiler must discover where each meaningful piece begins and ends.

#### Tokenization

Tokenization is the first step.

A tokenizer, also called a lexer, turns characters into tokens.

A token is a small meaningful unit of source code.

For this line:

```zig
const x = 42;
```

the tokens are roughly:

```text
const      keyword
x          identifier
=          symbol
42         integer literal
;          symbol
```

Whitespace separates tokens, but it carries no meaning of its own. It matters only where it marks a token boundary: `const x` is two tokens, while `constx` is a single identifier.

These two lines have the same meaning:

```zig
const x = 42;
```

```zig
const    x    =    42;
```

The tokenizer does not decide whether the whole program is correct. It only recognizes pieces.

It knows that `const` is a keyword. It knows that `42` is an integer literal. It knows that `x` is an identifier.

It does not yet know whether `x` has a valid type or whether the declaration is allowed in this place.
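To make the idea concrete, here is a minimal hand-written tokenizer sketch. It is not the Zig compiler's tokenizer: the `TokenKind`, `Token`, and `Tokenizer` names are hypothetical, only `const` is treated as a keyword, and every non-alphanumeric character is treated as a one-character symbol.

```zig
const std = @import("std");

// Hypothetical token kinds for this sketch; a real tokenizer has many more.
const TokenKind = enum { keyword, identifier, integer_literal, symbol, eof };

const Token = struct {
    kind: TokenKind,
    text: []const u8,
};

const Tokenizer = struct {
    source: []const u8,
    index: usize = 0,

    fn next(self: *Tokenizer) Token {
        // Whitespace separates tokens but produces no token itself.
        while (self.index < self.source.len and std.ascii.isWhitespace(self.source[self.index])) {
            self.index += 1;
        }
        if (self.index >= self.source.len) return .{ .kind = .eof, .text = "" };

        const start = self.index;
        const c = self.source[start];

        if (std.ascii.isAlphabetic(c) or c == '_') {
            while (self.index < self.source.len and
                (std.ascii.isAlphanumeric(self.source[self.index]) or self.source[self.index] == '_'))
            {
                self.index += 1;
            }
            const text = self.source[start..self.index];
            // Assumption: `const` is the only keyword this sketch knows.
            const kind: TokenKind = if (std.mem.eql(u8, text, "const")) .keyword else .identifier;
            return .{ .kind = kind, .text = text };
        }

        if (std.ascii.isDigit(c)) {
            while (self.index < self.source.len and std.ascii.isDigit(self.source[self.index])) {
                self.index += 1;
            }
            return .{ .kind = .integer_literal, .text = self.source[start..self.index] };
        }

        // Anything else is a one-character symbol in this sketch.
        self.index += 1;
        return .{ .kind = .symbol, .text = self.source[start..self.index] };
    }
};

pub fn main() void {
    var tokenizer = Tokenizer{ .source = "const    x    =    42;" };
    while (true) {
        const token = tokenizer.next();
        if (token.kind == .eof) break;
        std.debug.print("{s} -> {s}\n", .{ token.text, @tagName(token.kind) });
    }
}
```

Note that the tokenizer never checks whether `=` follows an identifier or whether the line ends in `;`. It only finds where each piece begins and ends.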

#### Parsing

Parsing starts after tokenization.

The parser reads the token stream and builds a syntax tree.

A syntax tree records how the tokens fit together.

For example:

```zig
const x = 10 + 20;
```

This is not just a flat list of tokens. The compiler needs to know that:

```text
const x = ...;
```

is a declaration, and:

```text
10 + 20
```

is an expression inside that declaration.

A simplified tree might look like this:

```text
constant declaration
├── name: x
└── value:
    └── binary expression +
        ├── integer literal 10
        └── integer literal 20
```

This tree is not exactly how the real Zig compiler stores it, but it shows the idea.

Parsing turns a line of code into a shape the compiler can inspect.

#### Abstract Syntax Tree

The tree built by the parser is usually called an abstract syntax tree, or AST.

The word “abstract” means the tree does not keep every tiny detail of the original text. It keeps the structure that matters for understanding the program.

For example, the source code may contain spaces:

```zig
const x     =      42;
```

The AST does not care how many spaces were used.

It cares that there is a constant declaration named `x` with the value `42`.

A simplified AST node might contain:

```text
kind: const declaration
name: x
initializer: integer literal 42
```

The AST is still close to the source code. It records structure, not meaning.
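One way to picture such nodes in Zig is a tagged union whose variants point at child nodes. This is only an illustrative sketch: the `Node` type and its variants are hypothetical, and the real Zig compiler stores its AST in a more compact, index-based form.

```zig
const std = @import("std");

// Hypothetical AST node type for illustration only.
const Node = union(enum) {
    integer_literal: u64,
    identifier: []const u8,
    binary: struct { op: u8, lhs: *const Node, rhs: *const Node },
    const_decl: struct { name: []const u8, initializer: *const Node },
};

// Prints the tree with two spaces of indentation per level.
fn dump(node: *const Node, depth: usize) void {
    var i: usize = 0;
    while (i < depth) : (i += 1) std.debug.print("  ", .{});
    switch (node.*) {
        .integer_literal => |value| std.debug.print("integer literal {d}\n", .{value}),
        .identifier => |name| std.debug.print("identifier {s}\n", .{name}),
        .binary => |b| {
            std.debug.print("binary expression {c}\n", .{b.op});
            dump(b.lhs, depth + 1);
            dump(b.rhs, depth + 1);
        },
        .const_decl => |d| {
            std.debug.print("const declaration {s}\n", .{d.name});
            dump(d.initializer, depth + 1);
        },
    }
}

pub fn main() void {
    // Hand-built tree for: const x = 10 + 20;
    const ten = Node{ .integer_literal = 10 };
    const twenty = Node{ .integer_literal = 20 };
    const sum = Node{ .binary = .{ .op = '+', .lhs = &ten, .rhs = &twenty } };
    const decl = Node{ .const_decl = .{ .name = "x", .initializer = &sum } };
    dump(&decl, 0);
}
```

Nothing in `Node` records how many spaces separated the tokens. That is the "abstract" part.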

#### Parsing Is Not Type Checking

Parsing and type checking are separate.

This line can be parsed:

```zig
const x: u8 = 300;
```

The parser can understand the structure:

```text
constant declaration
name: x
type: u8
value: 300
```

But the program is still wrong, because `300` does not fit in `u8`, which can only hold values from 0 to 255.

That error is not a parsing error. It is a semantic error.

Parsing asks:

```text
Does this code have a valid grammatical shape?
```

Semantic analysis asks:

```text
Does this code make sense according to Zig’s rules?
```

That distinction matters when reading compiler errors.

A parsing error often means the compiler could not understand the structure of the code.

Example:

```zig
const x = ;
```

The parser expects an expression after `=`. There is none.

A semantic error means the structure is valid, but the meaning is wrong.

Example:

```zig
const x: u8 = 300;
```

The structure is clear, but the value does not fit the type.
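The split between the two questions can be sketched in a few lines. Reading the literal text `300` as a number succeeds, because it is a well-formed integer. A separate range check is what rejects it. The `fitsInU8` helper is a hypothetical stand-in for the compiler's real semantic analysis, not part of it.

```zig
const std = @import("std");

// Hypothetical semantic check: does an already-parsed value fit in u8?
fn fitsInU8(value: u64) bool {
    return value <= std.math.maxInt(u8);
}

pub fn main() !void {
    // Structural step: "300" is a well-formed integer literal, so this succeeds.
    const value = try std.fmt.parseInt(u64, "300", 10);

    // Semantic step: the value is outside the range of u8.
    std.debug.print("300 fits in u8: {}\n", .{fitsInU8(value)});
}
```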

#### Expressions

A large part of parsing is expression parsing.

Expressions are pieces of code that produce values.

Examples:

```zig
42
x
x + y
foo()
if (ready) 1 else 0
```

Expressions can nest inside other expressions.

Example:

```zig
const result = (a + b) * c;
```

The compiler must understand that `a + b` happens before multiplication because of the parentheses.

Without parentheses:

```zig
const result = a + b * c;
```

the compiler must understand operator precedence. Multiplication binds more tightly than addition, so this means:

```text
a + (b * c)
```

not:

```text
(a + b) * c
```

A simplified tree:

```text
binary expression +
├── identifier a
└── binary expression *
    ├── identifier b
    └── identifier c
```

This tree shape matters because later compiler stages use it to generate the correct code.
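A common way to get this grouping right is precedence climbing. The sketch below uses that technique, but to stay short it evaluates the expression instead of building a tree. It is an assumption-laden toy, not how the Zig parser is written: the `Parser` type and `evaluate` function are hypothetical, and only non-negative integers, `+`, `*`, and parentheses are supported.

```zig
const std = @import("std");

const ParseError = error{InvalidExpression};

// Hypothetical mini evaluator using precedence climbing.
const Parser = struct {
    source: []const u8,
    index: usize = 0,

    // Returns the next non-space byte without consuming it.
    fn peek(self: *Parser) ?u8 {
        while (self.index < self.source.len and self.source[self.index] == ' ') self.index += 1;
        if (self.index >= self.source.len) return null;
        return self.source[self.index];
    }

    // Higher number binds more tightly; null means "not a binary operator".
    fn precedence(op: u8) ?u8 {
        return switch (op) {
            '+' => 1,
            '*' => 2,
            else => null,
        };
    }

    fn parsePrimary(self: *Parser) ParseError!i64 {
        const c = self.peek() orelse return error.InvalidExpression;
        if (c == '(') {
            self.index += 1;
            const inner = try self.parseExpr(1);
            const close = self.peek() orelse return error.InvalidExpression;
            if (close != ')') return error.InvalidExpression;
            self.index += 1;
            return inner;
        }
        if (!std.ascii.isDigit(c)) return error.InvalidExpression;
        var value: i64 = 0;
        while (self.index < self.source.len and std.ascii.isDigit(self.source[self.index])) {
            value = value * 10 + (self.source[self.index] - '0');
            self.index += 1;
        }
        return value;
    }

    fn parseExpr(self: *Parser, min_prec: u8) ParseError!i64 {
        var lhs = try self.parsePrimary();
        while (self.peek()) |op| {
            const prec = precedence(op) orelse break;
            if (prec < min_prec) break;
            self.index += 1;
            // Left-associative: the right operand must bind strictly tighter.
            const rhs = try self.parseExpr(prec + 1);
            lhs = if (op == '+') lhs + rhs else lhs * rhs;
        }
        return lhs;
    }
};

fn evaluate(source: []const u8) ParseError!i64 {
    var parser = Parser{ .source = source };
    const value = try parser.parseExpr(1);
    if (parser.peek() != null) return error.InvalidExpression;
    return value;
}

pub fn main() !void {
    std.debug.print("2 + 3 * 4 = {d}\n", .{try evaluate("2 + 3 * 4")});
    std.debug.print("(2 + 3) * 4 = {d}\n", .{try evaluate("(2 + 3) * 4")});
}
```

The first call yields 14 and the second yields 20. The only difference is how the operands were grouped, which is exactly what the tree shape encodes.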

#### Statements and Declarations

The parser also recognizes declarations and statements.

A declaration introduces something by name.

Examples:

```zig
const x = 10;
var count: usize = 0;

fn add(a: i32, b: i32) i32 {
    return a + b;
}
```

A statement performs an action inside a function or block.

Examples:

```zig
return x;
defer file.close();
while (i < 10) : (i += 1) {
    sum += i;
}
```

Zig has an important design trait: many constructs are expressions.

For example, `if` can produce a value:

```zig
const value = if (flag) 10 else 20;
```

The parser must support this expression-oriented style.

#### Blocks

A block is code inside braces:

```zig
{
    const x = 10;
    const y = 20;
}
```

Blocks are important because they create structure.

A function body is a block:

```zig
fn main() void {
    const x = 10;
}
```

An `if` branch often contains a block:

```zig
if (ready) {
    start();
} else {
    stop();
}
```

The parser uses braces to know where a block begins and ends.

If a closing brace is missing, the parser may report an error later than the exact place where the mistake happened, because it kept looking for the end of the block.

Example:

```zig
pub fn main() void {
    const x = 10;
```

The parser reaches the end of the file while still inside the function body.
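The bookkeeping behind that situation can be sketched as a simple depth counter. The `openBlocksAtEnd` helper is hypothetical, and it naively counts every brace character, including ones inside strings or comments that a real tokenizer would skip.

```zig
const std = @import("std");

// Hypothetical helper: how many '{' are still open when the input ends?
fn openBlocksAtEnd(source: []const u8) error{UnexpectedClosingBrace}!usize {
    var depth: usize = 0;
    for (source) |byte| {
        switch (byte) {
            '{' => {
                depth += 1;
            },
            '}' => {
                if (depth == 0) return error.UnexpectedClosingBrace;
                depth -= 1;
            },
            else => {},
        }
    }
    return depth;
}

pub fn main() !void {
    const source = "pub fn main() void {\n    const x = 10;\n";
    const open = try openBlocksAtEnd(source);
    std.debug.print("blocks still open at end of file: {d}\n", .{open});
}
```

The counter only learns that something is wrong once the input runs out, which is why the reported position can be far from the line where the brace was forgotten.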

#### Error Recovery

A parser should not stop completely at the first mistake if it can continue.

Good parsers try to recover from errors.

For example:

```zig
const x = ;
const y = 20;
```

The first declaration is broken, but the parser may still be able to continue and recognize the second declaration.

This is useful because one compile run can show several errors instead of only one.

Error recovery is difficult because the parser must guess where normal structure resumes.

Common recovery points include:

```text
semicolon
closing brace
new declaration keyword
end of file
```

Compiler diagnostics depend heavily on parser quality. A good parser gives errors near the real mistake. A poor parser may produce confusing follow-up errors.
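As a rough illustration of using the semicolon as a recovery point, the sketch below checks a sequence of declarations and keeps going after a broken one. Everything here is a simplifying assumption: `declarationIsValid` and `checkAll` are hypothetical, they work on raw text instead of tokens, and "valid" only means that something follows the `=`.

```zig
const std = @import("std");

const Report = struct { ok: usize = 0, errors: usize = 0 };

// Hypothetical check for one `const name = value` declaration (without the ';').
fn declarationIsValid(decl: []const u8) bool {
    const eq = std.mem.indexOfScalar(u8, decl, '=') orelse return false;
    const value = std.mem.trim(u8, decl[eq + 1 ..], " \t\r\n");
    return value.len > 0;
}

// Checks every declaration, resuming after each ';' even when one is broken.
fn checkAll(source: []const u8) Report {
    var report = Report{};
    var start: usize = 0;
    while (std.mem.indexOfScalarPos(u8, source, start, ';')) |end| {
        if (declarationIsValid(source[start..end])) {
            report.ok += 1;
        } else {
            report.errors += 1; // record the error, then keep going
        }
        start = end + 1; // the semicolon is the recovery point
    }
    return report;
}

pub fn main() void {
    const report = checkAll("const x = ;\nconst y = 20;");
    std.debug.print("valid: {d}, errors: {d}\n", .{ report.ok, report.errors });
}
```

A parser that stopped at the first failure would never reach `const y = 20;`. Resuming at the semicolon lets both declarations be examined in one run.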

#### Source Locations

The parser also tracks source locations.

A source location tells the compiler where something came from:

```text
file name
line number
column number
byte offset
```

This is how the compiler can print an error like:

```text
main.zig:3:15: error: expected expression, found ';'
```

Without source locations, the compiler might know that something is wrong, but it could not show you where.

Source locations must travel through later compiler stages too. Semantic analysis needs them to report errors, and code generation needs them to emit debug information.
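If only a byte offset is stored, the line and column can be recovered on demand by counting newlines. The sketch below shows one way to do that. The `Location` type and `locate` function are hypothetical, and columns are counted in bytes, not in Unicode characters.

```zig
const std = @import("std");

const Location = struct { line: usize, column: usize };

// Converts a byte offset into a 1-based line and column by counting newlines.
fn locate(source: []const u8, offset: usize) Location {
    var loc = Location{ .line = 1, .column = 1 };
    for (source[0..offset]) |byte| {
        if (byte == '\n') {
            loc.line += 1;
            loc.column = 1;
        } else {
            loc.column += 1;
        }
    }
    return loc;
}

pub fn main() void {
    const source = "pub fn main() void {\n    const x = ;\n}\n";
    // Byte offset of the offending ';' in this sample input.
    const offset = std.mem.indexOfScalar(u8, source, ';').?;
    const loc = locate(source, offset);
    std.debug.print("main.zig:{d}:{d}: error: expected expression, found ';'\n", .{ loc.line, loc.column });
}
```

Storing only the offset keeps each token or node small. The cost of computing the line and column is paid only when an error actually has to be printed.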

#### From AST to Later Stages

After parsing, the compiler has an AST.

But the AST is not the final internal form.

The compiler still needs to:

```text
resolve names
resolve imports
infer types
check declarations
evaluate comptime code
lower into intermediate representations
generate code
link output
```

So the AST is only the beginning.

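A rough flow, where ZIR (Zig Intermediate Representation) is an untyped form generated from the AST and AIR (Analyzed Intermediate Representation) is the typed form produced by semantic analysis: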

```text
tokens
    ↓
AST
    ↓
ZIR
    ↓
semantic analysis
    ↓
AIR
    ↓
code generation
```

The parser’s job is to give the rest of the compiler a reliable structural map of the source file.

#### Why Parsing Matters

Parsing may sound like a small step, but it shapes everything after it.

If the parser builds the wrong tree, the compiler will misunderstand the program.

If the parser loses source locations, error messages become poor.

If the parser handles edge cases badly, valid code may be rejected or invalid code may produce confusing errors.

Parsing is the compiler’s first real understanding of your code.

It does not yet know every meaning, but it knows the shape.

