# Build a JSON Parser

A JSON parser reads JSON text and turns it into data your program can use.

JSON looks like this:

```json
{
  "name": "Zig",
  "year": 2016,
  "systems_language": true
}
```

A parser does not just store this as text. It understands the structure:

```text
object
  name -> string "Zig"
  year -> number 2016
  systems_language -> boolean true
```

In this section, we will build a small JSON parser for a subset of JSON. It will support these kinds of values:

```text
null
true
false
numbers
strings
arrays
objects
```

We will keep strings and numbers simple at first. The parser will reject strings that contain escape sequences such as `\n`, `\"`, or Unicode escapes like `\u00e9`, and it will not accept numbers with exponents such as `1e10`. A complete JSON parser needs both, but they add too much detail for a first version.

#### The Shape of JSON Values

JSON has a small set of value types.

```zig
const Value = union(enum) {
    null,
    boolean: bool,
    number: f64,
    string: []const u8,
    array: []Value,
    object: []Field,
};

const Field = struct {
    key: []const u8,
    value: Value,
};
```

This is a tagged union.

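A `Value` holds exactly one of these forms at a time. The `null` form carries no data. A boolean stores a `bool`, a number stores an `f64`, a string stores a slice of bytes, an array stores a slice of other `Value`s, and an object stores a slice of `Field`s, each pairing a key with a value.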

This is the first important idea: parsing turns plain text into structured data.
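To see what the tagged union gives us, here is a small sketch that builds the example document from the top of this section by hand and walks it with `switch`. The `countLeaves` helper is hypothetical, written only for this illustration; it is not part of the parser. Because `switch` on a tagged union must cover every variant, forgetting a case is a compile error rather than a runtime surprise.

```zig
const std = @import("std");

const Field = struct {
    key: []const u8,
    value: Value,
};

const Value = union(enum) {
    null,
    boolean: bool,
    number: f64,
    string: []const u8,
    array: []Value,
    object: []Field,
};

// Hypothetical helper for this sketch: counts the values in a tree that
// are not arrays or objects. The switch must handle every variant.
fn countLeaves(value: Value) usize {
    switch (value) {
        .null, .boolean, .number, .string => return 1,
        .array => |items| {
            var total: usize = 0;
            for (items) |item| total += countLeaves(item);
            return total;
        },
        .object => |fields| {
            var total: usize = 0;
            for (fields) |field| total += countLeaves(field.value);
            return total;
        },
    }
}

pub fn main() void {
    // The example document from the top of this section, built by hand.
    var fields = [_]Field{
        .{ .key = "name", .value = .{ .string = "Zig" } },
        .{ .key = "year", .value = .{ .number = 2016 } },
        .{ .key = "systems_language", .value = .{ .boolean = true } },
    };
    const doc = Value{ .object = &fields };
    std.debug.print("leaves: {d}\n", .{countLeaves(doc)});
}
```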

#### A Parser Struct

A parser needs to remember where it is in the input.

```zig
const Parser = struct {
    input: []const u8,
    index: usize,

    fn init(input: []const u8) Parser {
        return Parser{
            .input = input,
            .index = 0,
        };
    }
};
```

The `input` field holds the entire JSON text.

The `index` field holds the position of the next byte to read, as an offset into `input`.

For example, if the input is:

```json
true
```

At the beginning:

```text
index = 0
```

After reading `true`:

```text
index = 4
```

Most of a parser's work is moving this index forward carefully, one byte or one token at a time.
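Because `index` is a plain byte offset, it already records where the parser is. When we later want friendlier error messages, a helper like the hypothetical `lineColumn` below could turn that offset into a 1-based line and column. It is a sketch for illustration, not part of the parser we are building, and it counts columns in bytes rather than Unicode characters.

```zig
const std = @import("std");

const Position = struct {
    line: usize,
    column: usize,
};

// Hypothetical helper: converts a byte offset such as Parser.index into a
// 1-based line and column by scanning the bytes before it. Columns count
// bytes, not Unicode characters. Offsets past the end are clamped.
fn lineColumn(input: []const u8, index: usize) Position {
    var pos = Position{ .line = 1, .column = 1 };
    for (input[0..@min(index, input.len)]) |ch| {
        if (ch == '\n') {
            pos.line += 1;
            pos.column = 1;
        } else {
            pos.column += 1;
        }
    }
    return pos;
}

pub fn main() void {
    const input = "{\n  \"a\": tru\n}";
    // Offset 9 is the `t` of the misspelled `tru` on the second line.
    const pos = lineColumn(input, 9);
    std.debug.print("line {d}, column {d}\n", .{ pos.line, pos.column });
}
```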

#### Peeking and Advancing

Add helper methods:

```zig
fn peek(self: *Parser) ?u8 {
    if (self.index >= self.input.len) {
        return null;
    }

    return self.input[self.index];
}

fn advance(self: *Parser) ?u8 {
    const ch = self.peek() orelse return null;
    self.index += 1;
    return ch;
}
```

`peek` looks at the current byte without moving.

`advance` reads the current byte and moves forward by one.

Almost everything else in the parser is built on top of these two operations.
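A quick way to convince ourselves the helpers behave as described is a tiny stand-alone program. This sketch copies `init`, `peek`, and `advance` into a trimmed-down `Parser` and steps through the two-byte input `ok`. Note that reading past the end returns `null` instead of indexing out of bounds.

```zig
const std = @import("std");

// Trimmed-down copy of the section's Parser with only init, peek, and advance.
const Parser = struct {
    input: []const u8,
    index: usize,

    fn init(input: []const u8) Parser {
        return Parser{ .input = input, .index = 0 };
    }

    fn peek(self: *Parser) ?u8 {
        if (self.index >= self.input.len) {
            return null;
        }
        return self.input[self.index];
    }

    fn advance(self: *Parser) ?u8 {
        const ch = self.peek() orelse return null;
        self.index += 1;
        return ch;
    }
};

pub fn main() void {
    var p = Parser.init("ok");
    const looked = p.peek(); // 'o', index stays 0
    const first = p.advance(); // 'o', index becomes 1
    const second = p.advance(); // 'k', index becomes 2
    const past_end = p.advance(); // null, index stays 2
    std.debug.print("{c} {c} {c} end={} index={d}\n", .{
        looked.?, first.?, second.?, past_end == null, p.index,
    });
}
```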

#### Skipping Whitespace

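JSON allows whitespace before and after any value, and around the punctuation that separates values: `[`, `]`, `{`, `}`, `,`, and `:`.

Whitespace does not change the meaning of the input. These two inputs both mean `true`:

```json
true
```

```json
   true
```

Whitespace can also appear between the parts of an array or object:

```json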
[
  true,
  false
]
```

Add this method:

```zig
fn skipWhitespace(self: *Parser) void {
    while (self.peek()) |ch| {
        switch (ch) {
            ' ', '\n', '\r', '\t' => _ = self.advance(),
            else => return,
        }
    }
}
```

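JSON defines exactly four whitespace bytes: space, tab, line feed, and carriage return. Before parsing any value, `parseValue` calls `skipWhitespace`, and the array and object parsers call it again around `,` and `:`.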
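To check the loop's behavior, this sketch copies `skipWhitespace` into a trimmed-down `Parser` that keeps only the helpers it uses. The test shows that it stops at the first byte that is not JSON whitespace, including bytes such as form feed that some languages treat as whitespace, and that it never moves past the end of the input.

```zig
const std = @import("std");

// Trimmed-down Parser with just the helpers skipWhitespace needs.
const Parser = struct {
    input: []const u8,
    index: usize,

    fn peek(self: *Parser) ?u8 {
        if (self.index >= self.input.len) return null;
        return self.input[self.index];
    }

    fn advance(self: *Parser) ?u8 {
        const ch = self.peek() orelse return null;
        self.index += 1;
        return ch;
    }

    fn skipWhitespace(self: *Parser) void {
        while (self.peek()) |ch| {
            switch (ch) {
                ' ', '\n', '\r', '\t' => _ = self.advance(),
                else => return,
            }
        }
    }
};

pub fn main() void {
    var p = Parser{ .input = " \t\r\n  true", .index = 0 };
    p.skipWhitespace();
    std.debug.print("index {d}, rest: {s}\n", .{ p.index, p.input[p.index..] });
}
```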

#### Parsing a Value

Now write the main dispatcher:

```zig
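// `!Value` would ask Zig to infer the error set, but inferred error sets
// cannot be used with recursion, and parseValue calls itself indirectly
// through parseArray and parseObject. So we list the errors explicitly.
const ParseError = error{
    UnexpectedEnd,
    InvalidValue,
    InvalidString,
    EscapesNotSupported,
    InvalidArray,
    InvalidObject,
    InvalidCharacter, // returned by std.fmt.parseFloat
} || std.mem.Allocator.Error;

fn parseValue(self: *Parser, allocator: std.mem.Allocator) ParseError!Value {
    self.skipWhitespace();

    const ch = self.peek() orelse return error.UnexpectedEnd;

    return switch (ch) {
        'n' => self.parseNull(),
        't' => self.parseTrue(),
        'f' => self.parseFalse(),
        '"' => self.parseString(),
        '[' => self.parseArray(allocator),
        '{' => self.parseObject(allocator),
        '-', '0'...'9' => self.parseNumber(),
        else => error.InvalidValue,
    };
}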
```

This function looks at the next byte and decides which parser to call.

If the next byte is `t`, the value may be `true`.

If the next byte is `[`, the value is an array.

If the next byte is `{`, the value is an object.
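The dispatch works because every kind of JSON value starts with a different byte, so one byte of lookahead is enough. This sketch isolates that decision in a hypothetical `classify` function with a hypothetical `Kind` enum; neither is part of the parser, they only mirror the `switch` in `parseValue`.

```zig
const std = @import("std");

// Hypothetical enum naming the parse function each first byte selects.
const Kind = enum {
    null_literal,
    true_literal,
    false_literal,
    string,
    array,
    object,
    number,
    invalid,
};

// Mirrors the switch in parseValue: the first byte picks the parser.
fn classify(first: u8) Kind {
    return switch (first) {
        'n' => .null_literal,
        't' => .true_literal,
        'f' => .false_literal,
        '"' => .string,
        '[' => .array,
        '{' => .object,
        '-', '0'...'9' => .number,
        else => .invalid,
    };
}

pub fn main() void {
    for ("nt\"[{-7 .") |byte| {
        std.debug.print("'{c}' -> {s}\n", .{ byte, @tagName(classify(byte)) });
    }
}
```

The space and the `.` both classify as invalid: whitespace must be skipped before dispatching, and `.5` is not valid JSON.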

#### Matching Fixed Words

JSON has three fixed word values:

```json
null
true
false
```

We can parse them with a helper:

```zig
fn matchText(self: *Parser, text: []const u8) bool {
    if (self.index + text.len > self.input.len) {
        return false;
    }

    if (!std.mem.eql(u8, self.input[self.index .. self.index + text.len], text)) {
        return false;
    }

    self.index += text.len;
    return true;
}
```

Now parse the literals:

```zig
fn parseNull(self: *Parser) !Value {
    if (!self.matchText("null")) {
        return error.InvalidValue;
    }

    return Value.null;
}

fn parseTrue(self: *Parser) !Value {
    if (!self.matchText("true")) {
        return error.InvalidValue;
    }

    return Value{ .boolean = true };
}

fn parseFalse(self: *Parser) !Value {
    if (!self.matchText("false")) {
        return error.InvalidValue;
    }

    return Value{ .boolean = false };
}
```

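Each function consumes the whole word and returns the matching `Value`. An incomplete word such as `nul` or `tru` fails with `error.InvalidValue` and leaves `index` unchanged. `matchText` checks only the bytes of the word itself, so for input like `truex`, `parseTrue` succeeds and leaves `x` unread; the caller reports the error when it reaches that unexpected byte.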
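Here is `matchText` on its own, copied into a trimmed-down `Parser`, with the three cases worth remembering: input that is too short, input that differs, and input that matches but continues with extra bytes.

```zig
const std = @import("std");

// Trimmed-down Parser with only matchText, copied from above.
const Parser = struct {
    input: []const u8,
    index: usize,

    fn matchText(self: *Parser, text: []const u8) bool {
        if (self.index + text.len > self.input.len) {
            return false;
        }
        if (!std.mem.eql(u8, self.input[self.index .. self.index + text.len], text)) {
            return false;
        }
        self.index += text.len;
        return true;
    }
};

pub fn main() void {
    var short = Parser{ .input = "nul", .index = 0 };
    var extra = Parser{ .input = "truex", .index = 0 };
    const short_ok = short.matchText("null"); // false: input too short
    const extra_ok = extra.matchText("true"); // true: stops after "true"
    std.debug.print("{} {d} | {} {s}\n", .{ short_ok, short.index, extra_ok, extra.input[extra.index..] });
}
```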

#### Parsing Strings

A JSON string starts with `"` and ends with `"`.

For this first parser, we will only support simple strings with no escapes.

```zig
fn parseString(self: *Parser) !Value {
    if (self.advance() != '"') {
        return error.InvalidString;
    }

    const start = self.index;

    while (self.peek()) |ch| {
        if (ch == '"') {
            const text = self.input[start..self.index];
            _ = self.advance();
            return Value{ .string = text };
        }

        if (ch == '\\') {
            return error.EscapesNotSupported;
        }

        _ = self.advance();
    }

    return error.UnexpectedEnd;
}
```

This parser does not allocate a new string. It returns a slice into the original input.

That is efficient, but it means the original JSON text must remain alive while the parsed value is used.
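The sketch below pulls the scanning loop out of `parseString` into a hypothetical free function, `scanString`, that returns the raw slice instead of a `Value`. It shows both sides of the trade-off: no bytes are copied, and the result changes if the buffer it points into changes. If the buffer were freed instead, the slice would dangle.

```zig
const std = @import("std");

// Same scanning rules as parseString, as a free function returning the slice.
fn scanString(input: []const u8) ![]const u8 {
    if (input.len == 0 or input[0] != '"') return error.InvalidString;
    for (input[1..], 1..) |ch, i| {
        if (ch == '"') return input[1..i]; // bytes between the quotes
        if (ch == '\\') return error.EscapesNotSupported;
    }
    return error.UnexpectedEnd;
}

pub fn main() !void {
    var buffer = "\"Zig\"".*; // a mutable copy of the JSON text
    const name = try scanString(&buffer);
    std.debug.print("{s}\n", .{name});

    // `name` points into `buffer`, so editing the buffer changes `name`.
    buffer[1] = 'z';
    std.debug.print("{s}\n", .{name});
}
```

The first line printed is `Zig`; after the buffer is edited, the same slice reads `zig`.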

#### Parsing Numbers

We will parse numbers as `f64`.

```zig
fn parseNumber(self: *Parser) !Value {
    const start = self.index;

    if (self.peek() == '-') {
        _ = self.advance();
    }

    while (self.peek()) |ch| {
        switch (ch) {
            '0'...'9' => _ = self.advance(),
            else => break,
        }
    }

    if (self.peek() == '.') {
        _ = self.advance();

        while (self.peek()) |ch| {
            switch (ch) {
                '0'...'9' => _ = self.advance(),
                else => break,
            }
        }
    }

    const text = self.input[start..self.index];
    const number = try std.fmt.parseFloat(f64, text);

    return Value{ .number = number };
}
```

This accepts:

```text
0
123
-5
3.14
-0.5
```

It does not yet enforce the full JSON number grammar. It does not consume exponents, so for `1e10` it reads only `1` and the leftover `e10` causes an error afterwards. It also accepts leading zeros such as `007`, which JSON forbids.
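If we wanted to tighten `parseNumber`, one approach is to scan the number by the grammar first and only then hand the slice to `std.fmt.parseFloat`, which does understand exponents. The helper below, `scanNumber`, is a hypothetical sketch of that scanning step following the RFC 8259 number grammar; it is not used by the parser in this section. It returns how many bytes form a number and leaves anything after them for the caller to judge, so `007` scans as just `0` and the leftover `07` is rejected later, the same way trailing input is.

```zig
const std = @import("std");

// Hypothetical helper: returns how many bytes at the start of `text` form
// a JSON number, following the RFC 8259 grammar: [-] int [frac] [exp].
fn scanNumber(text: []const u8) error{InvalidNumber}!usize {
    var i: usize = 0;
    if (i < text.len and text[i] == '-') i += 1;

    // int: a lone '0', or a digit 1-9 followed by any digits.
    if (i >= text.len) return error.InvalidNumber;
    if (text[i] == '0') {
        i += 1;
    } else if (text[i] >= '1' and text[i] <= '9') {
        i = skipDigits(text, i);
    } else {
        return error.InvalidNumber;
    }

    // frac: '.' must be followed by at least one digit.
    if (i < text.len and text[i] == '.') {
        const start = i + 1;
        i = skipDigits(text, start);
        if (i == start) return error.InvalidNumber;
    }

    // exp: 'e' or 'E', an optional sign, then at least one digit.
    if (i < text.len and (text[i] == 'e' or text[i] == 'E')) {
        i += 1;
        if (i < text.len and (text[i] == '+' or text[i] == '-')) i += 1;
        const start = i;
        i = skipDigits(text, start);
        if (i == start) return error.InvalidNumber;
    }

    return i;
}

fn skipDigits(text: []const u8, from: usize) usize {
    var i = from;
    while (i < text.len and std.ascii.isDigit(text[i])) i += 1;
    return i;
}

pub fn main() !void {
    const text = "-0.5E+3,";
    const len = try scanNumber(text);
    const value = try std.fmt.parseFloat(f64, text[0..len]);
    std.debug.print("{s} -> {d}\n", .{ text[0..len], value });
}
```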

#### Parsing Arrays

A JSON array starts with `[` and ends with `]`.

Example:

```json
[1, true, "zig"]
```

We need a dynamic list because we do not know the number of elements in advance.

```zig
fn parseArray(self: *Parser, allocator: std.mem.Allocator) !Value {
    if (self.advance() != '[') {
        return error.InvalidArray;
    }

    var items = std.ArrayList(Value).init(allocator);
    errdefer items.deinit();

    self.skipWhitespace();

    if (self.peek() == ']') {
        _ = self.advance();
        return Value{ .array = try items.toOwnedSlice() };
    }

    while (true) {
        const value = try self.parseValue(allocator);
        try items.append(value);

        self.skipWhitespace();

        const ch = self.advance() orelse return error.UnexpectedEnd;

        switch (ch) {
            ',' => continue,
            ']' => return Value{ .array = try items.toOwnedSlice() },
            else => return error.InvalidArray,
        }
    }
}
```

The key part is this loop:

```zig
while (true) {
    const value = try self.parseValue(allocator);
    try items.append(value);
    ...
}
```

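An array contains values, so the array parser calls `parseValue` again. Because `parseValue` can in turn call `parseArray`, nested input such as `[[1, 2], [3]]` needs no extra code.

This is recursion.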
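Here is the same mutual recursion reduced to a toy grammar (single digits and arrays, no whitespace) so the call pattern is easy to trace. The `Mini` type and its depth counting are hypothetical, added only for illustration. Note the explicit error set: Zig cannot infer error sets for functions that recurse, so recursive parsers name their errors up front.

```zig
const std = @import("std");

// Explicit error set: inferred error sets do not work with recursion.
const Error = error{ UnexpectedEnd, InvalidValue, InvalidArray };

// Toy grammar for this sketch: a value is one digit or an array of values,
// such as "[1,[2,[3]]]". Each parse returns the nesting depth it saw.
const Mini = struct {
    input: []const u8,
    index: usize = 0,

    fn parseValue(self: *Mini) Error!usize {
        if (self.index >= self.input.len) return error.UnexpectedEnd;
        const ch = self.input[self.index];
        if (ch == '[') return self.parseArray();
        if (ch >= '0' and ch <= '9') {
            self.index += 1;
            return 0;
        }
        return error.InvalidValue;
    }

    fn parseArray(self: *Mini) Error!usize {
        self.index += 1; // skip '['
        if (self.index < self.input.len and self.input[self.index] == ']') {
            self.index += 1;
            return 1;
        }
        var deepest: usize = 0;
        while (true) {
            // The recursive step: an element may itself be an array.
            deepest = @max(deepest, try self.parseValue());
            if (self.index >= self.input.len) return error.UnexpectedEnd;
            const ch = self.input[self.index];
            self.index += 1;
            switch (ch) {
                ',' => continue,
                ']' => return deepest + 1,
                else => return error.InvalidArray,
            }
        }
    }
};

pub fn main() !void {
    var m = Mini{ .input = "[1,[2,[3]]]" };
    std.debug.print("depth: {d}\n", .{try m.parseValue()});
}
```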

#### Parsing Objects

A JSON object starts with `{` and ends with `}`.

Example:

```json
{"name": "Zig", "year": 2016}
```

Each object field has:

```text
string key
colon
value
```

Here is the parser:

```zig
fn parseObject(self: *Parser, allocator: std.mem.Allocator) !Value {
    if (self.advance() != '{') {
        return error.InvalidObject;
    }

    var fields = std.ArrayList(Field).init(allocator);
    errdefer fields.deinit();

    self.skipWhitespace();

    if (self.peek() == '}') {
        _ = self.advance();
        return Value{ .object = try fields.toOwnedSlice() };
    }

    while (true) {
        self.skipWhitespace();

        const key_value = try self.parseString();
        const key = switch (key_value) {
            .string => |s| s,
            else => return error.InvalidObject,
        };

        self.skipWhitespace();

        if (self.advance() != ':') {
            return error.InvalidObject;
        }

        const value = try self.parseValue(allocator);

        try fields.append(Field{
            .key = key,
            .value = value,
        });

        self.skipWhitespace();

        const ch = self.advance() orelse return error.UnexpectedEnd;

        switch (ch) {
            ',' => continue,
            '}' => return Value{ .object = try fields.toOwnedSlice() },
            else => return error.InvalidObject,
        }
    }
}
```

This parser is also recursive. An object field can contain any JSON value, including another object.
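Because an object is stored as a slice of `Field`s in input order, reading a key back out is a linear scan. A hypothetical `get` helper could look like the sketch below. This parser does not reject duplicate keys, so the sketch simply returns the first match; a program that looks up many keys in large objects might build a hash map instead.

```zig
const std = @import("std");

const Field = struct {
    key: []const u8,
    value: Value,
};

const Value = union(enum) {
    null,
    boolean: bool,
    number: f64,
    string: []const u8,
    array: []Value,
    object: []Field,
};

// Hypothetical helper: returns the value stored under `key`, or null if
// `value` is not an object or has no such key. First match wins.
fn get(value: Value, key: []const u8) ?Value {
    const fields = switch (value) {
        .object => |f| f,
        else => return null,
    };
    for (fields) |field| {
        if (std.mem.eql(u8, field.key, key)) return field.value;
    }
    return null;
}

pub fn main() void {
    var fields = [_]Field{
        .{ .key = "name", .value = .{ .string = "Zig" } },
        .{ .key = "year", .value = .{ .number = 2016 } },
    };
    const doc = Value{ .object = &fields };
    if (get(doc, "year")) |year| {
        std.debug.print("year: {d}\n", .{year.number});
    }
}
```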

#### The Complete Program

Put this in `src/main.zig`:

```zig
const std = @import("std");

const Field = struct {
    key: []const u8,
    value: Value,
};

const Value = union(enum) {
    null,
    boolean: bool,
    number: f64,
    string: []const u8,
    array: []Value,
    object: []Field,
};

const Parser = struct {
    input: []const u8,
    index: usize,

    fn init(input: []const u8) Parser {
        return Parser{
            .input = input,
            .index = 0,
        };
    }

    fn peek(self: *Parser) ?u8 {
        if (self.index >= self.input.len) {
            return null;
        }

        return self.input[self.index];
    }

    fn advance(self: *Parser) ?u8 {
        const ch = self.peek() orelse return null;
        self.index += 1;
        return ch;
    }

    fn skipWhitespace(self: *Parser) void {
        while (self.peek()) |ch| {
            switch (ch) {
                ' ', '\n', '\r', '\t' => _ = self.advance(),
                else => return,
            }
        }
    }

    fn matchText(self: *Parser, text: []const u8) bool {
        if (self.index + text.len > self.input.len) {
            return false;
        }

        if (!std.mem.eql(u8, self.input[self.index .. self.index + text.len], text)) {
            return false;
        }

        self.index += text.len;
        return true;
    }

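    // Inferred error sets (`!Value`) cannot be used with recursion, and
    // parseValue calls itself indirectly through parseArray and
    // parseObject. An explicit error set here breaks the cycle.
    const ParseError = error{
        UnexpectedEnd,
        InvalidValue,
        InvalidString,
        EscapesNotSupported,
        InvalidArray,
        InvalidObject,
        InvalidCharacter, // returned by std.fmt.parseFloat
    } || std.mem.Allocator.Error;

    fn parseValue(self: *Parser, allocator: std.mem.Allocator) ParseError!Value {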
        self.skipWhitespace();

        const ch = self.peek() orelse return error.UnexpectedEnd;

        return switch (ch) {
            'n' => self.parseNull(),
            't' => self.parseTrue(),
            'f' => self.parseFalse(),
            '"' => self.parseString(),
            '[' => self.parseArray(allocator),
            '{' => self.parseObject(allocator),
            '-', '0'...'9' => self.parseNumber(),
            else => error.InvalidValue,
        };
    }

    fn parseNull(self: *Parser) !Value {
        if (!self.matchText("null")) {
            return error.InvalidValue;
        }

        return Value.null;
    }

    fn parseTrue(self: *Parser) !Value {
        if (!self.matchText("true")) {
            return error.InvalidValue;
        }

        return Value{ .boolean = true };
    }

    fn parseFalse(self: *Parser) !Value {
        if (!self.matchText("false")) {
            return error.InvalidValue;
        }

        return Value{ .boolean = false };
    }

    fn parseString(self: *Parser) !Value {
        if (self.advance() != '"') {
            return error.InvalidString;
        }

        const start = self.index;

        while (self.peek()) |ch| {
            if (ch == '"') {
                const text = self.input[start..self.index];
                _ = self.advance();
                return Value{ .string = text };
            }

            if (ch == '\\') {
                return error.EscapesNotSupported;
            }

            _ = self.advance();
        }

        return error.UnexpectedEnd;
    }

    fn parseNumber(self: *Parser) !Value {
        const start = self.index;

        if (self.peek() == '-') {
            _ = self.advance();
        }

        while (self.peek()) |ch| {
            switch (ch) {
                '0'...'9' => _ = self.advance(),
                else => break,
            }
        }

        if (self.peek() == '.') {
            _ = self.advance();

            while (self.peek()) |ch| {
                switch (ch) {
                    '0'...'9' => _ = self.advance(),
                    else => break,
                }
            }
        }

        const text = self.input[start..self.index];
        const number = try std.fmt.parseFloat(f64, text);

        return Value{ .number = number };
    }

    fn parseArray(self: *Parser, allocator: std.mem.Allocator) !Value {
        if (self.advance() != '[') {
            return error.InvalidArray;
        }

        var items = std.ArrayList(Value).init(allocator);
        errdefer items.deinit();

        self.skipWhitespace();

        if (self.peek() == ']') {
            _ = self.advance();
            return Value{ .array = try items.toOwnedSlice() };
        }

        while (true) {
            const value = try self.parseValue(allocator);
            try items.append(value);

            self.skipWhitespace();

            const ch = self.advance() orelse return error.UnexpectedEnd;

            switch (ch) {
                ',' => continue,
                ']' => return Value{ .array = try items.toOwnedSlice() },
                else => return error.InvalidArray,
            }
        }
    }

    fn parseObject(self: *Parser, allocator: std.mem.Allocator) !Value {
        if (self.advance() != '{') {
            return error.InvalidObject;
        }

        var fields = std.ArrayList(Field).init(allocator);
        errdefer fields.deinit();

        self.skipWhitespace();

        if (self.peek() == '}') {
            _ = self.advance();
            return Value{ .object = try fields.toOwnedSlice() };
        }

        while (true) {
            self.skipWhitespace();

            const key_value = try self.parseString();
            const key = switch (key_value) {
                .string => |s| s,
                else => return error.InvalidObject,
            };

            self.skipWhitespace();

            if (self.advance() != ':') {
                return error.InvalidObject;
            }

            const value = try self.parseValue(allocator);

            try fields.append(Field{
                .key = key,
                .value = value,
            });

            self.skipWhitespace();

            const ch = self.advance() orelse return error.UnexpectedEnd;

            switch (ch) {
                ',' => continue,
                '}' => return Value{ .object = try fields.toOwnedSlice() },
                else => return error.InvalidObject,
            }
        }
    }
};

fn parseJson(allocator: std.mem.Allocator, input: []const u8) !Value {
    var parser = Parser.init(input);
    const value = try parser.parseValue(allocator);

    parser.skipWhitespace();

    if (parser.peek() != null) {
        return error.TrailingInput;
    }

    return value;
}

fn printValue(value: Value, indent: usize) void {
    const spaces = "                                ";
    const prefix = spaces[0..@min(indent, spaces.len)];

    switch (value) {
        .null => std.debug.print("{s}null\n", .{prefix}),
        .boolean => |b| std.debug.print("{s}bool: {}\n", .{ prefix, b }),
        .number => |n| std.debug.print("{s}number: {d}\n", .{ prefix, n }),
        .string => |s| std.debug.print("{s}string: {s}\n", .{ prefix, s }),
        .array => |items| {
            std.debug.print("{s}array\n", .{prefix});
            for (items) |item| {
                printValue(item, indent + 2);
            }
        },
        .object => |fields| {
            std.debug.print("{s}object\n", .{prefix});
            for (fields) |field| {
                std.debug.print("{s}  key: {s}\n", .{ prefix, field.key });
                printValue(field.value, indent + 4);
            }
        },
    }
}

pub fn main() !void {
    const input =
        \\{
        \\  "name": "Zig",
        \\  "year": 2016,
        \\  "safe": true,
        \\  "features": ["manual memory", "comptime", "c interop"],
        \\  "nothing": null
        \\}
    ;

    var arena = std.heap.ArenaAllocator.init(std.heap.page_allocator);
    defer arena.deinit();

    const value = try parseJson(arena.allocator(), input);
    printValue(value, 0);
}
```

Run:

```bash
zig build run
```

Expected output:

```text
object
  key: name
    string: Zig
  key: year
    number: 2016
  key: safe
    bool: true
  key: features
    array
      string: manual memory
      string: comptime
      string: c interop
  key: nothing
    null
```

#### Why an Arena Allocator Works Well Here

The parser allocates arrays and object fields while parsing.

Those allocations all have the same lifetime. Once we finish using the parsed JSON tree, we can free everything together.

That is why this program uses an arena:

```zig
var arena = std.heap.ArenaAllocator.init(std.heap.page_allocator);
defer arena.deinit();
```

The parser does not free each array separately. The arena frees all allocated memory at once when `arena.deinit()` runs.

This is a common pattern for parsers, compilers, and short-lived data trees.
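Here is the pattern in miniature. The hypothetical `buildScratch` function stands in for the parser: it makes two allocations and never frees either one. The single `arena.deinit()` releases both. In the test, backing the arena with `std.testing.allocator` lets Zig confirm that nothing leaks.

```zig
const std = @import("std");

// Stand-in for the parser: allocates two slices and never frees them,
// the same way parseArray and parseObject leave cleanup to the arena.
fn buildScratch(allocator: std.mem.Allocator) ![]u32 {
    const numbers = try allocator.alloc(u32, 4);
    const scratch = try allocator.alloc(u32, 8);
    @memset(scratch, 7);
    for (numbers, 0..) |*n, i| n.* = @intCast(i);
    return numbers;
}

pub fn main() !void {
    var arena = std.heap.ArenaAllocator.init(std.heap.page_allocator);
    defer arena.deinit(); // releases both slices in one call

    const numbers = try buildScratch(arena.allocator());
    std.debug.print("{any}\n", .{numbers});
}
```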

#### What This Parser Does Not Handle Yet

This is not a complete JSON parser.

It rejects strings that contain escape sequences instead of decoding them.

It does not accept exponents such as `1e10`, and it does not reject some malformed numbers, such as `007`.

It does not produce detailed error locations.

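It does not free nested allocations individually. With a general-purpose allocator instead of an arena, there would be no way to free a finished tree, and an error midway through parsing would leak the values already built.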

It does not support streaming input.
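As an example of the allocation point above, a program that parses with a general-purpose allocator instead of an arena would need a recursive cleanup function. The `freeValue` sketch below is hypothetical and assumes the tree was built the way this parser builds it: every array and object is its own allocation (slices returned by `toOwnedSlice` can be passed to `allocator.free`), while strings point into the input and must not be freed.

```zig
const std = @import("std");

const Field = struct {
    key: []const u8,
    value: Value,
};

const Value = union(enum) {
    null,
    boolean: bool,
    number: f64,
    string: []const u8,
    array: []Value,
    object: []Field,
};

// Hypothetical helper: frees every array and object slice in the tree,
// children first. Strings and keys are slices of the input, so they stay.
fn freeValue(allocator: std.mem.Allocator, value: Value) void {
    switch (value) {
        .null, .boolean, .number, .string => {},
        .array => |items| {
            for (items) |item| freeValue(allocator, item);
            allocator.free(items);
        },
        .object => |fields| {
            for (fields) |field| freeValue(allocator, field.value);
            allocator.free(fields);
        },
    }
}

pub fn main() !void {
    const allocator = std.heap.page_allocator;
    const items = try allocator.alloc(Value, 2);
    items[0] = .{ .number = 1 };
    items[1] = .null;
    freeValue(allocator, .{ .array = items });
}
```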

Those are normal next steps. The important part is that the basic architecture is now visible.

A parser usually has this shape:

```text
input text
current index
peek
advance
skip whitespace
parse one value
parse nested values recursively
return structured data
```

Once you understand that shape, larger parsers become much less mysterious.

