Build a JSON Parser

A JSON parser reads JSON text and turns it into data your program can use.

JSON looks like this:

{
  "name": "Zig",
  "year": 2016,
  "systems_language": true
}

A parser does not just store this as text. It understands the structure:

object
  name -> string "Zig"
  year -> number 2016
  systems_language -> boolean true

In this section, we will build a small JSON parser for a limited subset of JSON. It will support:

null
true
false
numbers
strings
arrays
objects

We will keep strings simple at first and will not implement escape sequences such as \n, \", or Unicode escapes. A complete JSON parser needs them, but they add too much detail for a first version, so our parser rejects any string that contains a backslash.

The Shape of JSON Values

JSON has a small set of value types.

const Value = union(enum) {
    null,
    boolean: bool,
    number: f64,
    string: []const u8,
    array: []Value,
    object: []Field,
};

const Field = struct {
    key: []const u8,
    value: Value,
};

This is a tagged union.

A Value can be one of several forms. If it is a number, it stores an f64. If it is a string, it stores a slice of bytes. If it is an object, it stores fields.

This is the first important idea: parsing turns plain text into structured data.
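To make the tagged union concrete, here is a minimal sketch that builds two values by hand and switches on the active variant. The union is trimmed to four variants to keep the example short, and describe is a hypothetical helper used only for illustration.

```zig
const std = @import("std");

// Trimmed copy of the Value union: arrays and objects are left out here.
const Value = union(enum) {
    null,
    boolean: bool,
    number: f64,
    string: []const u8,
};

// Hypothetical helper: names the variant that is currently active.
fn describe(value: Value) []const u8 {
    return switch (value) {
        .null => "null",
        .boolean => "boolean",
        .number => "number",
        .string => "string",
    };
}

pub fn main() void {
    const year = Value{ .number = 2016 };
    const name = Value{ .string = "Zig" };

    // The switch in describe must cover every variant, so adding a new
    // variant to Value forces this function to be updated too.
    std.debug.print("{s} {s}\n", .{ describe(year), describe(name) });
}
```

The compiler checks that the switch is exhaustive, which is the main reason to model JSON values as a tagged union rather than as a struct with optional fields.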

A Parser Struct

A parser needs to remember where it is in the input.

const Parser = struct {
    input: []const u8,
    index: usize,

    fn init(input: []const u8) Parser {
        return Parser{
            .input = input,
            .index = 0,
        };
    }
};

The input field stores all JSON text.

The index field stores the current position.

For example, if the input is:

true

At the beginning:

index = 0

After reading true:

index = 4

A parser is mostly a careful movement through text.

Peeking and Advancing

Add helper methods:

fn peek(self: *Parser) ?u8 {
    if (self.index >= self.input.len) {
        return null;
    }

    return self.input[self.index];
}

fn advance(self: *Parser) ?u8 {
    const ch = self.peek() orelse return null;
    self.index += 1;
    return ch;
}

peek looks at the current byte without moving.

advance reads the current byte and moves forward by one.

These two operations are enough to build the rest of the parser.
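As a small illustration, the sketch below walks through the input true one byte at a time with advance, then checks that peek reports the end of input. The Parser here is reduced to just these two methods.

```zig
const std = @import("std");

// Reduced Parser: only the two helper methods from this section.
const Parser = struct {
    input: []const u8,
    index: usize,

    fn peek(self: *Parser) ?u8 {
        if (self.index >= self.input.len) return null;
        return self.input[self.index];
    }

    fn advance(self: *Parser) ?u8 {
        const ch = self.peek() orelse return null;
        self.index += 1;
        return ch;
    }
};

pub fn main() void {
    var parser = Parser{ .input = "true", .index = 0 };

    // advance returns null at the end of input, which ends the loop.
    while (parser.advance()) |ch| {
        std.debug.print("read '{c}', index is now {d}\n", .{ ch, parser.index });
    }

    // After the loop the index equals the input length, and peek returns null.
    std.debug.print("at end: {}\n", .{parser.peek() == null});
}
```

The index finishes at 4, matching the walkthrough earlier in this section.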

Skipping Whitespace

JSON allows whitespace between values.

These two inputs mean the same thing:

true
   true

Whitespace may also appear between the elements of an array:

[
  true,
  false
]

Add this method:

fn skipWhitespace(self: *Parser) void {
    while (self.peek()) |ch| {
        switch (ch) {
            ' ', '\n', '\r', '\t' => _ = self.advance(),
            else => return,
        }
    }
}

Before parsing any value, we call skipWhitespace.

Parsing a Value

Now write the main dispatcher:

fn parseValue(self: *Parser, allocator: std.mem.Allocator) !Value {
    self.skipWhitespace();

    const ch = self.peek() orelse return error.UnexpectedEnd;

    return switch (ch) {
        'n' => self.parseNull(),
        't' => self.parseTrue(),
        'f' => self.parseFalse(),
        '"' => self.parseString(),
        '[' => self.parseArray(allocator),
        '{' => self.parseObject(allocator),
        '-', '0'...'9' => self.parseNumber(),
        else => error.InvalidValue,
    };
}

This function looks at the next byte and decides which parser to call.

If the next byte is t, the value may be true.

If the next byte is [, the value is an array.

If the next byte is {, the value is an object.

Matching Fixed Words

JSON has three fixed word values:

null
true
false

We can parse them with a helper:

fn matchText(self: *Parser, text: []const u8) bool {
    if (self.index + text.len > self.input.len) {
        return false;
    }

    if (!std.mem.eql(u8, self.input[self.index .. self.index + text.len], text)) {
        return false;
    }

    self.index += text.len;
    return true;
}

Now parse the literals:

fn parseNull(self: *Parser) !Value {
    if (!self.matchText("null")) {
        return error.InvalidValue;
    }

    return Value.null;
}

fn parseTrue(self: *Parser) !Value {
    if (!self.matchText("true")) {
        return error.InvalidValue;
    }

    return Value{ .boolean = true };
}

fn parseFalse(self: *Parser) !Value {
    if (!self.matchText("false")) {
        return error.InvalidValue;
    }

    return Value{ .boolean = false };
}

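Each of these functions returns a Value on success and error.InvalidValue when the input does not spell out the expected word.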
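One detail of matchText is worth seeing in action: it moves the index only when the whole word matches. The sketch below uses a reduced Parser containing just that method, with two made-up inputs.

```zig
const std = @import("std");

// Reduced Parser: only matchText, copied from above.
const Parser = struct {
    input: []const u8,
    index: usize,

    fn matchText(self: *Parser, text: []const u8) bool {
        if (self.index + text.len > self.input.len) return false;
        if (!std.mem.eql(u8, self.input[self.index .. self.index + text.len], text)) return false;
        self.index += text.len;
        return true;
    }
};

pub fn main() void {
    // "trap" starts with 't' but is not "true": no match, index stays at 0.
    var parser = Parser{ .input = "trap", .index = 0 };
    const first = parser.matchText("true");
    std.debug.print("matched true: {}, index: {d}\n", .{ first, parser.index });

    // A successful match consumes the whole word and stops before ']'.
    parser = Parser{ .input = "false]", .index = 0 };
    const second = parser.matchText("false");
    std.debug.print("matched false: {}, index: {d}\n", .{ second, parser.index });
}
```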

Parsing Strings

A JSON string starts with " and ends with ".

For this first parser, we will only support simple strings with no escapes.

fn parseString(self: *Parser) !Value {
    if (self.advance() != '"') {
        return error.InvalidString;
    }

    const start = self.index;

    while (self.peek()) |ch| {
        if (ch == '"') {
            const text = self.input[start..self.index];
            _ = self.advance();
            return Value{ .string = text };
        }

        if (ch == '\\') {
            return error.EscapesNotSupported;
        }

        _ = self.advance();
    }

    return error.UnexpectedEnd;
}

This parser does not allocate a new string. It returns a slice into the original input.

That is efficient, but it means the original JSON text must remain alive while the parsed value is used.
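The sketch below illustrates the aliasing. It slices the input the same way parseString does, checks that the result points into the original buffer, and then makes an independent copy with the allocator's dupe method. The copy is an extra step the parser in this section does not take, and sameStart is a hypothetical helper.

```zig
const std = @import("std");

// Hypothetical helper: true when both slices start at the same address.
fn sameStart(a: []const u8, b: []const u8) bool {
    return a.ptr == b.ptr;
}

pub fn main() !void {
    const input: []const u8 = "\"Zig\"";

    // What parseString returns for this input: the bytes between the quotes.
    const text: []const u8 = input[1 .. input.len - 1];

    // The slice starts one byte into the original buffer; nothing was copied.
    std.debug.print("{s} aliases input: {}\n", .{ text, sameStart(text, input[1..]) });

    // If the string must outlive the input, copy the bytes first.
    var arena = std.heap.ArenaAllocator.init(std.heap.page_allocator);
    defer arena.deinit();
    const owned = try arena.allocator().dupe(u8, text);
    std.debug.print("{s} is a separate copy: {}\n", .{ owned, !sameStart(owned, text) });
}
```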

Parsing Numbers

We will parse numbers as f64.

fn parseNumber(self: *Parser) !Value {
    const start = self.index;

    if (self.peek() == '-') {
        _ = self.advance();
    }

    while (self.peek()) |ch| {
        switch (ch) {
            '0'...'9' => _ = self.advance(),
            else => break,
        }
    }

    if (self.peek() == '.') {
        _ = self.advance();

        while (self.peek()) |ch| {
            switch (ch) {
                '0'...'9' => _ = self.advance(),
                else => break,
            }
        }
    }

    const text = self.input[start..self.index];
    const number = try std.fmt.parseFloat(f64, text);

    return Value{ .number = number };
}

This accepts:

0
123
-5
3.14
-0.5

It does not yet fully enforce the JSON number grammar. For example, it accepts leading zeros such as 0123, which JSON forbids, and it stops at the e in 1e5, so numbers with exponents, which JSON allows, are rejected.
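For comparison, here is a sketch of a stricter check that follows the JSON number grammar: an optional minus sign, an integer part with no leading zeros, an optional fraction, and an optional exponent. The function isJsonNumber is a hypothetical helper and is not used by the parser in this section.

```zig
const std = @import("std");

// Hypothetical helper: reports whether `text` is exactly one JSON number.
// Grammar: -? (0 | [1-9][0-9]*) ('.' [0-9]+)? ([eE] [+-]? [0-9]+)?
fn isJsonNumber(text: []const u8) bool {
    var i: usize = 0;

    // Optional minus sign.
    if (i < text.len and text[i] == '-') i += 1;

    // Integer part: a single 0, or a nonzero digit followed by more digits.
    if (i >= text.len) return false;
    if (text[i] == '0') {
        i += 1;
    } else if (text[i] >= '1' and text[i] <= '9') {
        while (i < text.len and std.ascii.isDigit(text[i])) i += 1;
    } else {
        return false;
    }

    // Optional fraction: '.' followed by at least one digit.
    if (i < text.len and text[i] == '.') {
        i += 1;
        const start = i;
        while (i < text.len and std.ascii.isDigit(text[i])) i += 1;
        if (i == start) return false;
    }

    // Optional exponent: e or E, optional sign, at least one digit.
    if (i < text.len and (text[i] == 'e' or text[i] == 'E')) {
        i += 1;
        if (i < text.len and (text[i] == '+' or text[i] == '-')) i += 1;
        const start = i;
        while (i < text.len and std.ascii.isDigit(text[i])) i += 1;
        if (i == start) return false;
    }

    // Anything left over means the text is not a single number.
    return i == text.len;
}

pub fn main() void {
    const samples = [_][]const u8{ "0", "-0.5", "1e9", "0123", "1.", "-" };
    for (samples) |sample| {
        std.debug.print("{s}: {}\n", .{ sample, isJsonNumber(sample) });
    }
}
```

To use a check like this in parseNumber, the scanning loops would also need to consume the exponent part so that the slice handed to std.fmt.parseFloat covers the whole number.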

Parsing Arrays

A JSON array starts with [ and ends with ].

Example:

[1, true, "zig"]

We need a dynamic list because we do not know the number of elements in advance.

fn parseArray(self: *Parser, allocator: std.mem.Allocator) !Value {
    if (self.advance() != '[') {
        return error.InvalidArray;
    }

    var items = std.ArrayList(Value).init(allocator);
    errdefer items.deinit();

    self.skipWhitespace();

    if (self.peek() == ']') {
        _ = self.advance();
        return Value{ .array = try items.toOwnedSlice() };
    }

    while (true) {
        const value = try self.parseValue(allocator);
        try items.append(value);

        self.skipWhitespace();

        const ch = self.advance() orelse return error.UnexpectedEnd;

        switch (ch) {
            ',' => continue,
            ']' => return Value{ .array = try items.toOwnedSlice() },
            else => return error.InvalidArray,
        }
    }
}

The key part is this loop:

while (true) {
    const value = try self.parseValue(allocator);
    try items.append(value);
    ...
}

An array contains values, so the array parser calls parseValue again.

This is recursion.
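The same recursive shape appears whenever code walks the parsed tree. As a sketch, the hypothetical depth function below measures how deeply arrays are nested. It uses a cut-down Value with only numbers and arrays, and the array variant holds []const Value so the example data can be written as constants, which differs slightly from the []Value used in this section.

```zig
const std = @import("std");

// Cut-down Value for this example: numbers and arrays only.
const Value = union(enum) {
    number: f64,
    array: []const Value,
};

// Hypothetical helper: returns how deeply arrays are nested inside `value`.
fn depth(value: Value) usize {
    switch (value) {
        .number => return 0,
        .array => |items| {
            var deepest: usize = 0;
            for (items) |item| {
                // Recursive call: each element may itself be an array.
                deepest = @max(deepest, depth(item));
            }
            return deepest + 1;
        },
    }
}

pub fn main() void {
    // Hand-built equivalent of the JSON text [1, [2, [3]]].
    const inner = [_]Value{.{ .number = 3 }};
    const middle = [_]Value{ .{ .number = 2 }, .{ .array = &inner } };
    const outer = [_]Value{ .{ .number = 1 }, .{ .array = &middle } };

    const value = Value{ .array = &outer };
    std.debug.print("depth: {d}\n", .{depth(value)});
}
```

Because each nested array adds a stack frame, both this function and parseValue can run out of stack on very deeply nested input. A production parser usually enforces a maximum depth.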

Parsing Objects

A JSON object starts with { and ends with }.

Example:

{"name": "Zig", "year": 2016}

Each object field has:

string key
colon
value

Here is the parser:

fn parseObject(self: *Parser, allocator: std.mem.Allocator) !Value {
    if (self.advance() != '{') {
        return error.InvalidObject;
    }

    var fields = std.ArrayList(Field).init(allocator);
    errdefer fields.deinit();

    self.skipWhitespace();

    if (self.peek() == '}') {
        _ = self.advance();
        return Value{ .object = try fields.toOwnedSlice() };
    }

    while (true) {
        self.skipWhitespace();

        const key_value = try self.parseString();
        const key = switch (key_value) {
            .string => |s| s,
            else => return error.InvalidObject,
        };

        self.skipWhitespace();

        if (self.advance() != ':') {
            return error.InvalidObject;
        }

        const value = try self.parseValue(allocator);

        try fields.append(Field{
            .key = key,
            .value = value,
        });

        self.skipWhitespace();

        const ch = self.advance() orelse return error.UnexpectedEnd;

        switch (ch) {
            ',' => continue,
            '}' => return Value{ .object = try fields.toOwnedSlice() },
            else => return error.InvalidObject,
        }
    }
}

This parser is also recursive. An object field can contain any JSON value, including another object.
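Because an object is stored as a plain slice of Field values, looking up a key means scanning that slice. The sketch below shows one way to do it. The helper findField is hypothetical, and the Value union is trimmed to two variants.

```zig
const std = @import("std");

// Trimmed Value: only the variants this example needs.
const Value = union(enum) {
    number: f64,
    string: []const u8,
};

const Field = struct {
    key: []const u8,
    value: Value,
};

// Hypothetical helper: linear search that returns the first field whose key matches.
fn findField(fields: []const Field, key: []const u8) ?Value {
    for (fields) |field| {
        if (std.mem.eql(u8, field.key, key)) return field.value;
    }
    return null;
}

pub fn main() void {
    // Hand-built equivalent of {"name": "Zig", "year": 2016}.
    const fields = [_]Field{
        .{ .key = "name", .value = .{ .string = "Zig" } },
        .{ .key = "year", .value = .{ .number = 2016 } },
    };

    if (findField(&fields, "year")) |value| {
        switch (value) {
            .number => |n| std.debug.print("year = {d}\n", .{n}),
            else => std.debug.print("year is not a number\n", .{}),
        }
    }

    std.debug.print("has version: {}\n", .{findField(&fields, "version") != null});
}
```

A linear scan is fine for small objects. For large objects with many lookups, a hash map keyed by the field name would be the usual alternative.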

The Complete Program

Put this in src/main.zig:

const std = @import("std");

const Field = struct {
    key: []const u8,
    value: Value,
};

const Value = union(enum) {
    null,
    boolean: bool,
    number: f64,
    string: []const u8,
    array: []Value,
    object: []Field,
};

const Parser = struct {
    input: []const u8,
    index: usize,

    fn init(input: []const u8) Parser {
        return Parser{
            .input = input,
            .index = 0,
        };
    }

    fn peek(self: *Parser) ?u8 {
        if (self.index >= self.input.len) {
            return null;
        }

        return self.input[self.index];
    }

    fn advance(self: *Parser) ?u8 {
        const ch = self.peek() orelse return null;
        self.index += 1;
        return ch;
    }

    fn skipWhitespace(self: *Parser) void {
        while (self.peek()) |ch| {
            switch (ch) {
                ' ', '\n', '\r', '\t' => _ = self.advance(),
                else => return,
            }
        }
    }

    fn matchText(self: *Parser, text: []const u8) bool {
        if (self.index + text.len > self.input.len) {
            return false;
        }

        if (!std.mem.eql(u8, self.input[self.index .. self.index + text.len], text)) {
            return false;
        }

        self.index += text.len;
        return true;
    }

    fn parseValue(self: *Parser, allocator: std.mem.Allocator) !Value {
        self.skipWhitespace();

        const ch = self.peek() orelse return error.UnexpectedEnd;

        return switch (ch) {
            'n' => self.parseNull(),
            't' => self.parseTrue(),
            'f' => self.parseFalse(),
            '"' => self.parseString(),
            '[' => self.parseArray(allocator),
            '{' => self.parseObject(allocator),
            '-', '0'...'9' => self.parseNumber(),
            else => error.InvalidValue,
        };
    }

    fn parseNull(self: *Parser) !Value {
        if (!self.matchText("null")) {
            return error.InvalidValue;
        }

        return Value.null;
    }

    fn parseTrue(self: *Parser) !Value {
        if (!self.matchText("true")) {
            return error.InvalidValue;
        }

        return Value{ .boolean = true };
    }

    fn parseFalse(self: *Parser) !Value {
        if (!self.matchText("false")) {
            return error.InvalidValue;
        }

        return Value{ .boolean = false };
    }

    fn parseString(self: *Parser) !Value {
        if (self.advance() != '"') {
            return error.InvalidString;
        }

        const start = self.index;

        while (self.peek()) |ch| {
            if (ch == '"') {
                const text = self.input[start..self.index];
                _ = self.advance();
                return Value{ .string = text };
            }

            if (ch == '\\') {
                return error.EscapesNotSupported;
            }

            _ = self.advance();
        }

        return error.UnexpectedEnd;
    }

    fn parseNumber(self: *Parser) !Value {
        const start = self.index;

        if (self.peek() == '-') {
            _ = self.advance();
        }

        while (self.peek()) |ch| {
            switch (ch) {
                '0'...'9' => _ = self.advance(),
                else => break,
            }
        }

        if (self.peek() == '.') {
            _ = self.advance();

            while (self.peek()) |ch| {
                switch (ch) {
                    '0'...'9' => _ = self.advance(),
                    else => break,
                }
            }
        }

        const text = self.input[start..self.index];
        const number = try std.fmt.parseFloat(f64, text);

        return Value{ .number = number };
    }

    fn parseArray(self: *Parser, allocator: std.mem.Allocator) !Value {
        if (self.advance() != '[') {
            return error.InvalidArray;
        }

        var items = std.ArrayList(Value).init(allocator);
        errdefer items.deinit();

        self.skipWhitespace();

        if (self.peek() == ']') {
            _ = self.advance();
            return Value{ .array = try items.toOwnedSlice() };
        }

        while (true) {
            const value = try self.parseValue(allocator);
            try items.append(value);

            self.skipWhitespace();

            const ch = self.advance() orelse return error.UnexpectedEnd;

            switch (ch) {
                ',' => continue,
                ']' => return Value{ .array = try items.toOwnedSlice() },
                else => return error.InvalidArray,
            }
        }
    }

    fn parseObject(self: *Parser, allocator: std.mem.Allocator) !Value {
        if (self.advance() != '{') {
            return error.InvalidObject;
        }

        var fields = std.ArrayList(Field).init(allocator);
        errdefer fields.deinit();

        self.skipWhitespace();

        if (self.peek() == '}') {
            _ = self.advance();
            return Value{ .object = try fields.toOwnedSlice() };
        }

        while (true) {
            self.skipWhitespace();

            const key_value = try self.parseString();
            const key = switch (key_value) {
                .string => |s| s,
                else => return error.InvalidObject,
            };

            self.skipWhitespace();

            if (self.advance() != ':') {
                return error.InvalidObject;
            }

            const value = try self.parseValue(allocator);

            try fields.append(Field{
                .key = key,
                .value = value,
            });

            self.skipWhitespace();

            const ch = self.advance() orelse return error.UnexpectedEnd;

            switch (ch) {
                ',' => continue,
                '}' => return Value{ .object = try fields.toOwnedSlice() },
                else => return error.InvalidObject,
            }
        }
    }
};

fn parseJson(allocator: std.mem.Allocator, input: []const u8) !Value {
    var parser = Parser.init(input);
    const value = try parser.parseValue(allocator);

    parser.skipWhitespace();

    if (parser.peek() != null) {
        return error.TrailingInput;
    }

    return value;
}

fn printValue(value: Value, indent: usize) void {
    const spaces = "                                ";
    const prefix = spaces[0..@min(indent, spaces.len)];

    switch (value) {
        .null => std.debug.print("{s}null\n", .{prefix}),
        .boolean => |b| std.debug.print("{s}bool: {}\n", .{ prefix, b }),
        .number => |n| std.debug.print("{s}number: {d}\n", .{ prefix, n }),
        .string => |s| std.debug.print("{s}string: {s}\n", .{ prefix, s }),
        .array => |items| {
            std.debug.print("{s}array\n", .{prefix});
            for (items) |item| {
                printValue(item, indent + 2);
            }
        },
        .object => |fields| {
            std.debug.print("{s}object\n", .{prefix});
            for (fields) |field| {
                std.debug.print("{s}  key: {s}\n", .{ prefix, field.key });
                printValue(field.value, indent + 4);
            }
        },
    }
}

pub fn main() !void {
    const input =
        \\{
        \\  "name": "Zig",
        \\  "year": 2016,
        \\  "safe": true,
        \\  "features": ["manual memory", "comptime", "c interop"],
        \\  "nothing": null
        \\}
    ;

    var arena = std.heap.ArenaAllocator.init(std.heap.page_allocator);
    defer arena.deinit();

    const value = try parseJson(arena.allocator(), input);
    printValue(value, 0);
}

Run:

zig build run

Expected output:

object
  key: name
    string: Zig
  key: year
    number: 2016
  key: safe
    bool: true
  key: features
    array
      string: manual memory
      string: comptime
      string: c interop
  key: nothing
    null

Why an Arena Allocator Works Well Here

The parser allocates arrays and object fields while parsing.

Those allocations all have the same lifetime. Once we finish using the parsed JSON tree, we can free everything together.

That is why this program uses an arena:

var arena = std.heap.ArenaAllocator.init(std.heap.page_allocator);
defer arena.deinit();

The parser does not free each array separately. The arena frees all allocated memory at once when arena.deinit() runs.

This is a common pattern for parsers, compilers, and short-lived data trees.
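A minimal sketch of the pattern on its own, with two plain allocations standing in for the slices the parser would create for arrays and objects:

```zig
const std = @import("std");

pub fn main() !void {
    var arena = std.heap.ArenaAllocator.init(std.heap.page_allocator);
    // One deinit call releases every allocation made through the arena.
    defer arena.deinit();
    const allocator = arena.allocator();

    // Stand-ins for the array and object slices the parser allocates.
    const numbers = try allocator.alloc(f64, 3);
    const names = try allocator.alloc([]const u8, 2);

    numbers[0] = 1;
    numbers[1] = 2;
    numbers[2] = 3;
    names[0] = "Zig";
    names[1] = "JSON";

    std.debug.print("{d} numbers, {d} names\n", .{ numbers.len, names.len });

    // There are no allocator.free calls here: both slices are released
    // together when the deferred arena.deinit() runs.
}
```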

What This Parser Does Not Handle Yet

This is not a complete JSON parser.

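It does not handle string escapes. Any backslash inside a string produces error.EscapesNotSupported.

It does not fully validate number syntax, and it does not accept exponents such as 1e5.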

It does not produce detailed error locations.

It does not free nested allocations individually.

It does not support streaming input.

Those are normal next steps. The important part is that the basic architecture is now visible.
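As one example of a next step, error locations can be derived from the index the parser already tracks. The sketch below converts a byte index into a line and column. Position and positionAt are hypothetical names, and columns are counted in bytes, not in Unicode characters.

```zig
const std = @import("std");

const Position = struct {
    line: usize,
    column: usize,
};

// Hypothetical helper: converts a byte index into a 1-based line and column.
fn positionAt(input: []const u8, index: usize) Position {
    var pos = Position{ .line = 1, .column = 1 };
    const end = @min(index, input.len);

    // Count newlines before the index; the column restarts after each one.
    for (input[0..end]) |ch| {
        if (ch == '\n') {
            pos.line += 1;
            pos.column = 1;
        } else {
            pos.column += 1;
        }
    }

    return pos;
}

pub fn main() void {
    const input = "{\n  \"year\": ?\n}";

    // Pretend the parser stopped at the '?' byte with error.InvalidValue.
    const index = std.mem.indexOfScalar(u8, input, '?').?;
    const pos = positionAt(input, index);

    std.debug.print("invalid value at line {d}, column {d}\n", .{ pos.line, pos.column });
}
```

A caller could apply this to parser.index after a failed parse, which would require keeping the Parser accessible instead of hiding it inside parseJson.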

A parser usually has this shape:

input text
current index
peek
advance
skip whitespace
parse one value
parse nested values recursively
return structured data

Once you understand that shape, larger parsers become much less mysterious.