Build a Small Database

A database stores data so it can be saved, searched, updated, and loaded again later.

In this project, we will build a very small database engine using a single file.

The database will support:

insert
get
list
save to disk
load from disk

We will store records like this:

id
name
email

Example:

id=1 name=alice email=alice@example.com

This project teaches several important systems programming ideas:

binary file formats
serialization
deserialization
fixed-size records
file I/O
memory layout

The database will stay intentionally small. We will not build SQL, indexing, transactions, or concurrency yet. The goal is to understand the core structure first.

The Shape of a Record

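We need a fixed-size record structure. An ordinary Zig struct does not guarantee field order or padding, so we declare the record as an extern struct, which keeps the fields in declaration order and follows the C ABI layout rules.

const Record = extern struct {
    id: u32,
    name: [32]u8,
    email: [64]u8,
};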

This record has:

4 bytes   id
32 bytes  name
64 bytes  email

Total:

100 bytes

Fixed-size records simplify storage because every record occupies the same amount of space.

Record 0 starts at byte 0.

Record 1 starts at byte 100.

Record 2 starts at byte 200.

And so on.
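The offset rule can be checked with a short sketch. It assumes Record is declared as an extern struct so that the layout is guaranteed. The recordOffset helper is a hypothetical name used only for illustration, not part of the project code.

```zig
const std = @import("std");

// extern struct: fields stay in declaration order with C ABI padding rules.
const Record = extern struct {
    id: u32,
    name: [32]u8,
    email: [64]u8,
};

// Checked at compile time: 4 + 32 + 64 bytes, with no padding in between.
comptime {
    std.debug.assert(@sizeOf(Record) == 100);
    std.debug.assert(@offsetOf(Record, "name") == 4);
    std.debug.assert(@offsetOf(Record, "email") == 36);
}

/// Hypothetical helper: byte offset in the file of the record at `index`.
fn recordOffset(index: u64) u64 {
    return index * @sizeOf(Record);
}

pub fn main() void {
    // prints "0 100 200"
    std.debug.print("{d} {d} {d}\n", .{
        recordOffset(0),
        recordOffset(1),
        recordOffset(2),
    });
}
```

Because the offset is a plain multiplication, any record can be located without reading the records before it.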

Why Fixed-Size Fields

Strings in Zig are usually slices:

[]const u8

But slices are not ideal for binary storage because a slice is a pointer plus a length, not the bytes themselves.

Pointers only make sense inside one running process. If you write them to disk and reload later, the addresses are invalid.

So we use fixed arrays instead:

[32]u8

That stores the bytes directly inside the record.
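The difference shows up in the sizes of the two types. This is a minimal sketch for illustration only. The variable names are arbitrary.

```zig
const std = @import("std");

pub fn main() void {
    const name: []const u8 = "alice";

    // A slice is a pointer plus a length. The text itself lives elsewhere,
    // so the slice has the same size no matter how long the text is.
    std.debug.print("slice: {d} bytes, length field = {d}\n", .{
        @sizeOf([]const u8),
        name.len,
    });

    // An array holds its bytes inline, so its size is the byte count.
    std.debug.print("array: {d} bytes\n", .{@sizeOf([32]u8)});
}
```

Writing the slice to disk would store an address and a length. Writing the array stores the characters.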

Converting Strings Into Fixed Buffers

We need helper functions.

Add this:

fn copyString(dest: []u8, src: []const u8) void {
    @memset(dest, 0);

    const len = @min(dest.len, src.len);
    @memcpy(dest[0..len], src[0..len]);
}

This function:

  1. clears the destination
  2. copies as many bytes as fit

Now we can create records safely.
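Two cases are worth seeing once: input shorter than the buffer, and input longer than the buffer. The sketch below repeats copyString so that it runs on its own. The buffer sizes are chosen only for the demonstration.

```zig
const std = @import("std");

fn copyString(dest: []u8, src: []const u8) void {
    @memset(dest, 0);

    const len = @min(dest.len, src.len);
    @memcpy(dest[0..len], src[0..len]);
}

pub fn main() void {
    // Short input: the unused tail of the buffer is zero-filled.
    var name: [8]u8 = undefined;
    copyString(&name, "bob");
    std.debug.assert(std.mem.eql(u8, &name, "bob\x00\x00\x00\x00\x00"));

    // Long input: silently cut to the buffer size, with no zero byte left.
    var tiny: [4]u8 = undefined;
    copyString(&tiny, "alice");
    std.debug.assert(std.mem.eql(u8, &tiny, "alic"));
}
```

Truncation is silent here. A stricter version could return an error when the input does not fit.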

Create Records

Add this helper:

fn makeRecord(id: u32, name: []const u8, email: []const u8) Record {
    var record = Record{
        .id = id,
        .name = undefined,
        .email = undefined,
    };

    copyString(&record.name, name);
    copyString(&record.email, email);

    return record;
}

Example:

const user = makeRecord(
    1,
    "alice",
    "alice@example.com",
);

The Database Type

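Now define the database itself. It uses the standard library, so the file needs const std = @import("std"); at the top.

const Database = struct {
    allocator: std.mem.Allocator,
    // Zig 0.15 moved the list type that stores its own allocator to
    // std.array_list.Managed; std.ArrayList no longer has init(allocator).
    records: std.array_list.Managed(Record),
};

This database keeps all records in memory using a growable array list.

Later, we will save and load them from disk.

Initialize and Destroy

Add these methods inside the Database struct:

fn init(allocator: std.mem.Allocator) Database {
    return .{
        .allocator = allocator,
        .records = std.array_list.Managed(Record).init(allocator),
    };
}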

fn deinit(self: *Database) void {
    self.records.deinit();
}

Insert Records

Add:

fn insert(self: *Database, record: Record) !void {
    try self.records.append(record);
}

Simple databases often start like this: append records into a collection.

Get Records by ID

Add:

fn get(self: *Database, id: u32) ?Record {
    for (self.records.items) |record| {
        if (record.id == id) {
            return record;
        }
    }

    return null;
}

This is a linear search.

For a tiny database, that is fine.

Larger databases use indexes like B-trees or hash tables.
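To make the idea of an index concrete, here is a minimal sketch of a hash index that maps each id to a position in the record array. It is not part of this project. The buildIndex helper is a hypothetical name, and the sketch assumes std.AutoHashMap from the standard library and an extern struct Record.

```zig
const std = @import("std");

const Record = extern struct {
    id: u32,
    name: [32]u8,
    email: [64]u8,
};

/// Hypothetical helper: maps each record id to its position in `records`.
fn buildIndex(
    allocator: std.mem.Allocator,
    records: []const Record,
) !std.AutoHashMap(u32, usize) {
    var index = std.AutoHashMap(u32, usize).init(allocator);
    errdefer index.deinit();

    for (records, 0..) |record, position| {
        try index.put(record.id, position);
    }

    return index;
}

pub fn main() !void {
    var records = [_]Record{
        std.mem.zeroes(Record),
        std.mem.zeroes(Record),
    };
    records[0].id = 10;
    records[1].id = 20;

    var index = try buildIndex(std.heap.page_allocator, &records);
    defer index.deinit();

    // One hash lookup instead of a scan over every record.
    if (index.get(20)) |position| {
        std.debug.print("id 20 is at position {d}\n", .{position});
    }
}
```

The cost is extra memory and the need to keep the index in sync on every insert, which is why we leave it out of the first version.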

Print Records

We need a helper that converts fixed arrays into printable strings.

fn trimNulls(bytes: []const u8) []const u8 {
    const end = std.mem.indexOfScalar(u8, bytes, 0) orelse bytes.len;
    return bytes[0..end];
}

Now add:

fn printRecord(record: Record) void {
    std.debug.print(
        "id={d} name={s} email={s}\n",
        .{
            record.id,
            trimNulls(&record.name),
            trimNulls(&record.email),
        },
    );
}
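The helper has one edge case: a field that is completely full contains no zero byte. The sketch below repeats trimNulls and checks both situations. The sample buffers are made up for the demonstration.

```zig
const std = @import("std");

fn trimNulls(bytes: []const u8) []const u8 {
    const end = std.mem.indexOfScalar(u8, bytes, 0) orelse bytes.len;
    return bytes[0..end];
}

pub fn main() void {
    // Padded field: the slice stops at the first zero byte.
    const padded = [_]u8{ 'b', 'o', 'b', 0, 0, 0 };
    std.debug.assert(std.mem.eql(u8, trimNulls(&padded), "bob"));

    // Completely full field: no zero byte, so the whole buffer is returned.
    const full = [_]u8{ 'a', 'l', 'i', 'c' };
    std.debug.assert(std.mem.eql(u8, trimNulls(&full), "alic"));
}
```

The orelse bytes.len branch is what keeps a full field from being read past its end.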

Try the Database

Put this in main:

pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    defer _ = gpa.deinit();

    const allocator = gpa.allocator();

    var db = Database.init(allocator);
    defer db.deinit();

    try db.insert(makeRecord(
        1,
        "alice",
        "alice@example.com",
    ));

    try db.insert(makeRecord(
        2,
        "bob",
        "bob@example.com",
    ));

    for (db.records.items) |record| {
        printRecord(record);
    }
}

Run:

zig build run

Output:

id=1 name=alice email=alice@example.com
id=2 name=bob email=bob@example.com

Now we have an in-memory database.

Saving to Disk

A database becomes useful when data survives after the program exits.

We will save records directly as binary data.

Add this method:

fn save(self: *Database, path: []const u8) !void {
    const file = try std.fs.cwd().createFile(path, .{});
    defer file.close();

    for (self.records.items) |record| {
        try file.writeAll(std.mem.asBytes(&record));
    }
}

This line is important:

std.mem.asBytes(&record)

It views the record struct as raw bytes.

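This works because Record holds only fixed-size fields and no pointers. The byte layout is guaranteed only for an extern struct; the compiler may reorder the fields of an ordinary struct.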

Each record becomes exactly 100 bytes on disk.
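A small sketch shows what asBytes exposes. It assumes Record is an extern struct. The sampleRecord helper is a hypothetical name used only here.

```zig
const std = @import("std");
const builtin = @import("builtin");

const Record = extern struct {
    id: u32,
    name: [32]u8,
    email: [64]u8,
};

/// Hypothetical helper that builds one record for the demonstration.
fn sampleRecord() Record {
    var record = std.mem.zeroes(Record);
    record.id = 1;
    @memcpy(record.name[0..5], "alice");
    return record;
}

pub fn main() void {
    const record = sampleRecord();

    // asBytes returns a pointer to the same memory, typed as a byte array.
    const bytes = std.mem.asBytes(&record);
    std.debug.assert(bytes.len == 100);

    // The first four bytes are the id in the machine's native byte order.
    const id = std.mem.readInt(u32, bytes[0..4], builtin.cpu.arch.endian());
    std.debug.assert(id == 1);

    // The name starts right after the id, at offset 4.
    std.debug.assert(std.mem.eql(u8, bytes[4..9], "alice"));
}
```

No copy is made: the bytes written to the file are the bytes of the struct in memory.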

Loading from Disk

Now add a load function:

fn load(self: *Database, path: []const u8) !void {
    const file = try std.fs.cwd().openFile(path, .{});
    defer file.close();

    while (true) {
        var record: Record = undefined;

        const bytes = std.mem.asBytes(&record);

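        // readAll keeps reading until the buffer is full or the file ends.
        // A single read call may legally return fewer bytes than requested.
        const n = try file.readAll(bytes);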

        if (n == 0) {
            break;
        }

        if (n != bytes.len) {
            return error.InvalidDatabaseFile;
        }

        try self.records.append(record);
    }
}

This repeatedly reads exactly one Record from the file.

If the file ends cleanly, reading returns 0.

If the file stops halfway through a record, the file is corrupted.
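The same corruption check can be done up front from the file size alone. This is a sketch under assumptions: recordCount and record_size are hypothetical names, and the file size is assumed to have been queried from the operating system before the read loop starts.

```zig
const std = @import("std");

// Size of one Record on disk, as computed earlier in this chapter.
const record_size = 100;

/// Hypothetical helper: number of whole records in a file of `file_size`
/// bytes, or an error if the file ends partway through a record.
fn recordCount(file_size: u64) error{InvalidDatabaseFile}!u64 {
    if (file_size % record_size != 0) return error.InvalidDatabaseFile;
    return file_size / record_size;
}

pub fn main() void {
    const sizes = [_]u64{ 0, 200, 250 };

    for (sizes) |size| {
        if (recordCount(size)) |count| {
            std.debug.print("{d} bytes: {d} records\n", .{ size, count });
        } else |err| {
            std.debug.print("{d} bytes: {s}\n", .{ size, @errorName(err) });
        }
    }
}
```

Knowing the count in advance also lets a loader reserve space for all records before it reads the first one.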

Save and Reload

Update main:

pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    defer _ = gpa.deinit();

    const allocator = gpa.allocator();

    {
        var db = Database.init(allocator);
        defer db.deinit();

        try db.insert(makeRecord(
            1,
            "alice",
            "alice@example.com",
        ));

        try db.insert(makeRecord(
            2,
            "bob",
            "bob@example.com",
        ));

        try db.save("users.db");
    }

    {
        var db = Database.init(allocator);
        defer db.deinit();

        try db.load("users.db");

        for (db.records.items) |record| {
            printRecord(record);
        }
    }
}

Run:

zig build run

You should still see:

id=1 name=alice email=alice@example.com
id=2 name=bob email=bob@example.com

But now the data was loaded from disk.

Inspect the Database File

The file is binary, not text.

Try:

hexdump -C users.db

You may see something like:

00000000  01 00 00 00 61 6c 69 63 65 ...

The bytes contain:

record id
name bytes
email bytes

This is the core idea of binary storage: data is stored in a fixed machine-readable format.

Add Commands

Now let us turn this into a small interactive database.

Add:

const Command = enum {
    insert,
    get,
    list,
    quit,
};

We can parse user input like:

insert 1 alice alice@example.com
get 1
list
quit
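One way to turn the first word of a line into a Command is std.meta.stringToEnum, which matches text against the enum's field names. The sketch below is an illustration only. The parseCommand helper is a hypothetical name.

```zig
const std = @import("std");

const Command = enum {
    insert,
    get,
    list,
    quit,
};

/// Hypothetical helper: reads the first word of a line as a Command.
/// Returns null when the word is not one of the enum's field names.
fn parseCommand(line: []const u8) ?Command {
    var parts = std.mem.splitScalar(u8, line, ' ');
    const word = parts.next() orelse return null;
    return std.meta.stringToEnum(Command, word);
}

pub fn main() void {
    const lines = [_][]const u8{ "get 1", "list", "quit", "drop" };

    for (lines) |line| {
        if (parseCommand(line)) |command| {
            std.debug.print("{s} -> {s}\n", .{ line, @tagName(command) });
        } else {
            std.debug.print("{s} -> unknown command\n", .{line});
        }
    }
}
```

With an enum in hand, the command loop can use a switch, and the compiler reports any command that is left unhandled.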

Reading User Input

Add this loop:

var stdin_buffer: [1024]u8 = undefined;
var stdin_reader = std.fs.File.stdin().reader(&stdin_buffer);

const stdin = &stdin_reader.interface;

while (true) {
    std.debug.print("db> ", .{});

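    // takeDelimiter returns the next line without the trailing '\n', or null
    // at end of input. This assumes Zig 0.15.2 or later. The returned slice
    // points into stdin_buffer and is valid until the next read.
    const line = (try stdin.takeDelimiter('\n')) orelse break;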

    if (std.mem.eql(u8, line, "quit")) {
        break;
    }

    std.debug.print("command: {s}\n", .{line});
}

Run:

zig build run

Now you can type commands interactively.

Complete Database Program

Here is a compact complete version:

const std = @import("std");

// extern struct guarantees field order and padding for the on-disk format.
const Record = extern struct {
    id: u32,
    name: [32]u8,
    email: [64]u8,
};

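const Database = struct {
    allocator: std.mem.Allocator,
    // Zig 0.15 moved the list type that stores its own allocator to
    // std.array_list.Managed; std.ArrayList no longer has init(allocator).
    records: std.array_list.Managed(Record),

    fn init(allocator: std.mem.Allocator) Database {
        return .{
            .allocator = allocator,
            .records = std.array_list.Managed(Record).init(allocator),
        };
    }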

    fn deinit(self: *Database) void {
        self.records.deinit();
    }

    fn insert(self: *Database, record: Record) !void {
        try self.records.append(record);
    }

    fn get(self: *Database, id: u32) ?Record {
        for (self.records.items) |record| {
            if (record.id == id) {
                return record;
            }
        }

        return null;
    }

    fn save(self: *Database, path: []const u8) !void {
        const file = try std.fs.cwd().createFile(path, .{});
        defer file.close();

        for (self.records.items) |record| {
            try file.writeAll(std.mem.asBytes(&record));
        }
    }

    fn load(self: *Database, path: []const u8) !void {
        const file = std.fs.cwd().openFile(path, .{}) catch |err| switch (err) {
            // A missing file just means the database is still empty.
            error.FileNotFound => return,
            else => return err,
        };
        defer file.close();

        while (true) {
            var record: Record = undefined;

            const bytes = std.mem.asBytes(&record);

            // readAll keeps reading until the buffer is full or the file ends.
            const n = try file.readAll(bytes);

            if (n == 0) {
                break;
            }

            if (n != bytes.len) {
                return error.InvalidDatabaseFile;
            }

            try self.records.append(record);
        }
    }
};

fn copyString(dest: []u8, src: []const u8) void {
    @memset(dest, 0);

    const len = @min(dest.len, src.len);
    @memcpy(dest[0..len], src[0..len]);
}

fn makeRecord(id: u32, name: []const u8, email: []const u8) Record {
    var record = Record{
        .id = id,
        .name = undefined,
        .email = undefined,
    };

    copyString(&record.name, name);
    copyString(&record.email, email);

    return record;
}

fn trimNulls(bytes: []const u8) []const u8 {
    const end = std.mem.indexOfScalar(u8, bytes, 0) orelse bytes.len;
    return bytes[0..end];
}

fn printRecord(record: Record) void {
    std.debug.print(
        "id={d} name={s} email={s}\n",
        .{
            record.id,
            trimNulls(&record.name),
            trimNulls(&record.email),
        },
    );
}

pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    defer _ = gpa.deinit();

    const allocator = gpa.allocator();

    var db = Database.init(allocator);
    defer db.deinit();

    try db.load("users.db");
    defer db.save("users.db") catch {};

    var stdin_buffer: [1024]u8 = undefined;
    var stdin_reader = std.fs.File.stdin().reader(&stdin_buffer);

    const stdin = &stdin_reader.interface;

    while (true) {
        std.debug.print("db> ", .{});

        // takeDelimiter returns the next line without the trailing '\n', or
        // null at end of input. This assumes Zig 0.15.2 or later.
        const line = (try stdin.takeDelimiter('\n')) orelse break;

        var parts = std.mem.splitScalar(u8, line, ' ');

        const command = parts.next() orelse continue;

        if (std.mem.eql(u8, command, "quit")) {
            break;
        } else if (std.mem.eql(u8, command, "list")) {
            for (db.records.items) |record| {
                printRecord(record);
            }
        } else if (std.mem.eql(u8, command, "get")) {
            const id_text = parts.next() orelse {
                std.debug.print("missing id\n", .{});
                continue;
            };

            const id = std.fmt.parseInt(u32, id_text, 10) catch {
                std.debug.print("invalid id\n", .{});
                continue;
            };

            if (db.get(id)) |record| {
                printRecord(record);
            } else {
                std.debug.print("not found\n", .{});
            }
        } else if (std.mem.eql(u8, command, "insert")) {
            const id_text = parts.next() orelse continue;
            const name = parts.next() orelse continue;
            const email = parts.next() orelse continue;

            const id = std.fmt.parseInt(u32, id_text, 10) catch {
                std.debug.print("invalid id\n", .{});
                continue;
            };

            try db.insert(makeRecord(id, name, email));

            std.debug.print("inserted\n", .{});
        } else {
            std.debug.print("unknown command\n", .{});
        }
    }
}

Example session:

db> insert 1 alice alice@example.com
inserted

db> insert 2 bob bob@example.com
inserted

db> list
id=1 name=alice email=alice@example.com
id=2 name=bob email=bob@example.com

db> get 1
id=1 name=alice email=alice@example.com

db> quit

Why This Is Still a Real Database

Even though this project is small, it already contains core database ideas:

structured records
binary serialization
persistent storage
queries
table scanning
fixed layouts

Larger databases add:

indexes
transactions
caching
query planners
locking
concurrency
compression
journals
B-trees

But the foundation is still data stored in a structured format.

Limitations of This Design

This database rewrites the entire file on exit.

That is simple but inefficient.

The get operation scans every record linearly. Insert does not check for duplicate ids, so get returns the first match.

Deleting records is not supported.

Strings have fixed limits:

name  -> 32 bytes
email -> 64 bytes

The file format depends on machine layout and endianness.

A portable database format would define exact byte ordering explicitly.
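As a sketch of what explicit byte ordering could look like, the functions below write the id as little-endian regardless of the CPU. The encode and decode helpers and the record_size constant are hypothetical names, and the offsets assume the same 4 + 32 + 64 layout used throughout this chapter.

```zig
const std = @import("std");

const Record = extern struct {
    id: u32,
    name: [32]u8,
    email: [64]u8,
};

const record_size = 100;

/// Hypothetical encoder: writes the id as little-endian on every machine.
fn encode(record: Record, out: *[record_size]u8) void {
    std.mem.writeInt(u32, out[0..4], record.id, .little);
    @memcpy(out[4..36], &record.name);
    @memcpy(out[36..100], &record.email);
}

/// Hypothetical decoder: the inverse of encode.
fn decode(in: *const [record_size]u8) Record {
    var record: Record = undefined;
    record.id = std.mem.readInt(u32, in[0..4], .little);
    @memcpy(&record.name, in[4..36]);
    @memcpy(&record.email, in[36..100]);
    return record;
}

pub fn main() void {
    var original = std.mem.zeroes(Record);
    original.id = 258; // 0x00000102
    @memcpy(original.name[0..5], "alice");

    var buffer: [record_size]u8 = undefined;
    encode(original, &buffer);

    // Little-endian: least significant byte first, whatever the CPU uses.
    std.debug.assert(buffer[0] == 0x02 and buffer[1] == 0x01);

    const copy = decode(&buffer);
    std.debug.assert(copy.id == original.id);
    std.debug.assert(std.mem.eql(u8, &copy.name, &original.name));
}
```

The byte arrays need no conversion because a single byte has no byte order. Only multi-byte integers such as the id do.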

These limitations are normal for a first database engine.

What You Learned

You built a small persistent database.

You designed a binary record format.

You serialized structs into bytes.

You loaded structs from disk.

You implemented insert, get, and list.

You built an interactive command loop.

You saw why fixed-size layouts simplify storage.

This is one of the most important systems programming ideas: structured memory layouts can become structured storage layouts.