# Build a Small Database

A database stores data so it can be saved, searched, updated, and loaded again later.

In this project, we will build a very small database engine using a single file.

The database will support:

```text
insert
get
list
save to disk
load from disk
```

We will store records like this:

```text
id
name
email
```

Example:

```text
1 alice alice@example.com
2 bob bob@example.com
```

This project teaches several important systems programming ideas:

```text
binary file formats
serialization
deserialization
fixed-size records
file I/O
memory layout
```

The database will stay intentionally small. We will not build SQL, indexing, transactions, or concurrency yet. The goal is to understand the core structure first.

#### The Shape of a Record

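We need a fixed-size record structure. Zig is free to reorder the fields of a plain `struct`, so we declare the record as an `extern struct`, which keeps the fields in declaration order with a well-defined layout.

```zig
const Record = extern struct {
    id: u32,
    name: [32]u8,
    email: [64]u8,
};
```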

This record has:

```text
4 bytes   id
32 bytes  name
64 bytes  email
```

Total:

```text
100 bytes
```

Fixed-size records simplify storage because every record occupies the same amount of space.

Record 0 starts at byte 0.

Record 1 starts at byte 100.

Record 2 starts at byte 200.

And so on.
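The offset rule above can be checked in code. The following is a minimal sketch, assuming the record is declared as an `extern struct` so that its layout is guaranteed; `recordOffset` is a hypothetical helper name, not part of the project code.

```zig
const std = @import("std");

// Assumption: `extern struct` gives a guaranteed field order and size.
const Record = extern struct {
    id: u32,
    name: [32]u8,
    email: [64]u8,
};

// Byte offset of the record at `index` in a file of back-to-back records.
fn recordOffset(index: u64) u64 {
    return index * @sizeOf(Record);
}

comptime {
    // Verified at compile time: the build fails if the layout ever changes.
    std.debug.assert(@sizeOf(Record) == 100);
    std.debug.assert(@offsetOf(Record, "name") == 4);
    std.debug.assert(@offsetOf(Record, "email") == 36);
}

pub fn main() void {
    for (0..3) |i| {
        std.debug.print("record {d} starts at byte {d}\n", .{ i, recordOffset(i) });
    }
}
```

Because every record has the same size, finding record `n` is a single multiplication rather than a scan through the file.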

#### Why Fixed-Size Fields

Strings in Zig are usually slices:

```zig
[]const u8
```

But slices are not ideal for binary storage because they contain pointers.

Pointers only make sense inside one running process. If you write them to disk and reload later, the addresses are invalid.

So we use fixed arrays instead:

```zig
[32]u8
```

That stores the bytes directly inside the record.
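To make the difference concrete, this small sketch prints the size of a slice and of a fixed array. The variable names are only for illustration.

```zig
const std = @import("std");

pub fn main() void {
    const name: []const u8 = "alice";
    const fixed: [32]u8 = [_]u8{0} ** 32;

    // A slice is a pointer plus a length; the characters live elsewhere.
    std.debug.print("slice: {d} bytes (pointer + length)\n", .{@sizeOf(@TypeOf(name))});

    // An array holds its bytes inline, so writing it to disk writes the data.
    std.debug.print("array: {d} bytes (the data itself)\n", .{@sizeOf(@TypeOf(fixed))});
}
```

Writing the slice to disk would store an address and a length, not the text `alice`.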

#### Converting Strings Into Fixed Buffers

We need helper functions.

Add this:

```zig
fn copyString(dest: []u8, src: []const u8) void {
    @memset(dest, 0);

    const len = @min(dest.len, src.len);
    @memcpy(dest[0..len], src[0..len]);
}
```

This function:

1. clears the destination
2. copies as many bytes as fit

Input that is longer than the destination is silently truncated, so a record can never overflow its buffers.
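The sketch below shows what `copyString` does with input that does not fit, next to a stricter alternative. `copyStringChecked` and its `StringTooLong` error are hypothetical names introduced here for illustration; the rest of the chapter keeps using `copyString`.

```zig
const std = @import("std");

fn copyString(dest: []u8, src: []const u8) void {
    @memset(dest, 0);
    const len = @min(dest.len, src.len);
    @memcpy(dest[0..len], src[0..len]);
}

// Hypothetical stricter variant: reject input that does not fit
// instead of cutting it off.
fn copyStringChecked(dest: []u8, src: []const u8) error{StringTooLong}!void {
    if (src.len > dest.len) return error.StringTooLong;
    copyString(dest, src);
}

pub fn main() void {
    var buf: [4]u8 = undefined;

    // "alice" has 5 bytes, the buffer has 4: the last byte is dropped.
    copyString(&buf, "alice");
    std.debug.print("{s}\n", .{buf[0..]}); // prints "alic"

    copyStringChecked(&buf, "alice") catch |err| {
        std.debug.print("rejected: {s}\n", .{@errorName(err)});
    };
}
```

Which behavior is right depends on the application. Truncation keeps the code simple, while a checked copy avoids storing a cut-off email address without anyone noticing.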

#### Create Records

Add this helper:

```zig
fn makeRecord(id: u32, name: []const u8, email: []const u8) Record {
    var record = Record{
        .id = id,
        .name = undefined,
        .email = undefined,
    };

    copyString(&record.name, name);
    copyString(&record.email, email);

    return record;
}
```

Example:

```zig
const user = makeRecord(
    1,
    "alice",
    "alice@example.com",
);
```

#### The Database Type

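Now define the database itself. The code from here on uses the standard library, so make sure `const std = @import("std");` is at the top of the file.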

```zig
const Database = struct {
    allocator: std.mem.Allocator,
    records: std.ArrayList(Record),
};
```

This database keeps all records in memory using an `ArrayList`.

Later, we will save and load them from disk.

#### Initialize and Destroy

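Add these as methods inside the `Database` struct:

```zig
fn init(allocator: std.mem.Allocator) Database {
    return .{
        .allocator = allocator,
        // As of Zig 0.15, `std.ArrayList` stores no allocator: it starts
        // empty and takes the allocator as an argument on each call.
        .records = .empty,
    };
}

fn deinit(self: *Database) void {
    self.records.deinit(self.allocator);
}
```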

#### Insert Records

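Add this method to `Database`:

```zig
fn insert(self: *Database, record: Record) !void {
    // The list may need to grow, so it needs the allocator and can fail.
    try self.records.append(self.allocator, record);
}
```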

Simple databases often start like this: append records into a collection.

#### Get Records by ID

Add:

```zig
fn get(self: *Database, id: u32) ?Record {
    for (self.records.items) |record| {
        if (record.id == id) {
            return record;
        }
    }

    return null;
}
```

This is a linear search.

For a tiny database, that is fine.

Larger databases use indexes like B-trees or hash tables.

#### Print Records

We need a helper that converts fixed arrays into printable strings.

```zig
fn trimNulls(bytes: []const u8) []const u8 {
    const end = std.mem.indexOfScalar(u8, bytes, 0) orelse bytes.len;
    return bytes[0..end];
}
```

Now add:

```zig
fn printRecord(record: Record) void {
    std.debug.print(
        "id={d} name={s} email={s}\n",
        .{
            record.id,
            trimNulls(&record.name),
            trimNulls(&record.email),
        },
    );
}
```
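The `trimNulls` helper has one edge case worth seeing: a field that is completely full contains no zero byte. The sketch below uses small hand-written buffers, which are illustrative only, to show both cases.

```zig
const std = @import("std");

fn trimNulls(bytes: []const u8) []const u8 {
    const end = std.mem.indexOfScalar(u8, bytes, 0) orelse bytes.len;
    return bytes[0..end];
}

pub fn main() void {
    // Padded field: "bob" followed by zero bytes.
    const padded = [_]u8{ 'b', 'o', 'b', 0, 0, 0, 0, 0 };
    std.debug.print("[{s}] len={d}\n", .{ trimNulls(&padded), trimNulls(&padded).len });

    // Full field: no zero byte, so the whole buffer is the string.
    const full = [_]u8{ 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h' };
    std.debug.print("[{s}] len={d}\n", .{ trimNulls(&full), trimNulls(&full).len });
}
```

The `orelse bytes.len` fallback is what makes the full-field case work, so the fields do not need to reserve a byte for a terminator.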

#### Try the Database

Put this in `main`:

```zig
pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    defer _ = gpa.deinit();

    const allocator = gpa.allocator();

    var db = Database.init(allocator);
    defer db.deinit();

    try db.insert(makeRecord(
        1,
        "alice",
        "alice@example.com",
    ));

    try db.insert(makeRecord(
        2,
        "bob",
        "bob@example.com",
    ));

    for (db.records.items) |record| {
        printRecord(record);
    }
}
```

Run:

```bash
zig build run
```

Output:

```text
id=1 name=alice email=alice@example.com
id=2 name=bob email=bob@example.com
```

Now we have an in-memory database.

#### Saving to Disk

A database becomes useful when data survives after the program exits.

We will save records directly as binary data.

Add this method:

```zig
fn save(self: *Database, path: []const u8) !void {
    const file = try std.fs.cwd().createFile(path, .{});
    defer file.close();

    for (self.records.items) |record| {
        try file.writeAll(std.mem.asBytes(&record));
    }
}
```

This line is important:

```zig
std.mem.asBytes(&record)
```

It views the record struct as raw bytes.

Because `Record` contains only fixed-size fields and no pointers, those bytes are everything needed to rebuild the record later.

Each record becomes exactly 100 bytes on disk.
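The round trip can be tried in memory before any file is involved. This is a minimal sketch, assuming `Record` is an `extern struct` so the field offsets are guaranteed; `toBytes` and `fromBytes` are hypothetical helper names.

```zig
const std = @import("std");

// Assumption: `extern struct` fixes the field order and offsets.
const Record = extern struct {
    id: u32,
    name: [32]u8,
    email: [64]u8,
};

// Serialize: copy the record's in-memory bytes into a plain byte array.
fn toBytes(record: *const Record) [@sizeOf(Record)]u8 {
    return std.mem.asBytes(record).*;
}

// Deserialize: reinterpret the same number of bytes as a Record.
fn fromBytes(bytes: *const [@sizeOf(Record)]u8) Record {
    return std.mem.bytesToValue(Record, bytes);
}

pub fn main() void {
    var record = Record{ .id = 1, .name = [_]u8{0} ** 32, .email = [_]u8{0} ** 64 };
    @memcpy(record.name[0..5], "alice");

    const bytes = toBytes(&record);
    const copy = fromBytes(&bytes);

    // The id occupies bytes 0..4, so the name begins at byte 4.
    std.debug.print("{d} bytes, name at byte 4: {s}\n", .{ bytes.len, bytes[4..9] });
    std.debug.print("round trip id={d}\n", .{copy.id});
}
```

Saving to a file is the same operation with the byte array written to disk instead of kept in a variable.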

#### Loading from Disk

Now add a load function:

```zig
fn load(self: *Database, path: []const u8) !void {
    const file = try std.fs.cwd().openFile(path, .{});
    defer file.close();

    while (true) {
        var record: Record = undefined;

        const bytes = std.mem.asBytes(&record);

        // readAll keeps reading until the buffer is full or the file ends,
        // so a short count can only mean end of file.
        const n = try file.readAll(bytes);

        if (n == 0) {
            break;
        }

        if (n != bytes.len) {
            return error.InvalidDatabaseFile;
        }

        try self.records.append(self.allocator, record);
    }
}
```

This repeatedly reads exactly one `Record` from the file.

If the file ends cleanly, reading returns 0.

If the file stops halfway through a record, the file is corrupted.
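The same corruption check can be expressed as arithmetic on the file size: a valid file is always a whole number of records long. The sketch below is illustrative; `recordCount` is a hypothetical helper and the 100-byte record size is taken from the layout described earlier.

```zig
const std = @import("std");

// Assumed on-disk size of one record (4 + 32 + 64 bytes).
const record_size: u64 = 100;

// Returns how many whole records a file of `file_size` bytes holds,
// or an error if the file ends partway through a record.
fn recordCount(file_size: u64) error{InvalidDatabaseFile}!u64 {
    if (file_size % record_size != 0) return error.InvalidDatabaseFile;
    return file_size / record_size;
}

pub fn main() void {
    const sizes = [_]u64{ 0, 200, 250 };
    for (sizes) |size| {
        if (recordCount(size)) |count| {
            std.debug.print("{d} bytes -> {d} records\n", .{ size, count });
        } else |err| {
            std.debug.print("{d} bytes -> {s}\n", .{ size, @errorName(err) });
        }
    }
}
```

A 250-byte file holds two complete records and half of a third, which is exactly the case `load` reports as `InvalidDatabaseFile`.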

#### Save and Reload

Update `main`:

```zig
pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    defer _ = gpa.deinit();

    const allocator = gpa.allocator();

    {
        var db = Database.init(allocator);
        defer db.deinit();

        try db.insert(makeRecord(
            1,
            "alice",
            "alice@example.com",
        ));

        try db.insert(makeRecord(
            2,
            "bob",
            "bob@example.com",
        ));

        try db.save("users.db");
    }

    {
        var db = Database.init(allocator);
        defer db.deinit();

        try db.load("users.db");

        for (db.records.items) |record| {
            printRecord(record);
        }
    }
}
```

Run:

```bash
zig build run
```

You should still see:

```text
id=1 name=alice email=alice@example.com
id=2 name=bob email=bob@example.com
```

But now the data was loaded from disk.

#### Inspect the Database File

The file is binary, not text.

Try:

```bash
hexdump -C users.db
```

On a little-endian machine, where the `id` value 1 is stored as `01 00 00 00`, you may see something like:

```text
00000000  01 00 00 00 61 6c 69 63 65 ...
```

The bytes contain:

```text
record id
name bytes
email bytes
```

This is the core idea of binary storage: data is stored in a fixed machine-readable format.

#### Add Commands

Now let us turn this into a small interactive database.

Add:

```zig
const Command = enum {
    insert,
    get,
    list,
    quit,
};
```

We can parse user input like:

```text
insert 1 alice alice@example.com
get 1
list
quit
```
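One way to connect the `Command` enum to typed input is `std.meta.stringToEnum`, which maps a string to the enum field with the same name. The sketch below is one possible approach, not the one the complete program uses; `parseCommand` is a hypothetical helper.

```zig
const std = @import("std");

const Command = enum {
    insert,
    get,
    list,
    quit,
};

// Hypothetical helper: look at the first word of a line and map it
// to a Command. Returns null for empty lines and unknown words.
fn parseCommand(line: []const u8) ?Command {
    var parts = std.mem.tokenizeScalar(u8, line, ' ');
    const word = parts.next() orelse return null;
    return std.meta.stringToEnum(Command, word);
}

pub fn main() void {
    const lines = [_][]const u8{ "insert 1 alice alice@example.com", "get 1", "drop", "" };
    for (lines) |line| {
        if (parseCommand(line)) |cmd| {
            std.debug.print("'{s}' -> {s}\n", .{ line, @tagName(cmd) });
        } else {
            std.debug.print("'{s}' -> unknown\n", .{line});
        }
    }
}
```

With an enum value in hand, the command loop can use a `switch`, and the compiler will report any command that is not handled.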

#### Reading User Input

Add this loop:

```zig
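var stdin_buffer: [1024]u8 = undefined;
var stdin_reader = std.fs.File.stdin().reader(&stdin_buffer);

const stdin = &stdin_reader.interface;

while (true) {
    std.debug.print("db> ", .{});

    // Returns the next line including its '\n'. The slice points into
    // `stdin_buffer`, so a line can be at most 1024 bytes long.
    const raw_line = stdin.takeDelimiterInclusive('\n') catch |err| switch (err) {
        error.EndOfStream => break, // input was closed
        else => return err,
    };

    // Strip the trailing newline (and '\r' on Windows).
    const line = std.mem.trim(u8, raw_line, "\r\n");

    if (std.mem.eql(u8, line, "quit")) {
        break;
    }

    std.debug.print("command: {s}\n", .{line});
}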
```

Run:

```bash
zig build run
```

Now you can type commands interactively.

#### Complete Database Program

Here is a compact complete version:

```zig
const std = @import("std");

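// `extern struct` guarantees the field order and layout that the
// on-disk format relies on; a plain `struct` does not.
const Record = extern struct {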
    id: u32,
    name: [32]u8,
    email: [64]u8,
};

const Database = struct {
    allocator: std.mem.Allocator,
    records: std.ArrayList(Record),

    fn init(allocator: std.mem.Allocator) Database {
        return .{
            .allocator = allocator,
            // As of Zig 0.15, `std.ArrayList` stores no allocator: it starts
            // empty and takes the allocator as an argument on each call.
            .records = .empty,
        };
    }

    fn deinit(self: *Database) void {
        self.records.deinit(self.allocator);
    }

    fn insert(self: *Database, record: Record) !void {
        try self.records.append(self.allocator, record);
    }

    fn get(self: *Database, id: u32) ?Record {
        for (self.records.items) |record| {
            if (record.id == id) {
                return record;
            }
        }

        return null;
    }

    fn save(self: *Database, path: []const u8) !void {
        const file = try std.fs.cwd().createFile(path, .{});
        defer file.close();

        for (self.records.items) |record| {
            try file.writeAll(std.mem.asBytes(&record));
        }
    }

    fn load(self: *Database, path: []const u8) !void {
        // A missing file just means the database is still empty.
        const file = std.fs.cwd().openFile(path, .{}) catch |err| switch (err) {
            error.FileNotFound => return,
            else => return err,
        };
        defer file.close();

        while (true) {
            var record: Record = undefined;

            const bytes = std.mem.asBytes(&record);

            // readAll keeps reading until the buffer is full or the file
            // ends, so a short count can only mean end of file.
            const n = try file.readAll(bytes);

            if (n == 0) {
                break;
            }

            if (n != bytes.len) {
                return error.InvalidDatabaseFile;
            }

            try self.records.append(self.allocator, record);
        }
    }
};

fn copyString(dest: []u8, src: []const u8) void {
    @memset(dest, 0);

    const len = @min(dest.len, src.len);
    @memcpy(dest[0..len], src[0..len]);
}

fn makeRecord(id: u32, name: []const u8, email: []const u8) Record {
    var record = Record{
        .id = id,
        .name = undefined,
        .email = undefined,
    };

    copyString(&record.name, name);
    copyString(&record.email, email);

    return record;
}

fn trimNulls(bytes: []const u8) []const u8 {
    const end = std.mem.indexOfScalar(u8, bytes, 0) orelse bytes.len;
    return bytes[0..end];
}

fn printRecord(record: Record) void {
    std.debug.print(
        "id={d} name={s} email={s}\n",
        .{
            record.id,
            trimNulls(&record.name),
            trimNulls(&record.email),
        },
    );
}

pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    defer _ = gpa.deinit();

    const allocator = gpa.allocator();

    var db = Database.init(allocator);
    defer db.deinit();

    try db.load("users.db");
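    // Save on every exit path, and report a failure instead of hiding it.
    defer db.save("users.db") catch |err| {
        std.debug.print("failed to save: {s}\n", .{@errorName(err)});
    };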

    var stdin_buffer: [1024]u8 = undefined;
    var stdin_reader = std.fs.File.stdin().reader(&stdin_buffer);

    const stdin = &stdin_reader.interface;

    while (true) {
        std.debug.print("db> ", .{});

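        // Returns the next line including its '\n'. The slice points into
        // `stdin_buffer`, so a line can be at most 1024 bytes long.
        const raw_line = stdin.takeDelimiterInclusive('\n') catch |err| switch (err) {
            error.EndOfStream => break, // input was closed
            else => return err,
        };

        // Strip the trailing newline (and '\r' on Windows).
        const line = std.mem.trim(u8, raw_line, "\r\n");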

        var parts = std.mem.splitScalar(u8, line, ' ');

        const command = parts.next() orelse continue;

        if (std.mem.eql(u8, command, "quit")) {
            break;
        } else if (std.mem.eql(u8, command, "list")) {
            for (db.records.items) |record| {
                printRecord(record);
            }
        } else if (std.mem.eql(u8, command, "get")) {
            const id_text = parts.next() orelse {
                std.debug.print("missing id\n", .{});
                continue;
            };

            const id = std.fmt.parseInt(u32, id_text, 10) catch {
                std.debug.print("invalid id\n", .{});
                continue;
            };

            if (db.get(id)) |record| {
                printRecord(record);
            } else {
                std.debug.print("not found\n", .{});
            }
        } else if (std.mem.eql(u8, command, "insert")) {
            const id_text = parts.next() orelse continue;
            const name = parts.next() orelse continue;
            const email = parts.next() orelse continue;

            const id = std.fmt.parseInt(u32, id_text, 10) catch {
                std.debug.print("invalid id\n", .{});
                continue;
            };

            try db.insert(makeRecord(id, name, email));

            std.debug.print("inserted\n", .{});
        } else {
            std.debug.print("unknown command\n", .{});
        }
    }
}
```

Example session:

```text
db> insert 1 alice alice@example.com
inserted

db> insert 2 bob bob@example.com
inserted

db> list
id=1 name=alice email=alice@example.com
id=2 name=bob email=bob@example.com

db> get 1
id=1 name=alice email=alice@example.com

db> quit
```

#### Why This Is Still a Real Database

Even though this project is small, it already contains core database ideas:

```text
structured records
binary serialization
persistent storage
queries
table scanning
fixed layouts
```

Larger databases add:

```text
indexes
transactions
caching
query planners
locking
concurrency
compression
journals
B-trees
```

But the foundation is still data stored in a structured format.

#### Limitations of This Design

This database rewrites the entire file on exit.

That is simple but inefficient.

The `get` operation scans every record linearly.

Deleted records are not supported.

Strings have fixed limits:

```text
name  -> 32 bytes
email -> 64 bytes
```

The file format depends on machine layout and endianness.

A portable database format would define exact byte ordering explicitly.
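As a rough sketch of what "defining the byte order explicitly" could look like, the code below writes the `id` as little-endian regardless of the host machine. The `encode` and `decodeId` helpers are hypothetical and are not used by the program in this chapter.

```zig
const std = @import("std");

const record_size = 4 + 32 + 64;

// Hypothetical portable encoder: the id is always written little-endian,
// followed by the name and email bytes at fixed offsets.
fn encode(id: u32, name: *const [32]u8, email: *const [64]u8) [record_size]u8 {
    var out: [record_size]u8 = undefined;
    std.mem.writeInt(u32, out[0..4], id, .little);
    @memcpy(out[4..36], name);
    @memcpy(out[36..100], email);
    return out;
}

// Reads the id back using the same explicit byte order.
fn decodeId(bytes: *const [record_size]u8) u32 {
    return std.mem.readInt(u32, bytes[0..4], .little);
}

pub fn main() void {
    const name = [_]u8{0} ** 32;
    const email = [_]u8{0} ** 64;

    const bytes = encode(258, &name, &email);

    // 258 is 0x00000102, so little-endian order is 02 01 00 00.
    std.debug.print("{x:0>2} {x:0>2} {x:0>2} {x:0>2}\n", .{ bytes[0], bytes[1], bytes[2], bytes[3] });
    std.debug.print("id={d}\n", .{decodeId(&bytes)});
}
```

A file written this way reads back the same on a big-endian machine, at the cost of encoding each field by hand instead of copying the struct's memory.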

These limitations are normal for a first database engine.

#### What You Learned

You built a small persistent database.

You designed a binary record format.

You serialized structs into bytes.

You loaded structs from disk.

You implemented insert, get, and list.

You built an interactive command loop.

You saw why fixed-size layouts simplify storage.

This is one of the most important systems programming ideas: structured memory layouts can become structured storage layouts.