A database stores data so it can be saved, searched, updated, and loaded again later.
In this project, we will build a very small database engine using a single file.
The database will support:
insert
get
list
save to disk
load from disk
We will store records like this:
id
name
email
Example:
1 alice alice@example.com
2 bob bob@example.com
This project teaches several important systems programming ideas:
binary file formats
serialization
deserialization
fixed-size records
file I/O
memory layout
The database will stay intentionally small. We will not build SQL, indexing, transactions, or concurrency yet. The goal is to understand the core structure first.
The Shape of a Record
We need a fixed-size record structure.
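// extern keeps the fields in declaration order with a C-compatible layout,
// so the byte layout described below is guaranteed.
const Record = extern struct {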
id: u32,
name: [32]u8,
email: [64]u8,
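};
This record has:
4 bytes id
32 bytes name
64 bytes email
Total:
100 bytes
There is no padding: u32 needs 4-byte alignment, and 4 + 32 + 64 = 100 is already a multiple of 4.
Fixed-size records simplify storage because every record occupies the same amount of space.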
Record 0 starts at byte 0.
Record 1 starts at byte 100.
Record 2 starts at byte 200.
And so on.
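The offset arithmetic can be checked by the compiler. This is a minimal sketch, assuming Record is declared as an `extern struct` (which fixes the field order); `recordOffset` is a hypothetical helper that computes where record `index` begins in the file.

```zig
const std = @import("std");

// Assumption: extern pins the field order and C-compatible layout.
const Record = extern struct {
    id: u32,
    name: [32]u8,
    email: [64]u8,
};

// Fail the build if the layout ever drifts from the documented 100 bytes.
comptime {
    std.debug.assert(@sizeOf(Record) == 100);
    std.debug.assert(@offsetOf(Record, "id") == 0);
    std.debug.assert(@offsetOf(Record, "name") == 4);
    std.debug.assert(@offsetOf(Record, "email") == 36);
}

// Hypothetical helper: byte position of record `index` in the file.
fn recordOffset(index: u64) u64 {
    return index * @sizeOf(Record);
}

pub fn main() void {
    for (0..3) |i| {
        std.debug.print("record {d} starts at byte {d}\n", .{ i, recordOffset(i) });
    }
}
```

Because the checks live in a `comptime` block, a layout change becomes a compile error instead of a silently incompatible file.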
Why Fixed-Size Fields
Strings in Zig are usually slices:
[]const u8
But slices are not ideal for binary storage: a slice is a pointer plus a length, and the characters themselves live somewhere else in memory.
Pointers only make sense inside one running process. If you write them to disk and reload later, the addresses are invalid.
So we use fixed arrays instead:
[32]u8
That stores the bytes directly inside the record.
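A quick size check makes the difference concrete. This small sketch only inspects type sizes; nothing in it is specific to the database.

```zig
const std = @import("std");

pub fn main() void {
    // A slice is a (pointer, length) pair: two machine words, none of the text itself.
    std.debug.print("[]const u8 is {d} bytes\n", .{@sizeOf([]const u8)});
    // A fixed array holds its 32 bytes inline, so they can be written to disk as-is.
    std.debug.print("[32]u8 is {d} bytes\n", .{@sizeOf([32]u8)});
}
```

On a 64-bit target the first line reports 16 bytes, regardless of how long the string it points to is.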
Converting Strings Into Fixed Buffers
We need helper functions.
Add this:
fn copyString(dest: []u8, src: []const u8) void {
@memset(dest, 0);
const len = @min(dest.len, src.len);
@memcpy(dest[0..len], src[0..len]);
}
This function:
- clears the destination
- copies as many bytes as fit
Now we can create records safely.
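The truncation rule is easy to see with a buffer that is too small. This sketch reuses copyString unchanged and copies a five-byte name into a hypothetical four-byte buffer.

```zig
const std = @import("std");

fn copyString(dest: []u8, src: []const u8) void {
    @memset(dest, 0);
    const len = @min(dest.len, src.len);
    @memcpy(dest[0..len], src[0..len]);
}

pub fn main() void {
    var name: [4]u8 = undefined;
    // Only the first 4 bytes fit, and no zero byte is left over.
    copyString(&name, "alice");
    std.debug.print("{s}\n", .{&name}); // prints "alic"
}
```

When the source is shorter than the buffer, the remaining bytes stay zero, which is what later lets the original string length be recovered.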
Create Records
Add this helper:
fn makeRecord(id: u32, name: []const u8, email: []const u8) Record {
var record = Record{
.id = id,
.name = undefined,
.email = undefined,
};
copyString(&record.name, name);
copyString(&record.email, email);
return record;
}
Example:
const user = makeRecord(
1,
"alice",
"alice@example.com",
);
The Database Type
Now define the database itself.
const Database = struct {
allocator: std.mem.Allocator,
records: std.ArrayList(Record),
};
This database keeps all records in memory using an ArrayList.
Later, we will save and load them from disk.
Initialize and Destroy
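Add these functions inside the Database struct, so they can be called as Database.init(allocator) and db.deinit():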
fn init(allocator: std.mem.Allocator) Database {
return .{
.allocator = allocator,
.records = std.ArrayList(Record).init(allocator),
};
}
fn deinit(self: *Database) void {
self.records.deinit();
}
Insert Records
Add this method to the Database struct:
fn insert(self: *Database, record: Record) !void {
try self.records.append(record);
}
Simple databases often start like this: append records into a collection.
Get Records by ID
Add this method to the Database struct as well:
fn get(self: *Database, id: u32) ?Record {
for (self.records.items) |record| {
if (record.id == id) {
return record;
}
}
return null;
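}
This is a linear search: it may check every record, so a lookup takes time proportional to the number of records.
For a tiny database, that is fine.
Larger databases use indexes such as B-trees or hash tables to find a record without scanning them all.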
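As a rough illustration of the hash-table option, an index can map each id to the record's position in `records.items`, so a lookup no longer scans the list. This is a sketch with a hypothetical `IdIndex` type built on `std.AutoHashMap`; it does not handle deletions, and `put` simply overwrites the position if an id is added twice.

```zig
const std = @import("std");

// Hypothetical index: id -> position of the record in records.items.
const IdIndex = struct {
    map: std.AutoHashMap(u32, usize),

    fn init(allocator: std.mem.Allocator) IdIndex {
        return .{ .map = std.AutoHashMap(u32, usize).init(allocator) };
    }

    fn deinit(self: *IdIndex) void {
        self.map.deinit();
    }

    // Call after appending a record at `position`.
    fn add(self: *IdIndex, id: u32, position: usize) !void {
        try self.map.put(id, position);
    }

    // Average constant time instead of a scan over every record.
    fn lookup(self: *const IdIndex, id: u32) ?usize {
        return self.map.get(id);
    }
};

pub fn main() !void {
    var index = IdIndex.init(std.heap.page_allocator);
    defer index.deinit();
    try index.add(1, 0); // alice is records.items[0]
    try index.add(2, 1); // bob is records.items[1]
    std.debug.print("id 2 is at position {?d}\n", .{index.lookup(2)});
}
```

The trade-off is extra memory and the need to keep the index in sync with every insert.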
Print Records
We need a helper that converts fixed arrays into printable strings.
fn trimNulls(bytes: []const u8) []const u8 {
const end = std.mem.indexOfScalar(u8, bytes, 0) orelse bytes.len;
return bytes[0..end];
}
Now add:
fn printRecord(record: Record) void {
std.debug.print(
"id={d} name={s} email={s}\n",
.{
record.id,
trimNulls(&record.name),
trimNulls(&record.email),
},
);
}
Try the Database
Put this in main:
pub fn main() !void {
var gpa = std.heap.GeneralPurposeAllocator(.{}){};
defer _ = gpa.deinit();
const allocator = gpa.allocator();
var db = Database.init(allocator);
defer db.deinit();
try db.insert(makeRecord(
1,
"alice",
"alice@example.com",
));
try db.insert(makeRecord(
2,
"bob",
"bob@example.com",
));
for (db.records.items) |record| {
printRecord(record);
}
}
Run:
zig build run
Output:
id=1 name=alice email=alice@example.com
id=2 name=bob email=bob@example.com
Now we have an in-memory database.
Saving to Disk
A database becomes useful when data survives after the program exits.
We will save records directly as binary data.
Add this method to the Database struct:
fn save(self: *Database, path: []const u8) !void {
const file = try std.fs.cwd().createFile(path, .{});
defer file.close();
for (self.records.items) |record| {
try file.writeAll(std.mem.asBytes(&record));
}
}
This line is important:
std.mem.asBytes(&record)
It views the record struct as raw bytes without copying it.
This works because Record contains only fixed-size fields and no pointers, so its bytes are the complete record.
Each record becomes exactly 100 bytes on disk.
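Serialization and deserialization are two views of the same 100 bytes. This in-memory sketch, assuming Record is an `extern struct`, copies one record's bytes into a second record and checks that nothing changed; no file is involved.

```zig
const std = @import("std");

const Record = extern struct {
    id: u32,
    name: [32]u8,
    email: [64]u8,
};

pub fn main() void {
    var original = Record{ .id = 7, .name = [_]u8{0} ** 32, .email = [_]u8{0} ** 64 };
    @memcpy(original.name[0..5], "alice");

    // "Serialize": view the struct as raw bytes.
    const bytes: *const [@sizeOf(Record)]u8 = std.mem.asBytes(&original);

    // "Deserialize": copy those bytes into a fresh record.
    var restored: Record = undefined;
    @memcpy(std.mem.asBytes(&restored), bytes);

    std.debug.print("restored id={d}, {d} bytes\n", .{ restored.id, bytes.len });
}
```

The save and load methods do the same thing, with a file sitting between the two copies.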
Loading from Disk
Now add a load method to the Database struct:
fn load(self: *Database, path: []const u8) !void {
const file = try std.fs.cwd().openFile(path, .{});
defer file.close();
while (true) {
var record: Record = undefined;
const bytes = std.mem.asBytes(&record);
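// readAll keeps reading until the buffer is full or the file ends,
// so a short count really means the end of the file.
const n = try file.readAll(bytes);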
if (n == 0) {
break;
}
if (n != bytes.len) {
return error.InvalidDatabaseFile;
}
try self.records.append(record);
}
}
Each pass through the loop reads exactly one Record from the file.
If the file ends cleanly, reading returns 0.
If the file stops halfway through a record, the file is corrupted.
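Another option is to check the whole file up front: a valid file's size must be an exact multiple of the record size. This is a sketch with a hypothetical `countRecords` helper; inside `load` it could be given `(try file.stat()).size` before the read loop starts.

```zig
const std = @import("std");

const record_size: u64 = 100; // @sizeOf(Record) in this tutorial

// Hypothetical helper: how many whole records a file of `file_size` bytes holds.
fn countRecords(file_size: u64) error{InvalidDatabaseFile}!u64 {
    // A leftover partial record means the file was truncated or corrupted.
    if (file_size % record_size != 0) return error.InvalidDatabaseFile;
    return file_size / record_size;
}

pub fn main() void {
    const sizes = [_]u64{ 0, 200, 250 };
    for (sizes) |size| {
        if (countRecords(size)) |count| {
            std.debug.print("{d} bytes -> {d} records\n", .{ size, count });
        } else |err| {
            std.debug.print("{d} bytes -> {s}\n", .{ size, @errorName(err) });
        }
    }
}
```

Knowing the count in advance would also allow reserving ArrayList capacity once instead of growing it record by record.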
Save and Reload
Update main:
pub fn main() !void {
var gpa = std.heap.GeneralPurposeAllocator(.{}){};
defer _ = gpa.deinit();
const allocator = gpa.allocator();
{
var db = Database.init(allocator);
defer db.deinit();
try db.insert(makeRecord(
1,
"alice",
"alice@example.com",
));
try db.insert(makeRecord(
2,
"bob",
"bob@example.com",
));
try db.save("users.db");
}
{
var db = Database.init(allocator);
defer db.deinit();
try db.load("users.db");
for (db.records.items) |record| {
printRecord(record);
}
}
}
Run:
zig build run
You should still see:
id=1 name=alice email=alice@example.com
id=2 name=bob email=bob@example.com
But now the data was loaded from disk.
Inspect the Database File
The file is binary, not text.
Try:
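hexdump -C users.db
You may see something like:
00000000 01 00 00 00 61 6c 69 63 65 ...
The bytes contain:
record id (01 00 00 00 is the value 1, stored least significant byte first on a little-endian machine)
name bytes (61 6c 69 63 65 is the ASCII text "alice", followed by zero padding)
email bytes
This is the core idea of binary storage: data is stored in a fixed, machine-readable format.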
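The same byte ranges can be read back in code. This sketch, assuming an `extern struct` Record, builds alice's record and decodes the id from bytes 0..4 in the machine's native byte order, then reads the name from bytes 4..36.

```zig
const std = @import("std");
const builtin = @import("builtin");

const Record = extern struct {
    id: u32,
    name: [32]u8,
    email: [64]u8,
};

pub fn main() void {
    var record = Record{ .id = 1, .name = [_]u8{0} ** 32, .email = [_]u8{0} ** 64 };
    @memcpy(record.name[0..5], "alice");
    const bytes = std.mem.asBytes(&record);

    // Bytes 0..4 hold the id in native byte order
    // (01 00 00 00 on a little-endian CPU, as in the hexdump).
    const id = std.mem.readInt(u32, bytes[0..4], builtin.cpu.arch.endian());
    // Bytes 4..36 hold the zero-padded name; bytes 36..100 hold the email.
    const name = bytes[4..36];
    std.debug.print("id={d} name starts with {s}\n", .{ id, name[0..5] });
}
```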
Add Commands
Now let us turn this into a small interactive database.
Add:
const Command = enum {
insert,
get,
list,
quit,
};
We can parse user input like:
insert 1 alice alice@example.com
get 1
list
quit
Reading User Input
Add this loop to main:
// Reader over standard input (Zig 0.14 std.io API).
const stdin = std.io.getStdIn().reader();
while (true) {
std.debug.print("db> ", .{});
var line_buffer: [1024]u8 = undefined;
// Returns null at end of input (for example Ctrl-D), which ends the loop.
const line = (try stdin.readUntilDelimiterOrEof(&line_buffer, '\n')) orelse break;
if (std.mem.eql(u8, line, "quit")) {
break;
}
std.debug.print("command: {s}\n", .{line});
}
Run:
zig build run
Now you can type commands interactively.
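The Command enum is not used by this loop yet. One way to connect them, sketched here, is `std.meta.stringToEnum`, which maps the first word of a line to an enum value or returns null for an unknown word. The sample lines are hard-coded for illustration instead of read from stdin.

```zig
const std = @import("std");

const Command = enum { insert, get, list, quit };

pub fn main() void {
    const lines = [_][]const u8{ "get 1", "list", "drop 1" };
    for (lines) |line| {
        var parts = std.mem.tokenizeScalar(u8, line, ' ');
        const word = parts.next() orelse continue;
        // stringToEnum returns null when the word is not a Command tag.
        const command = std.meta.stringToEnum(Command, word) orelse {
            std.debug.print("unknown command: {s}\n", .{word});
            continue;
        };
        // The switch must cover every tag, so a new command cannot be forgotten.
        switch (command) {
            .insert => std.debug.print("insert\n", .{}),
            .get => std.debug.print("get id {s}\n", .{parts.next() orelse "?"}),
            .list => std.debug.print("list\n", .{}),
            .quit => return,
        }
    }
}
```

Compared with a chain of `std.mem.eql` checks, the exhaustive switch lets the compiler point out any command that has no handler.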
Complete Database Program
Here is a compact complete version:
const std = @import("std");
// extern fixes the field order so the on-disk layout is predictable.
const Record = extern struct {
id: u32,
name: [32]u8,
email: [64]u8,
};
const Database = struct {
allocator: std.mem.Allocator,
records: std.ArrayList(Record),
fn init(allocator: std.mem.Allocator) Database {
return .{
.allocator = allocator,
.records = std.ArrayList(Record).init(allocator),
};
}
fn deinit(self: *Database) void {
self.records.deinit();
}
fn insert(self: *Database, record: Record) !void {
try self.records.append(record);
}
fn get(self: *Database, id: u32) ?Record {
for (self.records.items) |record| {
if (record.id == id) {
return record;
}
}
return null;
}
fn save(self: *Database, path: []const u8) !void {
const file = try std.fs.cwd().createFile(path, .{});
defer file.close();
for (self.records.items) |record| {
try file.writeAll(std.mem.asBytes(&record));
}
}
fn load(self: *Database, path: []const u8) !void {
const file = std.fs.cwd().openFile(path, .{}) catch |err| switch (err) {
// A missing file just means there is no data yet.
error.FileNotFound => return,
else => return err,
};
defer file.close();
while (true) {
var record: Record = undefined;
const bytes = std.mem.asBytes(&record);
// readAll keeps reading until the buffer is full or the file ends.
const n = try file.readAll(bytes);
if (n == 0) {
break;
}
if (n != bytes.len) {
return error.InvalidDatabaseFile;
}
try self.records.append(record);
}
}
};
fn copyString(dest: []u8, src: []const u8) void {
@memset(dest, 0);
const len = @min(dest.len, src.len);
@memcpy(dest[0..len], src[0..len]);
}
fn makeRecord(id: u32, name: []const u8, email: []const u8) Record {
var record = Record{
.id = id,
.name = undefined,
.email = undefined,
};
copyString(&record.name, name);
copyString(&record.email, email);
return record;
}
fn trimNulls(bytes: []const u8) []const u8 {
const end = std.mem.indexOfScalar(u8, bytes, 0) orelse bytes.len;
return bytes[0..end];
}
fn printRecord(record: Record) void {
std.debug.print(
"id={d} name={s} email={s}\n",
.{
record.id,
trimNulls(&record.name),
trimNulls(&record.email),
},
);
}
pub fn main() !void {
var gpa = std.heap.GeneralPurposeAllocator(.{}){};
defer _ = gpa.deinit();
const allocator = gpa.allocator();
var db = Database.init(allocator);
defer db.deinit();
try db.load("users.db");
// Defers run in reverse order, so this save happens before db.deinit().
defer db.save("users.db") catch |err| {
std.debug.print("save failed: {s}\n", .{@errorName(err)});
};
// Reader over standard input (Zig 0.14 std.io API).
const stdin = std.io.getStdIn().reader();
while (true) {
std.debug.print("db> ", .{});
var line_buffer: [1024]u8 = undefined;
const line = (try stdin.readUntilDelimiterOrEof(&line_buffer, '\n')) orelse break;
// tokenizeScalar skips repeated spaces, and an empty line yields no command.
var parts = std.mem.tokenizeScalar(u8, line, ' ');
const command = parts.next() orelse continue;
if (std.mem.eql(u8, command, "quit")) {
break;
} else if (std.mem.eql(u8, command, "list")) {
for (db.records.items) |record| {
printRecord(record);
}
} else if (std.mem.eql(u8, command, "get")) {
const id_text = parts.next() orelse {
std.debug.print("missing id\n", .{});
continue;
};
const id = std.fmt.parseInt(u32, id_text, 10) catch {
std.debug.print("invalid id\n", .{});
continue;
};
if (db.get(id)) |record| {
printRecord(record);
} else {
std.debug.print("not found\n", .{});
}
} else if (std.mem.eql(u8, command, "insert")) {
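const id_text = parts.next() orelse "";
const name = parts.next() orelse "";
const email = parts.next() orelse "";
// If any argument is missing, the last one is missing too.
if (email.len == 0) {
std.debug.print("usage: insert <id> <name> <email>\n", .{});
continue;
}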
const id = std.fmt.parseInt(u32, id_text, 10) catch {
std.debug.print("invalid id\n", .{});
continue;
};
try db.insert(makeRecord(id, name, email));
std.debug.print("inserted\n", .{});
} else {
std.debug.print("unknown command\n", .{});
}
}
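}
Example session, starting without a users.db file (the earlier Save and Reload step creates one, and its records would also be loaded and listed):
db> insert 1 alice alice@example.com
inserted
db> insert 2 bob bob@example.com
inserted
db> list
id=1 name=alice email=alice@example.com
id=2 name=bob email=bob@example.com
db> get 1
id=1 name=alice email=alice@example.com
db> quit
Why This Is Still a Real Database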
Even though this project is small, it already contains core database ideas:
structured records
binary serialization
persistent storage
queries
table scanning
fixed layouts
Larger databases add:
indexes
transactions
caching
query planners
locking
concurrency
compression
journals
B-trees
But the foundation is still data stored in a structured format.
Limitations of This Design
This database rewrites the entire file on exit.
That is simple but inefficient.
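One way to avoid rewriting everything, sketched here under assumptions, is to append each new record to the end of the file. The hypothetical `appendRecord` helper opens the file without truncating it, seeks to the end, and writes one record's bytes. Because the file is still a sequence of 100-byte records, `load` would not need to change. The file name `example.db` is only for illustration.

```zig
const std = @import("std");

const Record = extern struct {
    id: u32,
    name: [32]u8,
    email: [64]u8,
};

// Hypothetical helper: add one record at the end of the file, creating it if needed.
fn appendRecord(dir: std.fs.Dir, path: []const u8, record: Record) !void {
    // truncate = false keeps the existing records instead of wiping the file.
    const file = try dir.createFile(path, .{ .truncate = false });
    defer file.close();
    try file.seekFromEnd(0);
    try file.writeAll(std.mem.asBytes(&record));
}

pub fn main() !void {
    const record = Record{ .id = 3, .name = [_]u8{0} ** 32, .email = [_]u8{0} ** 64 };
    try appendRecord(std.fs.cwd(), "example.db", record);
}
```

Each insert then costs one 100-byte write instead of rewriting every record.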
The get operation scans every record linearly.
Deleted records are not supported.
Strings have fixed limits:
name -> 32 bytes
email -> 64 bytes
Longer values are silently truncated by copyString.
The file format depends on the machine's struct layout and endianness.
A portable database format would define exact byte ordering explicitly.
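A sketch of what an explicit byte order could look like: the hypothetical `encode` and `decode` functions write `id` as little-endian with `std.mem.writeInt` and copy the text fields at fixed offsets. The file bytes then depend neither on the host CPU nor on how the compiler lays out the struct.

```zig
const std = @import("std");

// A plain struct is fine here: encode/decode define the on-disk layout.
const Record = struct {
    id: u32,
    name: [32]u8,
    email: [64]u8,
};

const record_size = 100;

// Hypothetical on-disk layout: id (u32, little-endian) | name[32] | email[64].
fn encode(record: Record, out: *[record_size]u8) void {
    std.mem.writeInt(u32, out[0..4], record.id, .little);
    @memcpy(out[4..36], &record.name);
    @memcpy(out[36..100], &record.email);
}

fn decode(bytes: *const [record_size]u8) Record {
    return .{
        .id = std.mem.readInt(u32, bytes[0..4], .little),
        .name = bytes[4..36].*,
        .email = bytes[36..100].*,
    };
}

pub fn main() void {
    var record = Record{ .id = 1, .name = [_]u8{0} ** 32, .email = [_]u8{0} ** 64 };
    @memcpy(record.name[0..5], "alice");
    var buf: [record_size]u8 = undefined;
    encode(record, &buf); // buf[0..4] is now 01 00 00 00 on every CPU
    const back = decode(&buf);
    std.debug.print("decoded id={d} name={s}\n", .{ back.id, back.name[0..5] });
}
```

The cost is a copy per record instead of a zero-copy `asBytes` view, in exchange for a file that any machine reads the same way.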
These limitations are normal for a first database engine.
What You Learned
You built a small persistent database.
You designed a binary record format.
You serialized structs into bytes.
You loaded structs from disk.
You implemented insert, get, and list.
You built an interactive command loop.
You saw why fixed-size layouts simplify storage.
This is one of the most important systems programming ideas: structured memory layouts can become structured storage layouts.