Data-Oriented Design

Data-oriented design means you organize a program around the data it processes.

The main question is not “Which objects do I need?”

The main question is:

What data do I have, and how will the program move through it?

This style fits Zig well because Zig gives you direct control over memory layout, allocation, arrays, slices, structs, and pointers.

Start with the Data

Suppose you are writing a small game simulation.

Each entity in the simulation (a player, say) has a position, a velocity, a health value, and an active flag.

A simple struct might look like this:

const Entity = struct {
    x: f32,
    y: f32,
    vx: f32,
    vy: f32,
    health: i32,
    active: bool,
};

This is clear. One entity has all its data in one place.

You can store many entities in an array:

var entities = [_]Entity{
    .{ .x = 0, .y = 0, .vx = 1, .vy = 0, .health = 100, .active = true },
    .{ .x = 5, .y = 2, .vx = 0, .vy = 1, .health = 80, .active = true },
};

Then update them:

for (&entities) |*entity| {
    if (entity.active) {
        entity.x += entity.vx;
        entity.y += entity.vy;
    }
}

This code walks through the data and updates every active entity.
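The fragments above cannot be compiled on their own, because a for loop cannot sit at file scope. Here is a minimal runnable version. It assumes a helper named updateEntities (a hypothetical name, not from the original), and it marks the second entity inactive so you can see that the loop skips it.

```zig
const std = @import("std");

const Entity = struct {
    x: f32,
    y: f32,
    vx: f32,
    vy: f32,
    health: i32,
    active: bool,
};

// Hypothetical helper: moves every active entity by its velocity.
// Inactive entities are left untouched.
fn updateEntities(entities: []Entity) void {
    for (entities) |*entity| {
        if (entity.active) {
            entity.x += entity.vx;
            entity.y += entity.vy;
        }
    }
}

pub fn main() void {
    var entities = [_]Entity{
        .{ .x = 0, .y = 0, .vx = 1, .vy = 0, .health = 100, .active = true },
        // Inactive here (unlike the original) to show the skip.
        .{ .x = 5, .y = 2, .vx = 0, .vy = 1, .health = 80, .active = false },
    };

    // &entities is a pointer to the array and coerces to a slice.
    updateEntities(&entities);

    for (entities, 0..) |e, i| {
        std.debug.print("entity {d}: ({d}, {d})\n", .{ i, e.x, e.y });
    }
}
```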

Think About Access Patterns

Data-oriented design asks: which fields are used together?

In the update loop above, we use:

x
y
vx
vy
active

We do not use:

health

That matters.

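If you have millions of entities, reading unused fields wastes memory bandwidth. The CPU does not load single fields. It loads memory in fixed-size cache lines (commonly 64 bytes). If each entity contains fields the loop does not need, every cache line the loop pulls in carries some of that unused data, so fewer useful entities fit in cache at once.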

For small programs, this does not matter much. For performance-sensitive code, it can matter a lot.
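You can ask the compiler how much of each Entity the movement loop actually uses. This small sketch compares the bytes of the fields the loop reads (x, y, vx, vy, active) with @sizeOf(Entity). Zig does not guarantee field order or padding for a plain struct, so the sketch only prints the numbers and does not promise an exact size.

```zig
const std = @import("std");

const Entity = struct {
    x: f32,
    y: f32,
    vx: f32,
    vy: f32,
    health: i32,
    active: bool,
};

pub fn main() void {
    // Bytes the movement loop reads per entity: x, y, vx, vy, and active.
    const used = 4 * @sizeOf(f32) + @sizeOf(bool);
    // Bytes each entity occupies in the array, including health and any padding.
    const total = @sizeOf(Entity);
    std.debug.print("loop reads {d} of {d} bytes per entity\n", .{ used, total });
}
```

Whatever the exact total, health (at least 4 bytes) travels through the cache with every entity the loop visits.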

Array of Structs

The first design is called an array of structs.

const Entity = struct {
    x: f32,
    y: f32,
    vx: f32,
    vy: f32,
    health: i32,
    active: bool,
};

var entities: [1000]Entity = undefined;

The memory is shaped like this:

Entity 0: x y vx vy health active
Entity 1: x y vx vy health active
Entity 2: x y vx vy health active
...

This is easy to understand. Each entity is one complete value.

Array of structs is a good default when the data set is small or when you usually need most fields together.

Struct of Arrays

A different design is called a struct of arrays.

// All slices have the same length: index i describes entity i.
const Entities = struct {
    x: []f32,
    y: []f32,
    vx: []f32,
    vy: []f32,
    health: []i32,
    active: []bool,
};

The memory is shaped like this:

x:      x0 x1 x2 x3 ...
y:      y0 y1 y2 y3 ...
vx:     vx0 vx1 vx2 vx3 ...
vy:     vy0 vy1 vy2 vy3 ...
health: h0 h1 h2 h3 ...
active: a0 a1 a2 a3 ...

Now an update loop can touch only the arrays it needs:

fn update(entities: Entities) void {
    for (
        entities.x,
        entities.y,
        entities.vx,
        entities.vy,
        entities.active,
    ) |*x, *y, vx, vy, active| {
        if (active) {
            x.* += vx;
            y.* += vy;
        }
    }
}

The loop does not read health.

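This layout can be faster for large batches of similar data. Each cache line the loop loads holds only values it actually uses, and long runs of same-typed values are often easier for the compiler to vectorize with SIMD instructions.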
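You do not have to manage the parallel slices by hand. Zig's standard library provides std.MultiArrayList, which stores each field of a struct in its own array and hands out per-field slices through items(.field). The sketch below reuses the Entity struct and the update loop from above on top of it. It was written with recent Zig releases in mind. Standard library APIs change between versions, so treat it as a sketch rather than a version-proof recipe.

```zig
const std = @import("std");

const Entity = struct {
    x: f32,
    y: f32,
    vx: f32,
    vy: f32,
    health: i32,
    active: bool,
};

// Same loop as before, but the per-field slices come from MultiArrayList.
// health is never requested, so its array is never read.
fn update(list: *const std.MultiArrayList(Entity)) void {
    for (
        list.items(.x),
        list.items(.y),
        list.items(.vx),
        list.items(.vy),
        list.items(.active),
    ) |*x, *y, vx, vy, active| {
        if (active) {
            x.* += vx;
            y.* += vy;
        }
    }
}

pub fn main() !void {
    const allocator = std.heap.page_allocator;

    var list: std.MultiArrayList(Entity) = .{};
    defer list.deinit(allocator);

    try list.append(allocator, .{ .x = 0, .y = 0, .vx = 1, .vy = 0, .health = 100, .active = true });
    try list.append(allocator, .{ .x = 5, .y = 2, .vx = 0, .vy = 1, .health = 80, .active = true });

    update(&list);

    std.debug.print("entity 1 position = ({d}, {d})\n", .{
        list.items(.x)[1],
        list.items(.y)[1],
    });
}
```

You append whole Entity values, just as you would with an array of structs. The container splits them into separate field arrays for you.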

The Tradeoff

Array of structs is simpler.

Struct of arrays can be faster for batch processing.

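| Layout           | Memory shape                              | Good for                                            |
|------------------|-------------------------------------------|-----------------------------------------------------|
| Array of structs | Each item stores all its fields together | Simple modeling, small data, code clarity           |
| Struct of arrays | Each field has its own array              | Large batches, hot loops, cache-friendly processing |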

Do not start every program with a complicated layout. Start with clear data. Change the layout when measurement or scale shows that it matters.

Data-Oriented Design Is Not Just Speed

Performance is one reason to use data-oriented design, but the deeper idea is clarity.

You ask precise questions:

What data exists?

Which code reads it?

Which code writes it?

Which fields are used together?

Which fields are cold and rarely touched?

Which data must be contiguous?

Which data needs stable addresses?

These questions often produce simpler code.

Hot Data and Cold Data

Hot data is used often.

Cold data is used rarely.

For an entity, position and velocity may be hot:

const Motion = struct {
    x: f32,
    y: f32,
    vx: f32,
    vy: f32,
};

Name and description may be cold:

const Metadata = struct {
    name: []const u8,
    description: []const u8,
};

Instead of storing everything together, you can split the data:

const EntityStore = struct {
    motion: []Motion,
    health: []i32,
    metadata: []Metadata,
};

Now update code can process motion without touching metadata.

fn updateMotion(motion: []Motion) void {
    for (motion) |*m| {
        m.x += m.vx;
        m.y += m.vy;
    }
}

This is data-oriented thinking: put frequently used data where the CPU can walk through it efficiently.
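To see why the split pays off, compare record sizes. On a 64-bit target a slice is a pointer plus a length (16 bytes), so Metadata alone is 32 bytes, twice the size of Motion. The FatEntity struct below is a hypothetical "everything together" layout used only for comparison.

```zig
const std = @import("std");

const Motion = struct { x: f32, y: f32, vx: f32, vy: f32 };
const Metadata = struct { name: []const u8, description: []const u8 };

// Hypothetical combined layout, for comparison only.
const FatEntity = struct {
    motion: Motion,
    health: i32,
    metadata: Metadata,
};

pub fn main() void {
    // updateMotion walks Motion records. With the combined layout, the same
    // loop would step over FatEntity records, pulling cold fields into cache.
    std.debug.print("Motion: {d} bytes, FatEntity: {d} bytes\n", .{
        @sizeOf(Motion),
        @sizeOf(FatEntity),
    });
}
```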

Use Slices to Express Batches

Zig slices are useful for data-oriented code.

A slice says: here is a contiguous range of values.

fn updateMotion(motion: []Motion) void {
    for (motion) |*m| {
        m.x += m.vx;
        m.y += m.vy;
    }
}

This function does not care where the data came from. It only needs a slice of Motion.

If store is an EntityStore, you can pass its whole motion array:

updateMotion(store.motion);

Or only part of it, here the first 100 entities (the store must hold at least 100):

updateMotion(store.motion[0..100]);

Slices make batch processing natural.
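Because updateMotion accepts any slice, you can split one large array into fixed-size batches without copying. This sketch assumes a helper named updateInBatches (a hypothetical name) that hands consecutive sub-slices to updateMotion. The last batch may be shorter than the others.

```zig
const std = @import("std");

const Motion = struct { x: f32, y: f32, vx: f32, vy: f32 };

fn updateMotion(motion: []Motion) void {
    for (motion) |*m| {
        m.x += m.vx;
        m.y += m.vy;
    }
}

// Hypothetical helper: process `motion` in sub-slices of at most
// `batch_size` elements. Each element lands in exactly one batch.
fn updateInBatches(motion: []Motion, batch_size: usize) void {
    std.debug.assert(batch_size > 0);
    var start: usize = 0;
    while (start < motion.len) : (start += batch_size) {
        const end = @min(start + batch_size, motion.len);
        updateMotion(motion[start..end]);
    }
}

pub fn main() void {
    var motion = [_]Motion{.{ .x = 0, .y = 0, .vx = 1, .vy = 2 }} ** 10;
    updateInBatches(&motion, 4); // batches of 4, 4, and 2 entities
    std.debug.print("last entity = ({d}, {d})\n", .{ motion[9].x, motion[9].y });
}
```

Sub-slicing like this is one way to bound the work done per step or to hand separate chunks to separate workers. The slices share the original memory, so no data moves.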

Avoid Hidden Allocation

Data-oriented code usually makes allocation explicit.

Instead of allocating inside every small operation, allocate storage once and reuse it.

const std = @import("std");

const EntityStore = struct {
    motion: []Motion,
    health: []i32,

    pub fn init(allocator: std.mem.Allocator, count: usize) !EntityStore {
        const motion = try allocator.alloc(Motion, count);
        // If the second allocation fails, release the first one.
        errdefer allocator.free(motion);
        const health = try allocator.alloc(i32, count);
        return .{ .motion = motion, .health = health };
    }

    pub fn deinit(self: EntityStore, allocator: std.mem.Allocator) void {
        allocator.free(self.motion);
        allocator.free(self.health);
    }
};

The caller controls the allocator.

The store owns two arrays.

The deinit function releases them.

This is common Zig style: ownership and allocation are visible.
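Because the caller picks the allocator, the same EntityStore can live on the heap, in an arena, or in a fixed stack buffer. This sketch backs it with std.heap.FixedBufferAllocator, so no heap allocation happens at all. The 1024-byte buffer size is an arbitrary assumption. The errdefer in init ensures a failed second allocation does not leak the first.

```zig
const std = @import("std");

const Motion = struct { x: f32, y: f32, vx: f32, vy: f32 };

const EntityStore = struct {
    motion: []Motion,
    health: []i32,

    pub fn init(allocator: std.mem.Allocator, count: usize) !EntityStore {
        const motion = try allocator.alloc(Motion, count);
        errdefer allocator.free(motion); // undo the first allocation on failure
        const health = try allocator.alloc(i32, count);
        return .{ .motion = motion, .health = health };
    }

    pub fn deinit(self: EntityStore, allocator: std.mem.Allocator) void {
        allocator.free(self.motion);
        allocator.free(self.health);
    }
};

pub fn main() !void {
    // All storage comes from this stack buffer (size chosen arbitrarily).
    var buffer: [1024]u8 = undefined;
    var fba = std.heap.FixedBufferAllocator.init(&buffer);
    const allocator = fba.allocator();

    const store = try EntityStore.init(allocator, 16);
    defer store.deinit(allocator);

    std.debug.print("{d} entities, {d} bytes of the buffer used\n", .{
        store.motion.len,
        fba.end_index,
    });
}
```

If the buffer is too small, init returns error.OutOfMemory instead of silently reaching for the heap.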

A Complete Example

const std = @import("std");

const Motion = struct {
    x: f32,
    y: f32,
    vx: f32,
    vy: f32,
};

const EntityStore = struct {
    motion: []Motion,
    health: []i32,

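    pub fn init(allocator: std.mem.Allocator, count: usize) !EntityStore {
        const motion = try allocator.alloc(Motion, count);
        // If the second allocation fails, release the first one.
        errdefer allocator.free(motion);
        const health = try allocator.alloc(i32, count);
        return .{ .motion = motion, .health = health };
    }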

    pub fn deinit(self: EntityStore, allocator: std.mem.Allocator) void {
        allocator.free(self.motion);
        allocator.free(self.health);
    }

    pub fn updateMotion(self: EntityStore) void {
        for (self.motion) |*m| {
            m.x += m.vx;
            m.y += m.vy;
        }
    }
};

pub fn main() !void {
    const allocator = std.heap.page_allocator;

    // const: the store value never changes; writes go through its slices.
    const store = try EntityStore.init(allocator, 2);
    defer store.deinit(allocator);

    store.motion[0] = .{ .x = 0, .y = 0, .vx = 1, .vy = 0 };
    store.motion[1] = .{ .x = 5, .y = 2, .vx = 0, .vy = 1 };

    store.health[0] = 100;
    store.health[1] = 80;

    store.updateMotion();

    std.debug.print("entity 0 position = ({d}, {d})\n", .{
        store.motion[0].x,
        store.motion[0].y,
    });
}

Output:

entity 0 position = (1, 0)

This example separates motion from health. The update loop only touches motion data.

Data-Oriented APIs

A data-oriented function should usually accept the data it needs, not a giant object containing everything.

Less precise:

fn updateWorld(world: *World) void {
    // uses only world.motion
}

More precise:

fn updateMotion(motion: []Motion) void {
    for (motion) |*m| {
        m.x += m.vx;
        m.y += m.vy;
    }
}

The second version is easier to test. It also shows exactly what data the function uses.

This is a good habit in Zig: make data dependencies visible.
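Here is what "easier to test" looks like in practice. A Zig test block can call updateMotion on a small local array, with no World, allocator, or other setup. Run it with zig test.

```zig
const std = @import("std");

const Motion = struct { x: f32, y: f32, vx: f32, vy: f32 };

fn updateMotion(motion: []Motion) void {
    for (motion) |*m| {
        m.x += m.vx;
        m.y += m.vy;
    }
}

test "updateMotion adds velocity to position" {
    // A stack array is all the setup this function needs.
    var motion = [_]Motion{
        .{ .x = 0, .y = 0, .vx = 1, .vy = 0 },
        .{ .x = 5, .y = 2, .vx = 0, .vy = 1 },
    };
    updateMotion(&motion);
    try std.testing.expectEqual(@as(f32, 1), motion[0].x);
    try std.testing.expectEqual(@as(f32, 3), motion[1].y);
}
```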

Data-Oriented Design and Structs

Structs still matter in data-oriented design.

You use structs to group fields that are commonly processed together.

const Motion = struct {
    x: f32,
    y: f32,
    vx: f32,
    vy: f32,
};

These fields belong together because motion update uses all of them.

But be careful about grouping unrelated fields:

const Entity = struct {
    motion: Motion,
    health: i32,
    name: []const u8,
    debug_label: []const u8,
};

This may be fine for simple code. But if motion is updated millions of times per second and names are rarely used, splitting them can help.

The question is always: which data is used together?

Data-Oriented Design and Cache

Modern CPUs are fast, but memory access is relatively slow.

When a CPU reads memory, it brings a whole cache line (commonly 64 bytes) into cache, not just the one field you asked for.

If your loop walks through contiguous useful data, the CPU works well.

If your loop jumps around memory or pulls in many unused fields, it may waste time.

This is why arrays and slices matter:

for (items) |item| {
    // contiguous walk through memory
    _ = item; // process item here
}

Zig makes this style clear because slices directly represent contiguous memory.

Do Not Overcomplicate Early

Data-oriented design can become too complicated if you apply it blindly.

This is usually fine:

const User = struct {
    id: u64,
    name: []const u8,
    email: []const u8,
};

You do not need a separate array for every field in a small application.

Use the simple design first when it is clear and fast enough.

Move toward data-oriented layouts when you have:

large arrays
hot loops
repeated batch processing
memory bandwidth problems
cache misses
profiling results that point at memory access

The goal is better structure, not cleverness.

The Main Idea

Data-oriented design means arranging code around the data and the way it is processed.

In Zig, this usually means:

use structs to group data that belongs together
use arrays and slices for batches
keep allocation explicit
split hot data from cold data when needed
write functions that take the data they actually use
measure before making the layout complicated

A normal struct is often the right start:

const Entity = struct {
    x: f32,
    y: f32,
    vx: f32,
    vy: f32,
    health: i32,
};

When performance or clarity demands it, reshape the data:

const EntityStore = struct {
    motion: []Motion,
    health: []i32,
};
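Reshaping does not have to happen all at once. One option is a conversion step that copies an existing array of Entity values into the split layout. The fromEntities function below is a hypothetical helper, not an established API. It uses a multi-object for loop to fill both arrays in one pass.

```zig
const std = @import("std");

const Entity = struct { x: f32, y: f32, vx: f32, vy: f32, health: i32 };
const Motion = struct { x: f32, y: f32, vx: f32, vy: f32 };

const EntityStore = struct {
    motion: []Motion,
    health: []i32,

    // Hypothetical helper: copy array-of-structs data into split arrays.
    pub fn fromEntities(allocator: std.mem.Allocator, entities: []const Entity) !EntityStore {
        const motion = try allocator.alloc(Motion, entities.len);
        errdefer allocator.free(motion);
        const health = try allocator.alloc(i32, entities.len);
        // All three sequences have the same length, so they can be walked together.
        for (entities, motion, health) |e, *m, *h| {
            m.* = .{ .x = e.x, .y = e.y, .vx = e.vx, .vy = e.vy };
            h.* = e.health;
        }
        return .{ .motion = motion, .health = health };
    }

    pub fn deinit(self: EntityStore, allocator: std.mem.Allocator) void {
        allocator.free(self.motion);
        allocator.free(self.health);
    }
};

pub fn main() !void {
    const allocator = std.heap.page_allocator;
    const entities = [_]Entity{
        .{ .x = 0, .y = 0, .vx = 1, .vy = 0, .health = 100 },
        .{ .x = 5, .y = 2, .vx = 0, .vy = 1, .health = 80 },
    };
    const store = try EntityStore.fromEntities(allocator, &entities);
    defer store.deinit(allocator);
    std.debug.print("{d} motion records, first health = {d}\n", .{ store.motion.len, store.health[0] });
}
```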

Good Zig code often starts with this question: what shape should the data have so the program can do its work clearly and efficiently?