# Data-Oriented Design

Data-oriented design means you organize a program around the data it processes.

The main question is not “Which objects do I need?”

The main question is:

```text
What data do I have, and how will the program move through it?
```

This style fits Zig well because Zig gives you direct control over memory layout, allocation, arrays, slices, structs, and pointers.

#### Start with the Data

Suppose you are writing a small game simulation.

An entity has a position, a velocity, health, and an active flag.

A simple struct might look like this:

```zig
const Entity = struct {
    x: f32,
    y: f32,
    vx: f32,
    vy: f32,
    health: i32,
    active: bool,
};
```

This is clear. One entity has all its data in one place.

You can store many entities in an array:

```zig
var entities = [_]Entity{
    .{ .x = 0, .y = 0, .vx = 1, .vy = 0, .health = 100, .active = true },
    .{ .x = 5, .y = 2, .vx = 0, .vy = 1, .health = 80, .active = true },
};
```

Then update them:

```zig
for (&entities) |*entity| {
    if (entity.active) {
        entity.x += entity.vx;
        entity.y += entity.vy;
    }
}
```

This code walks through the data and updates every active entity.

#### Think About Access Patterns

Data-oriented design asks: which fields are used together?

In the update loop above, we use:

```text
x
y
vx
vy
active
```

We do not use:

```text
health
```

That matters.

If you have millions of entities, loading fields the loop never uses wastes memory bandwidth. The CPU moves memory into cache one cache line at a time, so each entity's unused fields ride along with the fields the loop needs and leave less room for useful data.

For small programs, this does not matter much. For performance-sensitive code, it can matter a lot.
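To make the cost concrete, here is a minimal sketch that compares the size of one `Entity` with the bytes the update loop reads. The constant `bytes_used` is a name introduced for this illustration, and the printed size depends on the target and on the padding the compiler adds.

```zig
const std = @import("std");

const Entity = struct {
    x: f32,
    y: f32,
    vx: f32,
    vy: f32,
    health: i32,
    active: bool,
};

// Bytes the motion update reads per entity:
// x, y, vx, vy (four f32 values) plus the active flag.
const bytes_used = 4 * @sizeOf(f32) + @sizeOf(bool);

pub fn main() void {
    // @sizeOf includes any padding added for alignment.
    std.debug.print("entity size = {d} bytes, update reads {d} bytes\n", .{
        @sizeOf(Entity),
        bytes_used,
    });
}
```

With only one unused field the gap is small. It grows as more rarely used fields are added to the struct.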

#### Array of Structs

The first design is called an array of structs.

```zig
const Entity = struct {
    x: f32,
    y: f32,
    vx: f32,
    vy: f32,
    health: i32,
    active: bool,
};

var entities: [1000]Entity = undefined;
```

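Conceptually, the memory is shaped like this (Zig may reorder the fields inside a regular struct, but each entity's fields still sit together):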

```text
Entity 0: x y vx vy health active
Entity 1: x y vx vy health active
Entity 2: x y vx vy health active
...
```

This is easy to understand. Each entity is one complete value.

Array of structs is a good default when the data set is small or when you usually need most fields together.

#### Struct of Arrays

A different design is called a struct of arrays.

```zig
const Entities = struct {
    x: []f32,
    y: []f32,
    vx: []f32,
    vy: []f32,
    health: []i32,
    active: []bool,
};
```

The memory is shaped like this:

```text
x:      x0 x1 x2 x3 ...
y:      y0 y1 y2 y3 ...
vx:     vx0 vx1 vx2 vx3 ...
vy:     vy0 vy1 vy2 vy3 ...
health: h0 h1 h2 h3 ...
active: a0 a1 a2 a3 ...
```

Now an update loop can touch only the arrays it needs:

```zig
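/// Moves every active entity by its velocity.
/// All slices must have the same length; a multi-object `for`
/// checks this in safe build modes.
fn update(entities: Entities) void {
    for (
        entities.x,
        entities.y,
        entities.vx,
        entities.vy,
        entities.active,
    ) |*x, *y, vx, vy, active| {
        if (active) {
            // x and y are captured by pointer so they can be modified.
            x.* += vx;
            y.* += vy;
        }
    }
}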
```

The loop does not read `health`.

This layout can be faster for large batches of similar data.
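As a usage sketch, the slices can point at ordinary arrays. The backing arrays and their values below are made up for illustration; a real program would more likely allocate them.

```zig
const std = @import("std");

const Entities = struct {
    x: []f32,
    y: []f32,
    vx: []f32,
    vy: []f32,
    health: []i32,
    active: []bool,
};

fn update(entities: Entities) void {
    for (
        entities.x,
        entities.y,
        entities.vx,
        entities.vy,
        entities.active,
    ) |*x, *y, vx, vy, active| {
        if (active) {
            x.* += vx;
            y.* += vy;
        }
    }
}

pub fn main() void {
    // One array per field; index i in every array belongs to entity i.
    var x = [_]f32{ 0, 5 };
    var y = [_]f32{ 0, 2 };
    var vx = [_]f32{ 1, 0 };
    var vy = [_]f32{ 0, 1 };
    var health = [_]i32{ 100, 80 };
    var active = [_]bool{ true, false };

    // Each &array coerces from a pointer-to-array to a slice.
    const entities = Entities{
        .x = &x,
        .y = &y,
        .vx = &vx,
        .vy = &vy,
        .health = &health,
        .active = &active,
    };

    update(entities);

    // Entity 1 is inactive, so only entity 0 moves.
    std.debug.print("entity 0 = ({d}, {d}), entity 1 = ({d}, {d})\n", .{
        x[0], y[0], x[1], y[1],
    });
}
```

The cost of this layout is visible here too: every array must be kept the same length, and one entity is now spread across six places.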

#### The Tradeoff

Array of structs is simpler.

Struct of arrays can be faster for batch processing.

| Layout | Shape | Good for |
|---|---|---|
| Array of structs | Each item stores all fields together | Simple modeling, small data, code clarity |
| Struct of arrays | Each field has its own array | Large batches, hot loops, cache-friendly processing |

Do not start every program with a complicated layout. Start with clear data. Change the layout when measurement or scale shows that it matters.
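If measurement does point toward a struct of arrays, you do not have to maintain the parallel slices by hand. The standard library's `std.MultiArrayList` stores each field of a struct in its own column. The sketch below assumes a Zig version where the list can be initialized with `.{}`; the entity values are illustrative.

```zig
const std = @import("std");

const Entity = struct {
    x: f32,
    y: f32,
    vx: f32,
    vy: f32,
    health: i32,
    active: bool,
};

pub fn main() !void {
    const allocator = std.heap.page_allocator;

    // Written like an array of structs, stored like a struct of arrays.
    var list: std.MultiArrayList(Entity) = .{};
    defer list.deinit(allocator);

    try list.append(allocator, .{ .x = 0, .y = 0, .vx = 1, .vy = 0, .health = 100, .active = true });
    try list.append(allocator, .{ .x = 5, .y = 2, .vx = 0, .vy = 1, .health = 80, .active = true });

    // items(.field) returns a slice over just that field's column.
    for (list.items(.x), list.items(.vx)) |*x, vx| {
        x.* += vx;
    }

    std.debug.print("x0 = {d}, x1 = {d}\n", .{ list.items(.x)[0], list.items(.x)[1] });
}
```

The loop touches only the `x` and `vx` columns, while `append` still takes one whole `Entity` at a time.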

#### Data-Oriented Design Is Not Just Speed

Performance is one reason to use data-oriented design, but the deeper idea is clarity.

You ask precise questions:

What data exists?

Which code reads it?

Which code writes it?

Which fields are used together?

Which fields are cold and rarely touched?

Which data must be contiguous?

Which data needs stable addresses?

These questions often produce simpler code.

#### Hot Data and Cold Data

Hot data is used often.

Cold data is used rarely.

For an entity, position and velocity may be hot:

```zig
const Motion = struct {
    x: f32,
    y: f32,
    vx: f32,
    vy: f32,
};
```

Name and description may be cold:

```zig
const Metadata = struct {
    name: []const u8,
    description: []const u8,
};
```

Instead of storing everything together, you can split the data:

```zig
const EntityStore = struct {
    motion: []Motion,
    health: []i32,
    metadata: []Metadata,
};
```

Now update code can process motion without touching metadata.

```zig
fn updateMotion(motion: []Motion) void {
    for (motion) |*m| {
        m.x += m.vx;
        m.y += m.vy;
    }
}
```

This is data-oriented thinking: put frequently used data where the CPU can walk through it efficiently.
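Once the data is split, the index is what ties one entity's parts together. The sketch below assumes that position `i` in every slice belongs to the same entity. The helper `printEntity` and the sample values are hypothetical, added only for illustration.

```zig
const std = @import("std");

const Motion = struct {
    x: f32,
    y: f32,
    vx: f32,
    vy: f32,
};

const Metadata = struct {
    name: []const u8,
    description: []const u8,
};

const EntityStore = struct {
    motion: []Motion,
    health: []i32,
    metadata: []Metadata,
};

// Hypothetical helper: gathers one entity's parts by index.
// This is the rare, cold path that is allowed to touch metadata.
fn printEntity(store: EntityStore, index: usize) void {
    const m = store.motion[index];
    const meta = store.metadata[index];
    std.debug.print("{s}: pos=({d}, {d}) health={d}\n", .{
        meta.name, m.x, m.y, store.health[index],
    });
}

pub fn main() void {
    var motion = [_]Motion{
        .{ .x = 0, .y = 0, .vx = 1, .vy = 0 },
        .{ .x = 5, .y = 2, .vx = 0, .vy = 1 },
    };
    var health = [_]i32{ 100, 80 };
    var metadata = [_]Metadata{
        .{ .name = "scout", .description = "fast and fragile" },
        .{ .name = "guard", .description = "slow and sturdy" },
    };

    const store = EntityStore{
        .motion = &motion,
        .health = &health,
        .metadata = &metadata,
    };

    printEntity(store, 1);
}
```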

#### Use Slices to Express Batches

Zig slices are useful for data-oriented code.

A slice says: here is a contiguous range of values.

```zig
fn updateMotion(motion: []Motion) void {
    for (motion) |*m| {
        m.x += m.vx;
        m.y += m.vy;
    }
}
```

This function does not care where the data came from. It only needs a slice of `Motion`.

You can pass the whole slice:

```zig
updateMotion(entities.motion);
```

Or a subsection:

```zig
updateMotion(entities.motion[0..100]);
```

Slices make batch processing natural.
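Here is a small, self-contained sketch of both calls, using a stack array as the backing storage. The three sample entities are illustrative.

```zig
const std = @import("std");

const Motion = struct {
    x: f32,
    y: f32,
    vx: f32,
    vy: f32,
};

fn updateMotion(motion: []Motion) void {
    for (motion) |*m| {
        m.x += m.vx;
        m.y += m.vy;
    }
}

pub fn main() void {
    var motion = [_]Motion{
        .{ .x = 0, .y = 0, .vx = 1, .vy = 0 },
        .{ .x = 5, .y = 2, .vx = 0, .vy = 1 },
        .{ .x = 9, .y = 9, .vx = 2, .vy = 2 },
    };

    // &motion coerces from a pointer-to-array to a slice of all three.
    updateMotion(&motion);

    // A sub-range updates only the first two entities.
    updateMotion(motion[0..2]);

    for (motion, 0..) |m, i| {
        std.debug.print("entity {d}: ({d}, {d})\n", .{ i, m.x, m.y });
    }
}
```

The first two entities are updated twice and the third only once, yet `updateMotion` itself never changes.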

#### Avoid Hidden Allocation

Data-oriented code usually makes allocation explicit.

Instead of allocating inside every small operation, allocate storage once and reuse it.

```zig
const EntityStore = struct {
    motion: []Motion,
    health: []i32,

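    pub fn init(allocator: std.mem.Allocator, count: usize) !EntityStore {
        const motion = try allocator.alloc(Motion, count);
        // Free motion again if the second allocation fails.
        errdefer allocator.free(motion);
        const health = try allocator.alloc(i32, count);

        return .{ .motion = motion, .health = health };
    }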

    pub fn deinit(self: EntityStore, allocator: std.mem.Allocator) void {
        allocator.free(self.motion);
        allocator.free(self.health);
    }
};
```

The caller controls the allocator.

The store owns two arrays.

The `deinit` function releases them.

This is common Zig style: ownership and allocation are visible.
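Because the caller supplies the allocator, the same store can live in very different kinds of memory. As a sketch, the code below backs an `EntityStore` with a stack buffer through `std.heap.FixedBufferAllocator`. The 4096-byte buffer size and the count of 8 are arbitrary choices for illustration.

```zig
const std = @import("std");

const Motion = struct {
    x: f32,
    y: f32,
    vx: f32,
    vy: f32,
};

const EntityStore = struct {
    motion: []Motion,
    health: []i32,

    pub fn init(allocator: std.mem.Allocator, count: usize) !EntityStore {
        const motion = try allocator.alloc(Motion, count);
        // Release motion if the health allocation fails.
        errdefer allocator.free(motion);
        const health = try allocator.alloc(i32, count);
        return .{ .motion = motion, .health = health };
    }

    pub fn deinit(self: EntityStore, allocator: std.mem.Allocator) void {
        allocator.free(self.motion);
        allocator.free(self.health);
    }
};

pub fn main() !void {
    // All storage comes out of this fixed buffer; nothing touches the heap.
    var buffer: [4096]u8 = undefined;
    var fba = std.heap.FixedBufferAllocator.init(&buffer);
    const allocator = fba.allocator();

    const store = try EntityStore.init(allocator, 8);
    defer store.deinit(allocator);

    std.debug.print("motion slots = {d}, health slots = {d}\n", .{
        store.motion.len,
        store.health.len,
    });
}
```

`EntityStore` did not change at all. Only the caller's choice of allocator did.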

#### A Complete Example

```zig
const std = @import("std");

const Motion = struct {
    x: f32,
    y: f32,
    vx: f32,
    vy: f32,
};

const EntityStore = struct {
    motion: []Motion,
    health: []i32,

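    pub fn init(allocator: std.mem.Allocator, count: usize) !EntityStore {
        const motion = try allocator.alloc(Motion, count);
        // Free motion again if the second allocation fails.
        errdefer allocator.free(motion);
        const health = try allocator.alloc(i32, count);

        return .{ .motion = motion, .health = health };
    }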

    pub fn deinit(self: EntityStore, allocator: std.mem.Allocator) void {
        allocator.free(self.motion);
        allocator.free(self.health);
    }

    pub fn updateMotion(self: EntityStore) void {
        for (self.motion) |*m| {
            m.x += m.vx;
            m.y += m.vy;
        }
    }
};

pub fn main() !void {
    const allocator = std.heap.page_allocator;

    // store itself is never reassigned; writes go through its slices.
    const store = try EntityStore.init(allocator, 2);
    defer store.deinit(allocator);

    store.motion[0] = .{ .x = 0, .y = 0, .vx = 1, .vy = 0 };
    store.motion[1] = .{ .x = 5, .y = 2, .vx = 0, .vy = 1 };

    store.health[0] = 100;
    store.health[1] = 80;

    store.updateMotion();

    std.debug.print("entity 0 position = ({d}, {d})\n", .{
        store.motion[0].x,
        store.motion[0].y,
    });
}
```

Output:

```text
entity 0 position = (1, 0)
```

This example separates motion from health. The update loop only touches motion data.

#### Data-Oriented APIs

A data-oriented function should usually accept the data it needs, not a giant object containing everything.

Less precise:

```zig
fn updateWorld(world: *World) void {
    // uses only world.motion
}
```

More precise:

```zig
fn updateMotion(motion: []Motion) void {
    for (motion) |*m| {
        m.x += m.vx;
        m.y += m.vy;
    }
}
```

The second version is easier to test. It also shows exactly what data the function uses.

This is a good habit in Zig: make data dependencies visible.
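To illustrate the testing claim, here is a sketch of a unit test for `updateMotion`. It needs no `World`, no allocator, and no setup beyond a two-element array. The test name and sample values are illustrative.

```zig
const std = @import("std");

const Motion = struct {
    x: f32,
    y: f32,
    vx: f32,
    vy: f32,
};

fn updateMotion(motion: []Motion) void {
    for (motion) |*m| {
        m.x += m.vx;
        m.y += m.vy;
    }
}

test "updateMotion advances position by velocity" {
    var motion = [_]Motion{
        .{ .x = 0, .y = 0, .vx = 1, .vy = 0 },
        .{ .x = 5, .y = 2, .vx = 0, .vy = 1 },
    };

    updateMotion(&motion);

    try std.testing.expectEqual(@as(f32, 1), motion[0].x);
    try std.testing.expectEqual(@as(f32, 0), motion[0].y);
    try std.testing.expectEqual(@as(f32, 5), motion[1].x);
    try std.testing.expectEqual(@as(f32, 3), motion[1].y);
}
```

Saved to a file, this runs with `zig test` followed by the file name.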

#### Data-Oriented Design and Structs

Structs still matter in data-oriented design.

You use structs to group fields that are commonly processed together.

```zig
const Motion = struct {
    x: f32,
    y: f32,
    vx: f32,
    vy: f32,
};
```

These fields belong together because motion update uses all of them.

But you may not want to group fields that are rarely used together:

```zig
const Entity = struct {
    motion: Motion,
    health: i32,
    name: []const u8,
    debug_label: []const u8,
};
```

This may be fine for simple code. But if motion is updated millions of times per second and names are rarely used, splitting them can help.

The question is always: which data is used together?

#### Data-Oriented Design and Cache

Modern CPUs are fast, but memory access is relatively slow.

When a CPU reads memory, it usually brings a whole cache line into cache, not just one field.

If your loop walks through contiguous useful data, the CPU works well.

If your loop jumps around memory or pulls in many unused fields, it may waste time.

This is why arrays and slices matter:

```zig
for (items) |item| {
    // contiguous walk through memory
}
```

Zig makes this style clear because slices directly represent contiguous memory.
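The contrast shows up in the parameter types. In this sketch, `Item`, `sumContiguous`, and `sumIndirect` are hypothetical names: the first function walks values stored side by side, while the second follows one pointer per element, and each pointer may lead anywhere in memory.

```zig
const std = @import("std");

const Item = struct {
    value: u32,
};

// Contiguous: the items sit next to each other in memory.
fn sumContiguous(items: []const Item) u64 {
    var total: u64 = 0;
    for (items) |item| {
        total += item.value;
    }
    return total;
}

// Indirect: only the pointers are contiguous; the items may be scattered.
fn sumIndirect(items: []const *const Item) u64 {
    var total: u64 = 0;
    for (items) |item| {
        total += item.value;
    }
    return total;
}

pub fn main() void {
    const items = [_]Item{ .{ .value = 1 }, .{ .value = 2 }, .{ .value = 3 } };
    const pointers = [_]*const Item{ &items[2], &items[0], &items[1] };

    std.debug.print("contiguous = {d}, indirect = {d}\n", .{
        sumContiguous(&items),
        sumIndirect(&pointers),
    });
}
```

Both functions compute the same sum. The difference is in how memory is visited, which only becomes measurable with large data sets.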

#### Do Not Overcomplicate Early

Data-oriented design can become too complicated if you apply it blindly.

This is usually fine:

```zig
const User = struct {
    id: u64,
    name: []const u8,
    email: []const u8,
};
```

You do not need a separate array for every field in a small application.

Use the simple design first when it is clear and fast enough.

Move toward data-oriented layouts when you have:

```text
large arrays
hot loops
repeated batch processing
memory bandwidth problems
cache misses
performance measurements
```

The goal is better structure, not cleverness.
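A performance measurement can start very small. The sketch below times one pass of `updateMotion` over a million entities with `std.time.Timer`. It assumes a Zig version that still provides `std.time.Timer`, and the entity count is arbitrary. Treat the number as a rough signal rather than a benchmark, and build with optimizations enabled before comparing layouts.

```zig
const std = @import("std");

const Motion = struct {
    x: f32,
    y: f32,
    vx: f32,
    vy: f32,
};

fn updateMotion(motion: []Motion) void {
    for (motion) |*m| {
        m.x += m.vx;
        m.y += m.vy;
    }
}

pub fn main() !void {
    const allocator = std.heap.page_allocator;

    const motion = try allocator.alloc(Motion, 1_000_000);
    defer allocator.free(motion);

    // alloc returns uninitialized memory, so fill it before use.
    for (motion) |*m| {
        m.* = .{ .x = 0, .y = 0, .vx = 1, .vy = 1 };
    }

    var timer = try std.time.Timer.start();
    updateMotion(motion);
    const elapsed_ns = timer.read();

    // Printing a result keeps the update from being optimized away.
    std.debug.print("updated {d} entities in {d} ns (x0 = {d})\n", .{
        motion.len,
        elapsed_ns,
        motion[0].x,
    });
}
```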

#### The Main Idea

Data-oriented design means arranging code around the data and the way it is processed.

In Zig, this usually means:

```text
use structs to group data that belongs together
use arrays and slices for batches
keep allocation explicit
split hot data from cold data when needed
write functions that take the data they actually use
measure before making the layout complicated
```

A normal struct is often the right start:

```zig
const Entity = struct {
    x: f32,
    y: f32,
    vx: f32,
    vy: f32,
    health: i32,
};
```

When performance or clarity demands it, reshape the data:

```zig
const EntityStore = struct {
    motion: []Motion,
    health: []i32,
};
```

Good Zig code often starts with this question: what shape should the data have so the program can do its work clearly and efficiently?