# Designing Concurrent Programs

Threads are a mechanism, not a design.

A program should not begin with the question, “How many threads do I need?” It should begin with the data.

Ask first:

1. What data exists?
2. Who owns it?
3. Which thread may change it?
4. Which thread may read it?
5. How does one thread notify another?

Most concurrency bugs come from unclear ownership.

A simple rule is useful: each mutable object should have exactly one owning thread.

Other threads may send messages to the owner, or they may access the object only through a small synchronized interface.

This is poor design:

```zig
// Globals: nothing ties the mutex to the data it is meant to guard.
var count: u32 = 0;
var failed: bool = false;
var path_buffer: [4096]u8 = undefined;
var mutex = std.Thread.Mutex{};
```

The data and the lock are separate. It is not clear which fields the mutex protects.

This is better:

```zig
const State = struct {
    mutex: std.Thread.Mutex = .{},
    count: u32 = 0,
    failed: bool = false,
    path_buffer: [4096]u8 = undefined,

    fn addOne(self: *State) void {
        self.mutex.lock();
        defer self.mutex.unlock();

        self.count += 1;
    }

    fn markFailed(self: *State) void {
        self.mutex.lock();
        defer self.mutex.unlock();

        self.failed = true;
    }
};
```

The lock and the protected data are together. The methods define the allowed operations.

Keep critical sections small.

Do not hold a mutex while doing slow work:

```zig
state.mutex.lock();
defer state.mutex.unlock();

try readLargeFile();
try writeNetworkResponse();
state.count += 1;
```

This blocks every other thread that needs the state for as long as the file read and the network write take.

Do the slow work outside the lock:

```zig
try readLargeFile();
try writeNetworkResponse();

state.mutex.lock();
defer state.mutex.unlock();

state.count += 1;
```

A lock should protect memory, not waiting time.
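When the slow work needs input from the shared state, one possible pattern is to copy the input under the lock, release the lock, and reacquire it only to publish the result. The sketch below is an illustration under assumptions: `slowDouble` is a hypothetical stand-in for file or network I/O, and the `input` and `output` fields are invented for the example.

```zig
const std = @import("std");

// Illustrative state; the field names are assumptions for this sketch.
const State = struct {
    mutex: std.Thread.Mutex = .{},
    input: u64 = 21,
    output: u64 = 0,
};

// Hypothetical stand-in for slow work such as file or network I/O.
fn slowDouble(n: u64) u64 {
    return n * 2;
}

fn process(state: *State) void {
    // Step 1: copy the input while holding the lock.
    // The defer releases the mutex when the labeled block ends.
    const snapshot = blk: {
        state.mutex.lock();
        defer state.mutex.unlock();
        break :blk state.input;
    };

    // Step 2: the slow work runs with the lock released.
    const result = slowDouble(snapshot);

    // Step 3: reacquire the lock only to publish the result.
    state.mutex.lock();
    defer state.mutex.unlock();
    state.output = result;
}

pub fn main() void {
    var state = State{};
    process(&state);
    std.debug.print("output = {d}\n", .{state.output});
}
```

The cost of this shape is that `input` may change while the slow work runs. Whether a stale snapshot is acceptable depends on the program.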

Prefer local data.

Local variables belong to the current thread. They need no lock.

```zig
fn worker() void {
    var local_count: u32 = 0;

    while (local_count < 1000) : (local_count += 1) {
        // no synchronization needed
    }
}
```

If many threads need to count work, let each thread count locally, then merge once.

```zig
const std = @import("std");

const State = struct {
    mutex: std.Thread.Mutex = .{},
    total: u64 = 0,

    fn add(self: *State, n: u64) void {
        self.mutex.lock();
        defer self.mutex.unlock();

        self.total += n;
    }
};

fn worker(state: *State) void {
    var local: u64 = 0;

    var i: u64 = 0;
    while (i < 1000000) : (i += 1) {
        local += 1;
    }

    state.add(local);
}

pub fn main() !void {
    var state = State{};

    const a = try std.Thread.spawn(.{}, worker, .{&state});
    const b = try std.Thread.spawn(.{}, worker, .{&state});

    a.join();
    b.join();

    std.debug.print("total = {d}\n", .{state.total});
}
```

This program acquires the mutex only twice, once per worker.

That is better than acquiring it two million times, once per increment.

Avoid sharing when message passing is enough.

A worker can receive a job, produce a result, and send the result back. It does not need to see the whole program state.

This shape is easier to reason about:

```text
main thread -> jobs -> workers -> results -> main thread
```

The worker owns the job while processing it.

The main thread owns the result after receiving it.

Ownership moves. Shared mutation is reduced.
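The jobs and results arrows can be sketched as two small queues. This is a minimal illustration, not a standard library facility: the `Queue` type, its capacity of 8, and the squaring worker are all assumptions made for the example. It uses only `std.Thread.Mutex` and `std.Thread.Condition`.

```zig
const std = @import("std");

// Hypothetical fixed-capacity queue; names are illustrative only.
fn Queue(comptime T: type, comptime capacity: usize) type {
    return struct {
        const Self = @This();

        mutex: std.Thread.Mutex = .{},
        not_empty: std.Thread.Condition = .{},
        not_full: std.Thread.Condition = .{},
        items: [capacity]T = undefined,
        head: usize = 0,
        len: usize = 0,
        closed: bool = false,

        // Blocks while the queue is full.
        fn push(self: *Self, item: T) void {
            self.mutex.lock();
            defer self.mutex.unlock();
            while (self.len == capacity) self.not_full.wait(&self.mutex);
            self.items[(self.head + self.len) % capacity] = item;
            self.len += 1;
            self.not_empty.signal();
        }

        // Blocks while the queue is empty.
        // Returns null once the queue is closed and drained.
        fn pop(self: *Self) ?T {
            self.mutex.lock();
            defer self.mutex.unlock();
            while (self.len == 0) {
                if (self.closed) return null;
                self.not_empty.wait(&self.mutex);
            }
            const item = self.items[self.head];
            self.head = (self.head + 1) % capacity;
            self.len -= 1;
            self.not_full.signal();
            return item;
        }

        // Wakes every waiting consumer so it can observe the closed flag.
        fn close(self: *Self) void {
            self.mutex.lock();
            defer self.mutex.unlock();
            self.closed = true;
            self.not_empty.broadcast();
        }
    };
}

const Jobs = Queue(u64, 8);
const Results = Queue(u64, 8);

// The worker sees only its job and its result, not the program state.
fn worker(jobs: *Jobs, results: *Results) void {
    while (jobs.pop()) |n| {
        results.push(n * n);
    }
}

pub fn main() !void {
    var jobs = Jobs{};
    var results = Results{};

    const a = try std.Thread.spawn(.{}, worker, .{ &jobs, &results });
    const b = try std.Thread.spawn(.{}, worker, .{ &jobs, &results });

    var n: u64 = 1;
    while (n <= 4) : (n += 1) jobs.push(n);
    jobs.close();

    // Exactly four jobs were sent, so exactly four results arrive.
    var sum: u64 = 0;
    var received: usize = 0;
    while (received < 4) : (received += 1) sum += results.pop().?;

    a.join();
    b.join();
    std.debug.print("sum of squares = {d}\n", .{sum});
}
```

The only shared mutable data is inside the two queues. Everything else is owned by exactly one thread at a time.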

Be careful with pointer lifetimes.

This is dangerous:

```zig
fn startThread() !std.Thread {
    var value: u32 = 42;
    return try std.Thread.spawn(.{}, worker, .{&value});
}
```

`value` is local to `startThread`. After the function returns, its stack frame is gone and the pointer dangles. The worker may read or write invalid memory.

The data passed to a thread must live until the thread finishes.

This is safe:

```zig
pub fn main() !void {
    var value: u32 = 42;

    const thread = try std.Thread.spawn(.{}, worker, .{&value});
    thread.join();
}
```

The thread is joined before `value` goes out of scope.

For long-running threads, store shared state in an object whose lifetime is clearly longer than the threads using it.
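One way to make that lifetime visible is to put the thread handle and the shared state in the same object, and to join inside its `stop` method. The sketch below is an assumption-laden illustration: `Counter`, `start`, `stop`, and `run` are hypothetical names, and the busy loop stands in for real periodic work.

```zig
const std = @import("std");

// Hypothetical service type; the name and fields are illustrative.
const Counter = struct {
    mutex: std.Thread.Mutex = .{},
    stop_requested: bool = false,
    ticks: u64 = 0,
    thread: ?std.Thread = null,

    // The caller must keep `self` at a stable address until `stop` returns.
    fn start(self: *Counter) !void {
        self.thread = try std.Thread.spawn(.{}, run, .{self});
    }

    // Requests shutdown, then joins, so the thread cannot outlive `self`.
    fn stop(self: *Counter) void {
        {
            self.mutex.lock();
            defer self.mutex.unlock();
            self.stop_requested = true;
        }
        if (self.thread) |t| {
            t.join();
            self.thread = null;
        }
    }

    // Thread body: the mutex is taken and released once per iteration.
    fn run(self: *Counter) void {
        while (true) {
            self.mutex.lock();
            defer self.mutex.unlock();
            if (self.stop_requested) return;
            self.ticks += 1;
        }
    }
};

pub fn main() !void {
    var counter = Counter{};
    try counter.start();
    counter.stop(); // joins, so the thread is gone before `counter` is
    std.debug.print("ticks = {d}\n", .{counter.ticks});
}
```

Reading `counter.ticks` without the lock is safe here only because `stop` has already joined the thread.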

Do not use sleeps for correctness.

This is wrong:

```zig
std.Thread.sleep(1 * std.time.ns_per_ms); // argument is in nanoseconds
// assume worker is ready (nothing guarantees this)
```

A sleep guesses timing. Timing changes across machines, loads, and operating systems.

Use synchronization:

```zig
// The mutex must be held when calling wait.
state.mutex.lock();
defer state.mutex.unlock();

while (!state.ready) {
    state.condition.wait(&state.mutex);
}
```

A correct concurrent program should work even when the scheduler chooses the worst possible interleaving.
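A complete version of the ready flag might look like the following sketch. The `markReady` and `waitUntilReady` method names are assumptions for this example. The flag is written and read only while the mutex is held, and the wait sits in a loop because a condition variable may wake without a signal.

```zig
const std = @import("std");

const State = struct {
    mutex: std.Thread.Mutex = .{},
    condition: std.Thread.Condition = .{},
    ready: bool = false,

    // Called by the worker once its set-up is done.
    fn markReady(self: *State) void {
        self.mutex.lock();
        defer self.mutex.unlock();
        self.ready = true;
        self.condition.signal();
    }

    // Called by the thread that needs the worker to be ready.
    fn waitUntilReady(self: *State) void {
        self.mutex.lock();
        defer self.mutex.unlock();
        // Re-check the flag after every wake-up.
        while (!self.ready) {
            self.condition.wait(&self.mutex);
        }
    }
};

fn worker(state: *State) void {
    // Set-up work would go here.
    state.markReady();
}

pub fn main() !void {
    var state = State{};

    const thread = try std.Thread.spawn(.{}, worker, .{&state});
    state.waitUntilReady();
    thread.join();

    std.debug.print("worker is ready\n", .{});
}
```

This works whether the worker calls `markReady` before or after the main thread starts waiting, because the flag records the event and the wait only happens while the flag is still false.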

Good concurrent Zig programs have a few visible rules:

- mutable state has an owner
- shared state has a lock
- locks protect data, not arbitrary code
- waits use conditions, semaphores, or joins
- threads are joined or intentionally detached
- pointer lifetimes are explicit
- slow work happens outside critical sections

Concurrency is not made safe by using one primitive everywhere. It is made safe by keeping ownership simple.

Exercise 18-31. Rewrite a global shared-state program so the lock and data are fields of one struct.

Exercise 18-32. Move expensive work outside a locked section.

Exercise 18-33. Write a worker that counts locally and merges once at the end.

Exercise 18-34. Find a program that uses `sleep` for ordering. Replace the sleep with a condition variable.