# A Parallel File Scanner

This section combines the ideas from the previous sections into one program: a scanner that processes files in parallel.

Several worker threads receive file paths from a queue. Each worker opens a file, reads it, and updates shared statistics.

The design has four parts:

| Component | Purpose |
|---|---|
| job queue | stores file paths |
| worker threads | process files |
| shared state | stores totals |
| synchronization | coordinates access |

The shared state is small:

```zig
const State = struct {
    mutex: std.Thread.Mutex = .{},

    files_scanned: u64 = 0,
    bytes_read: u64 = 0,
};
```

The mutex protects the counters.

Workers update them through methods:

```zig
fn addFile(self: *State, bytes: u64) void {
    self.mutex.lock();
    defer self.mutex.unlock();

    self.files_scanned += 1;
    self.bytes_read += bytes;
}
```

The queue stores paths:

```zig
const JobQueue = struct {
    mutex: std.Thread.Mutex = .{},
    condition: std.Thread.Condition = .{},

    jobs: [128][]const u8 = undefined,

    head: usize = 0,
    tail: usize = 0,
    count: usize = 0,

    shutdown: bool = false,
};
```

The queue uses a fixed-size ring buffer: `head` and `tail` index into the `jobs` array and wrap around when they reach the end, while `count` tracks how many slots are in use.

Workers remove paths from the queue:

```zig
fn pop(self: *JobQueue) ?[]const u8
```
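
One possible body for `pop`, sketched under the assumption of a single producer thread. It sleeps on the condition variable while the queue is empty, drains any remaining jobs after shutdown, and signals a producer that may be blocked on a full buffer:

```zig
fn pop(self: *JobQueue) ?[]const u8 {
    self.mutex.lock();
    defer self.mutex.unlock();

    // Sleep until a job arrives or shutdown is requested.
    while (self.count == 0 and !self.shutdown) {
        self.condition.wait(&self.mutex);
    }

    // After shutdown, drain whatever is left; return null only when empty.
    if (self.count == 0) return null;

    const path = self.jobs[self.head];
    self.head = (self.head + 1) % self.jobs.len;
    self.count -= 1;

    // Let a producer blocked on a full queue know that a slot is free.
    self.condition.signal();

    return path;
}
```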

The producer inserts paths:

```zig
fn push(self: *JobQueue, path: []const u8) void
```
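
A matching sketch for `push`. It blocks while the ring buffer is full and wakes one waiting worker after inserting:

```zig
fn push(self: *JobQueue, path: []const u8) void {
    self.mutex.lock();
    defer self.mutex.unlock();

    // Wait for a free slot if the ring buffer is full.
    while (self.count == self.jobs.len) {
        self.condition.wait(&self.mutex);
    }

    self.jobs[self.tail] = path;
    self.tail = (self.tail + 1) % self.jobs.len;
    self.count += 1;

    // Wake one worker waiting for a job.
    self.condition.signal();
}
```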

Workers repeatedly:

1. take a path
2. open the file
3. read the file
4. update statistics

The worker loop looks like this:

```zig
fn worker(
    queue: *JobQueue,
    state: *State,
) void {
    while (true) {
        const path = queue.pop() orelse break;

        processFile(path, state);
    }
}
```

The worker exits when `pop` returns `null`.

The file processing function does not hold the mutex while reading the file:

```zig
fn processFile(
    path: []const u8,
    state: *State,
) void {
    // open file
    // read file
    // compute size

    state.addFile(size);
}
```

Only the counter update is synchronized.

The expensive I/O happens outside the lock.

This is important.

If the mutex stayed locked during file reading, only one worker could read files at a time. Parallelism would disappear.

The main thread usually acts as the producer.

It walks the filesystem and pushes paths into the queue:

```zig
while (try iterator.next()) |entry| {
    queue.push(entry.path);
}
```
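
A fuller sketch of that loop, assuming a flat scan of the current directory with `std.fs.Dir.Iterator`. The names the iterator returns are only valid until the next call to `next`, so each one is duplicated before it is pushed; the `allocator` is assumed to exist, and freeing the copies is left to the workers:

```zig
var dir = try std.fs.cwd().openDir(".", .{ .iterate = true });
defer dir.close();

var iterator = dir.iterate();
while (try iterator.next()) |entry| {
    // Skip directories, symlinks, and other non-file entries.
    if (entry.kind != .file) continue;

    // Copy the name: the iterator reuses its internal buffer.
    const path = try allocator.dupe(u8, entry.name);
    queue.push(path);
}
```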

After all jobs are added:

```zig
queue.mutex.lock();
queue.shutdown = true;
queue.condition.broadcast();
queue.mutex.unlock();
```

This wakes all workers and tells them no more jobs will arrive.

Each worker eventually exits:

```zig
const path = queue.pop() orelse break;
```

The main thread joins all workers:

```zig
thread.join();
```
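
In full, the main thread usually spawns a small fixed number of workers before producing jobs and joins each one at the end. A sketch with an arbitrary worker count of four, reusing the `queue` and `state` objects from above:

```zig
var threads: [4]std.Thread = undefined;

// Start the workers before producing any jobs.
for (&threads) |*t| {
    t.* = try std.Thread.spawn(.{}, worker, .{ &queue, &state });
}

// ... push paths, then set the shutdown flag ...

// Wait for every worker to drain the queue and exit.
for (threads) |t| {
    t.join();
}
```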

Finally the totals are printed:

```zig
std.debug.print(
    "files = {d}, bytes = {d}\n",
    .{
        state.files_scanned,
        state.bytes_read,
    },
);
```

This design scales reasonably well because:

- workers mostly operate independently
- shared state is small
- lock contention is low
- expensive work happens outside locks

The queue is the synchronization point.

The statistics object is the aggregation point.

Everything else is local worker state.

This is a common concurrent structure:

```text
producer -> queue -> workers -> shared results
```

The same model appears in:

- search engine crawlers
- compilers
- build systems
- backup tools
- static analyzers
- image processing pipelines

The details change, but the structure remains similar.

A good concurrent design usually has:

- few shared objects
- clear ownership
- short lock durations
- explicit shutdown rules
- bounded queues
- workers with independent local state

The threads themselves are not the important part.

The important part is how data moves through the system.

Exercise 18-35. Replace the fixed-size queue with a dynamically growing queue.

Exercise 18-36. Add a counter for failed file opens.

Exercise 18-37. Add recursive directory traversal.

Exercise 18-38. Add a second queue for completed results.

Exercise 18-39. Measure performance with one worker, two workers, and four workers.

