# Understanding Zig Performance

Performance is one of the main reasons people choose Zig.

Zig is designed for software where speed, memory usage, startup time, and predictable behavior matter. But “performance” is a broad word. It does not only mean “runs fast.”

A program can be considered performant if it:

- finishes work quickly
- uses little memory
- avoids unnecessary allocations
- starts instantly
- scales well under load
- uses CPU caches efficiently
- avoids unpredictable pauses
- produces small binaries
- uses hardware effectively

Zig gives you direct control over these things.

This chapter explains how Zig achieves high performance, what affects program speed, and how to think about optimization correctly.

## What Actually Makes Programs Slow?

Beginners often think performance depends mostly on the programming language.

In reality, performance usually depends on:

- memory access patterns
- allocations
- cache misses
- unnecessary copying
- bad algorithms
- synchronization overhead
- branch prediction failures
- system calls
- I/O bottlenecks

A fast language cannot save a slow algorithm.

For example, this is still slow even in Zig:

```zig
for (0..1_000_000) |_| {
    for (0..1_000_000) |_| {
        // work
    }
}
```

That loop performs one trillion iterations.

The first step in optimization is understanding where time is actually spent.
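Measurement is the foundation. As a minimal sketch, `std.time.Timer` from the standard library can time a region of code directly (the `work` function here is just an illustrative stand-in):

```zig
const std = @import("std");

/// The work we want to time: sum the integers 0..n-1.
fn work(n: u64) u64 {
    var sum: u64 = 0;
    for (0..n) |i| sum += i;
    return sum;
}

pub fn main() !void {
    var timer = try std.time.Timer.start();
    const result = work(1_000_000);
    const elapsed_ns = timer.read();
    std.debug.print("result={} took {} ns\n", .{ result, elapsed_ns });
}
```

A profiler gives a much finer picture, but even a coarse timer quickly shows which part of a program dominates.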

## Zig’s Performance Philosophy

Zig follows a simple philosophy:

> The programmer should control cost directly.

Many languages hide costs behind abstractions.

Examples:

- automatic heap allocation
- garbage collection
- hidden copies
- exceptions
- runtime reflection
- virtual dispatch
- implicit conversions

Zig tries to avoid hidden work.

If memory is allocated, you usually see the allocator.

If data is copied, you usually wrote the copy.

If a function can fail, you see the error handling.

This makes performance easier to reason about.
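A small sketch of what "visible cost" looks like in practice (the function name is invented for illustration): the allocation names its allocator, the failure path has a `try`, and the caller owns the free.

```zig
const std = @import("std");

// Returns a heap-allocated, zero-filled buffer. Every cost is visible:
// the caller passes the allocator, sees the `try`, and must free the result.
fn makeBuffer(allocator: std.mem.Allocator, len: usize) ![]u8 {
    const buffer = try allocator.alloc(u8, len);
    @memset(buffer, 0);
    return buffer;
}
```

The caller writes `const buf = try makeBuffer(allocator, 64);` followed by `defer allocator.free(buf);` — nothing is hidden.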

## Zig Is Close to the Machine

Zig compiles directly to native machine code.

A Zig program can become:

- x86-64 machine code
- ARM machine code
- WebAssembly
- other native targets

This means there is no virtual machine between your code and the CPU.

Languages like Java or C# execute inside a managed runtime (the JVM or the CLR). Zig programs usually run directly on the operating system.

That reduces overhead.

## Zero-Cost Abstractions

One important idea in Zig is the zero-cost abstraction.

An abstraction is “zero-cost” if it makes the code easier to write without adding runtime overhead.

For example:

```zig
fn add(a: i32, b: i32) i32 {
    return a + b;
}
```

The compiler can inline this function directly into the caller.

Instead of generating a real function call, the compiler may replace it with:

```text
result = a + b
```

No extra overhead remains.

Good Zig abstractions disappear during compilation.

## Release Modes Matter

Zig has several build modes.

The most common are:

| Mode | Purpose |
|---|---|
| Debug | Safety checks and debugging |
| ReleaseSafe | Optimized with many safety checks |
| ReleaseFast | Maximum optimization |
| ReleaseSmall | Smaller binaries |

Example:

```bash
zig build-exe main.zig -O ReleaseFast
```

Debug mode is intentionally slower.

It includes:

- bounds checks
- overflow checks
- safety validation
- debugging information

ReleaseFast removes many runtime safety checks for speed.

This distinction is important because beginners sometimes benchmark Debug builds accidentally.
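One concrete illustration of the difference: integer overflow is among the checks that behave differently per mode.

```zig
// In Debug and ReleaseSafe builds, this overflow triggers a runtime
// panic ("integer overflow") when x == 255. In ReleaseFast, the check
// is removed for speed and the overflow is undefined behavior.
fn increment(x: u8) u8 {
    return x + 1;
}
```

Always benchmark with `-O ReleaseFast` (or `ReleaseSafe`), never with the default Debug build.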

## CPU Speed Is Not the Main Problem

Modern CPUs are extremely fast.

A CPU can execute billions of operations per second.

The real bottleneck is often memory access.

Consider these two cases:

### Fast Access

```zig
var numbers: [1000]u32 = undefined;

for (&numbers, 0..) |*n, i| {
    n.* = @intCast(i);
}
```

The data is contiguous in memory.

The CPU cache works efficiently.

### Slow Access

```zig
const Node = struct {
    value: u32,
    next: ?*Node,
};
```

Linked lists may scatter nodes across memory.

The CPU constantly jumps to different memory locations.

This creates cache misses.

A cache miss can cost far more than a simple arithmetic operation.

In high-performance systems, memory layout is often more important than raw computation speed.

## Stack vs Heap Allocation

Stack allocation is usually very fast.

Example:

```zig
var buffer: [1024]u8 = undefined;
```

This memory exists directly inside the stack frame.

Heap allocation is slower:

```zig
const memory = try allocator.alloc(u8, 1024);
```

Heap allocation may involve:

- searching free memory
- synchronization
- fragmentation management
- operating system interaction

Frequent heap allocations can become expensive.

Zig encourages careful allocation patterns because allocators are explicit.
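One pattern combines the two: `std.heap.FixedBufferAllocator` lets you use allocator-based APIs while the backing memory stays on the stack. A brief sketch:

```zig
const std = @import("std");

fn example() !void {
    // The backing storage lives in this stack frame.
    var backing: [1024]u8 = undefined;
    var fba = std.heap.FixedBufferAllocator.init(&backing);
    const allocator = fba.allocator();

    // Looks like a heap allocation, but no global heap or OS is involved.
    const slice = try allocator.alloc(u8, 100);
    _ = slice;
}
```

When the frame returns, everything is gone at once; no free calls are needed.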

## Allocations Are Expensive

Suppose you allocate a fresh temporary buffer on every iteration of a loop:

```zig
while (true) {
    const s = try allocator.alloc(u8, 100);
    defer allocator.free(s);
}
```

This repeatedly allocates and frees memory.

Allocation overhead can dominate runtime.

A better approach is often:

- reuse buffers
- use arenas
- allocate once
- process data in batches

Zig makes these strategies easier because allocation is visible.
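The arena strategy above can be sketched with `std.heap.ArenaAllocator`: allocate many times, free once.

```zig
const std = @import("std");

fn batchWork(base: std.mem.Allocator) !void {
    var arena = std.heap.ArenaAllocator.init(base);
    // One call releases every allocation made through the arena.
    defer arena.deinit();

    const allocator = arena.allocator();

    // Many small allocations; no individual frees needed.
    for (0..1000) |_| {
        _ = try allocator.alloc(u8, 100);
    }
}
```

Individual frees disappear, and the bulk release is a single cheap operation.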

## Cache Locality

Cache locality means keeping related data close together in memory.

This matters enormously.

Example:

```zig
const Particle = struct {
    x: f32,
    y: f32,
    z: f32,
};
```

An array of particles:

```zig
var particles: [10000]Particle = undefined;
```

stores data sequentially.

The CPU can load nearby particles efficiently.

This is cache-friendly.

Poor locality forces the CPU to fetch memory repeatedly from slower layers.
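When a loop touches only one field, a structure-of-arrays layout keeps that field contiguous. The standard library's `std.MultiArrayList` provides this; a brief sketch:

```zig
const std = @import("std");

const Particle = struct { x: f32, y: f32, z: f32 };

fn sumX(allocator: std.mem.Allocator) !f32 {
    // Each field is stored in its own contiguous array.
    var particles: std.MultiArrayList(Particle) = .{};
    defer particles.deinit(allocator);

    try particles.append(allocator, .{ .x = 1, .y = 0, .z = 0 });
    try particles.append(allocator, .{ .x = 2, .y = 0, .z = 0 });

    // This loop streams through only the `x` values,
    // loading no `y` or `z` data into the cache.
    var sum: f32 = 0;
    for (particles.items(.x)) |x| sum += x;
    return sum;
}
```

Whether array-of-structs or struct-of-arrays is faster depends on the access pattern; measure both.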

## Branch Prediction

Modern CPUs try to predict branches.

Example:

```zig
if (value > 0) {
    // branch
}
```

If the CPU predicts correctly, execution stays fast.

If prediction fails repeatedly, the CPU pipeline stalls.

Random branching patterns can hurt performance.

Predictable code is often faster.
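One common mitigation is replacing an unpredictable branch with arithmetic. A sketch using `@intFromBool` (the function itself is invented for illustration):

```zig
// Counts elements above a threshold. With random input, an `if` inside
// the loop mispredicts often; this form has no branch to mispredict.
fn countAbove(data: []const u8, threshold: u8) usize {
    var count: usize = 0;
    for (data) |v| {
        // The comparison result becomes 0 or 1 directly.
        count += @intFromBool(v > threshold);
    }
    return count;
}
```

Whether this beats the branching version depends on how predictable the data actually is; measure before committing to it.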

## SIMD and Vectorization

Modern CPUs can process multiple values simultaneously.

Example:

- adding 8 integers at once
- multiplying multiple floats in parallel

This is called SIMD:

- Single Instruction
- Multiple Data

Zig supports vector types directly.

Example:

```zig
const Vec4 = @Vector(4, f32);
```

The compiler may generate vector instructions automatically.

This can dramatically improve numeric workloads.
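Vector types support the normal arithmetic operators, so a lane-wise addition is a single expression:

```zig
const Vec4 = @Vector(4, f32);

// Adds four pairs of floats at once. On targets with SIMD support,
// the compiler can lower this to a single vector instruction.
fn addVec(a: Vec4, b: Vec4) Vec4 {
    return a + b;
}
```

For example, `addVec(.{ 1, 2, 3, 4 }, .{ 10, 20, 30, 40 })` sums each lane independently.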

## Function Calls Are Usually Cheap

Beginners often worry too much about function calls.

Modern compilers optimize aggressively.

Small functions are frequently inlined automatically.

This:

```zig
fn square(x: i32) i32 {
    return x * x;
}
```

may produce no function call at all in optimized builds.

You should usually focus on:

- allocations
- memory access
- algorithm complexity

before worrying about tiny function call overhead.

## Data Copies Matter

Copying large data structures repeatedly is expensive.

Example:

```zig
fn process(data: [100000]u8) void {
    _ = data;
}
```

Passing the array by value means each call semantically copies all 100,000 bytes.

A slice avoids the copy:

```zig
fn process(data: []const u8) void {
    _ = data;
}
```

Slices are lightweight views into memory: just a pointer and a length.

Understanding ownership and copying is critical for performance.

## System Calls Are Slow

Operations involving the operating system are expensive.

Examples:

- reading files
- network operations
- process creation
- console output

This is why buffering matters.

Bad:

```zig
for (0..1_000_000) |i| {
    std.debug.print("{}\n", .{i});
}
```

Each `print` call can trigger a separate write, and every write to the terminal is a system call.

Better:

- accumulate output
- write larger chunks
- reduce syscall frequency
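A sketch of buffered output, using the pre-0.15 `std.io.bufferedWriter` API (newer Zig releases reorganize this interface):

```zig
const std = @import("std");

pub fn main() !void {
    const stdout = std.io.getStdOut().writer();
    var buffered = std.io.bufferedWriter(stdout);
    const writer = buffered.writer();

    // Output accumulates in a memory buffer instead of triggering
    // one write syscall per line.
    for (0..1_000_000) |i| {
        try writer.print("{}\n", .{i});
    }
    // One final flush pushes out whatever remains in the buffer.
    try buffered.flush();
}
```

Forgetting the final `flush` is a classic bug: the last partial buffer is silently dropped.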

## Zig Gives Predictable Performance

One major advantage of Zig is predictability.

There is usually:

- no garbage collector pause
- no hidden allocations
- no JIT warmup
- no runtime interpreter

Performance behavior is easier to understand.

This matters in:

- games
- embedded systems
- databases
- operating systems
- real-time systems
- networking infrastructure

## Optimization Has Tradeoffs

Optimization is not free.

Highly optimized code can become:

- harder to read
- harder to debug
- less flexible
- more platform-specific

Good engineers optimize carefully.

The normal process is:

1. write correct code
2. measure performance
3. identify bottlenecks
4. optimize the real bottlenecks
5. measure again

Never guess blindly.

## Premature Optimization

A famous engineering rule, attributed to Donald Knuth, says:

> Premature optimization is the root of all evil.

This means:

Do not make code complicated before you know performance is actually a problem.

Many “optimizations” make programs worse:

- more bugs
- less readable code
- tiny or nonexistent speed gains

Good performance work is guided by measurement.

## What Zig Is Good At

Zig performs especially well in:

- systems programming
- networking
- game engines
- compilers
- command-line tools
- embedded software
- parsers
- data processing
- native libraries

Zig is designed for programs where direct control matters.

## Mental Model for Zig Performance

When writing Zig, think about:

- where memory lives
- who owns memory
- how often allocation happens
- whether data is contiguous
- whether copies occur
- whether work can happen at compile time
- whether branches are predictable
- whether the CPU cache is being used effectively

Performance engineering is largely about reducing unnecessary work.
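The "compile time" item deserves a sketch: Zig can execute ordinary code during compilation, so precomputed data costs nothing at runtime.

```zig
// This lookup table is computed entirely at compile time;
// the binary simply contains the 16 finished values.
const squares: [16]u32 = blk: {
    var table: [16]u32 = undefined;
    for (&table, 0..) |*s, i| s.* = @intCast(i * i);
    break :blk table;
};
```

At runtime, `squares[n]` is a plain array load; no computation remains.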

Zig gives you the visibility and control needed to do that precisely.

