Benchmarking Code

Benchmarking measures how fast code runs.

When you benchmark a function, you are trying to answer questions like:

How long does this operation take?
How many allocations happen?
Which version is faster?
Did this optimization actually help?

Without measurement, performance discussions are mostly guesses.

Correctness First

Before benchmarking, make sure the code is correct.

This order matters:

1. make it correct
2. write tests
3. measure performance
4. optimize carefully

Do not optimize code that is still failing tests.

A fast bug is still a bug.

A Small Example

Suppose we want to benchmark this function:

fn sum(items: []const i32) i64 {
    var total: i64 = 0;

    for (items) |item| {
        total += item;
    }

    return total;
}

A benchmark repeatedly runs the function and measures the elapsed time.

Measuring Time

Use std.time.Timer.

Example:

const std = @import("std");

fn sum(items: []const i32) i64 {
    var total: i64 = 0;

    for (items) |item| {
        total += item;
    }

    return total;
}

pub fn main() !void {
    var values: [100000]i32 = undefined;

    for (&values, 0..) |*value, i| {
        value.* = @intCast(i);
    }

    var timer = try std.time.Timer.start();

    var result: i64 = 0;

    var iteration: usize = 0;
    while (iteration < 1000) : (iteration += 1) {
        result += sum(values[0..]);
    }

    const elapsed_ns = timer.read();

    std.debug.print("result = {}\n", .{result});
    std.debug.print("elapsed = {} ns\n", .{elapsed_ns});
}

This program:

creates an array

runs sum many times

measures elapsed time

prints the result

Why Repeat the Function Many Times

This would be a weak benchmark:

const elapsed_ns = timer.read();

after only one small call.

The operation may be too fast to measure accurately.

Instead, run the function many times:

while (iteration < 1000) : (iteration += 1) {
    result += sum(values[0..]);
}

Repeating the work reduces noise.
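Dividing the total elapsed time by the iteration count then gives a rough per-call figure. A minimal sketch of that arithmetic, reusing the sum function from above (the 1000-iteration count is just an example):

```zig
const std = @import("std");

fn sum(items: []const i32) i64 {
    var total: i64 = 0;
    for (items) |item| total += item;
    return total;
}

pub fn main() !void {
    var values: [100000]i32 = undefined;
    for (&values, 0..) |*value, i| {
        value.* = @intCast(i);
    }

    const iterations: usize = 1000;

    var timer = try std.time.Timer.start();
    var result: i64 = 0;
    var i: usize = 0;
    while (i < iterations) : (i += 1) {
        result += sum(values[0..]);
    }
    const elapsed_ns = timer.read();

    // Average cost of one call, in nanoseconds.
    const per_call_ns = elapsed_ns / iterations;

    std.debug.print("result = {}\n", .{result});
    std.debug.print("per call = {} ns (average)\n", .{per_call_ns});
}
```

The per-call average is easier to compare across machines and iteration counts than the raw total.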

Preventing Dead Code Elimination

Compilers optimize aggressively.

If the result of a computation is never used, the compiler may remove the entire computation.

Bad benchmark:

while (iteration < 1000) : (iteration += 1) {
    _ = sum(values[0..]);
}

The compiler may realize the result is unused.

A safer pattern is:

result += sum(values[0..]);

Then print the result:

std.debug.print("{}\n", .{result});

Now the computation affects observable output.
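Recent versions of the standard library also provide std.mem.doNotOptimizeAway, which tells the compiler to treat a value as used even when nothing reads it afterward. A sketch:

```zig
const std = @import("std");

fn sum(items: []const i32) i64 {
    var total: i64 = 0;
    for (items) |item| total += item;
    return total;
}

pub fn main() void {
    const values = [_]i32{ 1, 2, 3, 4, 5 };

    // The result is never printed, but doNotOptimizeAway keeps
    // the compiler from deleting the call as dead code.
    std.mem.doNotOptimizeAway(sum(values[0..]));
}
```

This avoids polluting benchmark output with values you only computed to defeat the optimizer.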

Build in Release Mode

Debug builds are slower because safety checks are enabled and most optimizations are disabled.

Do not benchmark debug builds.

Bad:

zig build-exe main.zig

Better:

zig build-exe main.zig -O ReleaseFast

or:

zig build-exe main.zig -O ReleaseSafe

Then run the executable.

Performance numbers from debug builds are misleading.

Benchmark the Right Thing

A benchmark should isolate the operation you care about.

Suppose you want to measure sorting.

Bad benchmark:

while (iteration < 1000) : (iteration += 1) {
    const allocator = std.heap.page_allocator;

    const buffer = try allocator.alloc(i32, 100000);
    defer allocator.free(buffer);

    // fill buffer
    // sort buffer
}

This measures:

allocation

initialization

sorting

cleanup

all mixed together.

If you only care about sorting speed, separate the setup from the measurement.
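One way to isolate the sort is to allocate and fill the buffer before starting the timer, so only the sort itself lands in the measured region. A sketch (the deterministic fill pattern is just an illustration):

```zig
const std = @import("std");

pub fn main() !void {
    const allocator = std.heap.page_allocator;

    // Setup happens once, outside the measured region.
    const buffer = try allocator.alloc(i32, 100000);
    defer allocator.free(buffer);

    // Deterministic scrambled fill, so every run sorts the same input.
    for (buffer, 0..) |*item, i| {
        item.* = @intCast((i * 7919) % 100003);
    }

    // Only the sort itself is timed.
    var timer = try std.time.Timer.start();
    std.mem.sort(i32, buffer, {}, comptime std.sort.asc(i32));
    const elapsed_ns = timer.read();

    std.debug.print("sorted {} items in {} ns\n", .{ buffer.len, elapsed_ns });
}
```

Everything before Timer.start() is setup; everything after timer.read() is reporting. The measured span contains only the operation under study.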

Warm-Up Effects

The first run of a program may behave differently.

Caches may be empty.

Memory pages may not be loaded yet.

Some systems change CPU frequency dynamically.

So it is common to ignore the first few runs.

Simple beginner pattern:

var warmup: usize = 0;
while (warmup < 10) : (warmup += 1) {
    _ = sum(values[0..]);
}

Then start the timer afterward.

Benchmark Inputs Matter

Different inputs produce different performance.

Example:

fn contains(items: []const i32, target: i32) bool {
    for (items) |item| {
        if (item == target) return true;
    }

    return false;
}

Searching for the first item is fast:

target at index 0

Searching for a missing item is slower:

target not present

A good benchmark should describe the input clearly.
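A sketch that times both cases explicitly, using Timer.lap() to reset the clock between them (the iteration count and inputs are illustrative):

```zig
const std = @import("std");

fn contains(items: []const i32, target: i32) bool {
    for (items) |item| {
        if (item == target) return true;
    }
    return false;
}

pub fn main() !void {
    var values: [100000]i32 = undefined;
    for (&values, 0..) |*value, i| {
        value.* = @intCast(i);
    }

    var timer = try std.time.Timer.start();
    var hits: usize = 0;

    // Best case: the target is at index 0, so the scan stops immediately.
    var i: usize = 0;
    while (i < 1000) : (i += 1) {
        if (contains(values[0..], 0)) hits += 1;
    }
    const best_ns = timer.lap();

    // Worst case: the target is absent, so every item is scanned.
    i = 0;
    while (i < 1000) : (i += 1) {
        if (contains(values[0..], -1)) hits += 1;
    }
    const worst_ns = timer.lap();

    std.debug.print("hits = {}, best = {} ns, worst = {} ns\n", .{ hits, best_ns, worst_ns });
}
```

Reporting both numbers, with the input described, says far more than a single averaged figure.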

Compare Implementations Carefully

Suppose we have two functions:

fn squareA(x: i32) i32 {
    return x * x;
}

fn squareB(x: i32) i32 {
    return std.math.powi(i32, x, 2) catch unreachable;
}

Benchmark both under the same conditions:

same inputs

same build mode

same machine

same iteration count

Otherwise, the comparison is unreliable.
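A sketch of a fair head-to-head, with both candidates run in the same program under identical conditions (std.math.powi stands in for the integer power; the iteration count is arbitrary):

```zig
const std = @import("std");

fn squareA(x: i32) i32 {
    return x * x;
}

fn squareB(x: i32) i32 {
    return std.math.powi(i32, x, 2) catch unreachable;
}

pub fn main() !void {
    var timer = try std.time.Timer.start();
    var acc: i64 = 0;

    // Same inputs and iteration count for both candidates.
    var i: i32 = 0;
    while (i < 1000) : (i += 1) {
        acc += squareA(i);
    }
    const a_ns = timer.lap();

    i = 0;
    while (i < 1000) : (i += 1) {
        acc += squareB(i);
    }
    const b_ns = timer.lap();

    std.debug.print("acc = {}, A = {} ns, B = {} ns\n", .{ acc, a_ns, b_ns });
}
```

Running both in one process, back to back, removes machine-to-machine and build-to-build variation from the comparison.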

Measuring Allocations

Performance is not only about CPU time.

Allocations matter too.

Many unnecessary allocations slow programs and increase memory pressure.

Example:

const std = @import("std");

fn duplicate(
    allocator: std.mem.Allocator,
    text: []const u8,
) ![]u8 {
    return try allocator.dupe(u8, text);
}

Every call allocates memory.

Sometimes allocation is correct and necessary.

Sometimes it is avoidable.

Benchmarking helps you see whether allocation-heavy designs are actually costly.
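One hypothetical way to make the allocation avoidable is to let the caller supply the storage; the copyInto name below is illustrative, not a standard library function:

```zig
const std = @import("std");

// Allocating version: every call pays for an allocation.
fn duplicate(allocator: std.mem.Allocator, text: []const u8) ![]u8 {
    return try allocator.dupe(u8, text);
}

// Buffer-reusing version: the caller owns the storage, so no allocation occurs.
fn copyInto(buffer: []u8, text: []const u8) []u8 {
    const dest = buffer[0..text.len];
    @memcpy(dest, text);
    return dest;
}

pub fn main() !void {
    const allocator = std.heap.page_allocator;

    const a = try duplicate(allocator, "hello");
    defer allocator.free(a);

    var storage: [64]u8 = undefined;
    const b = copyInto(storage[0..], "hello");

    std.debug.print("{s} {s}\n", .{ a, b });
}
```

Benchmarking the two side by side shows what the allocation actually costs in your workload.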

Benchmarking Small Functions

Very small functions are hard to benchmark accurately.

Example:

fn increment(x: i32) i32 {
    return x + 1;
}

The function may be inlined.

The compiler may optimize heavily.

The timer overhead itself may become significant.

For tiny functions:

run many iterations

use release mode

be skeptical of tiny differences

Noise and Variability

Benchmark results are noisy.

Other programs on the machine may interfere.

CPU temperature and frequency may change.

Operating systems schedule many tasks simultaneously.

So this is normal:

run 1: 100 ms
run 2: 103 ms
run 3: 98 ms

Look for meaningful differences, not tiny fluctuations.

A 0.2% change is usually not important in beginner benchmarks.

A 30% change probably is.
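A common way to cope with noise is to time the same workload several times and report the fastest run, since the minimum is the measurement least contaminated by background activity. A sketch:

```zig
const std = @import("std");

fn sum(items: []const i32) i64 {
    var total: i64 = 0;
    for (items) |item| total += item;
    return total;
}

pub fn main() !void {
    var values: [100000]i32 = undefined;
    for (&values, 0..) |*value, i| {
        value.* = @intCast(i);
    }

    var result: i64 = 0;
    var best_ns: u64 = std.math.maxInt(u64);

    // Repeat the whole measurement and keep the fastest run.
    var run: usize = 0;
    while (run < 5) : (run += 1) {
        var timer = try std.time.Timer.start();
        var i: usize = 0;
        while (i < 1000) : (i += 1) {
            result += sum(values[0..]);
        }
        best_ns = @min(best_ns, timer.read());
    }

    std.debug.print("result = {}, best run = {} ns\n", .{ result, best_ns });
}
```

Reporting the minimum of several runs is a simple, defensible alternative to averaging noisy samples.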

Benchmarking Allocator Strategies

Zig makes allocator choice explicit, which makes allocator benchmarking easier.

Suppose you compare:

GeneralPurposeAllocator
ArenaAllocator
FixedBufferAllocator

You can measure:

allocation speed

memory reuse

cleanup cost

allocation count

This is one reason Zig is good for systems experimentation. Resource management is visible in the code.
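A sketch comparing two of these strategies on the same workload: many small allocations served directly by the page allocator versus the same allocations batched through an arena (sizes and counts are arbitrary):

```zig
const std = @import("std");

pub fn main() !void {
    var timer = try std.time.Timer.start();

    // Strategy 1: every allocation and free goes to the page allocator.
    {
        const allocator = std.heap.page_allocator;
        var i: usize = 0;
        while (i < 1000) : (i += 1) {
            const block = try allocator.alloc(u8, 64);
            allocator.free(block);
        }
    }
    const page_ns = timer.lap();

    // Strategy 2: the same allocations through an arena, freed all at once.
    {
        var arena = std.heap.ArenaAllocator.init(std.heap.page_allocator);
        defer arena.deinit();
        const allocator = arena.allocator();
        var i: usize = 0;
        while (i < 1000) : (i += 1) {
            _ = try allocator.alloc(u8, 64);
        }
    }
    const arena_ns = timer.lap();

    std.debug.print("page = {} ns, arena = {} ns\n", .{ page_ns, arena_ns });
}
```

Because the allocator is an explicit parameter, swapping strategies means changing one line of setup, not rewriting the benchmark.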

Use Tests Together with Benchmarks

Keep correctness checks near performance tests.

Example:

try std.testing.expectEqual(expected, actual);

Then benchmark the correct implementation.

Optimizations sometimes introduce bugs.

Tests protect you while changing performance-sensitive code.
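For example, a small test kept alongside the benchmarked sum function might look like this; run it with zig test:

```zig
const std = @import("std");

fn sum(items: []const i32) i64 {
    var total: i64 = 0;
    for (items) |item| total += item;
    return total;
}

test "sum adds all items" {
    const values = [_]i32{ 1, 2, 3, 4 };
    try std.testing.expectEqual(@as(i64, 10), sum(values[0..]));
}
```

Running the tests before and after each optimization confirms the faster code still computes the same answer.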

A Complete Example

Save this as main.zig:

const std = @import("std");

fn sum(items: []const i32) i64 {
    var total: i64 = 0;

    for (items) |item| {
        total += item;
    }

    return total;
}

pub fn main() !void {
    var values: [100000]i32 = undefined;

    for (&values, 0..) |*value, i| {
        value.* = @intCast(i);
    }

    var warmup: usize = 0;
    while (warmup < 10) : (warmup += 1) {
        _ = sum(values[0..]);
    }

    var timer = try std.time.Timer.start();

    var result: i64 = 0;

    var iteration: usize = 0;
    while (iteration < 1000) : (iteration += 1) {
        result += sum(values[0..]);
    }

    const elapsed_ns = timer.read();

    std.debug.print("result = {}\n", .{result});
    std.debug.print("elapsed = {} ns\n", .{elapsed_ns});
}

Build in release mode:

zig build-exe main.zig -O ReleaseFast

Run it:

./main

This benchmark is simple but reasonable:

the work is repeated many times

the result is used

warm-up happens before timing

the build is optimized

the measured section is isolated

The Main Idea

Benchmarking is measurement, not intuition.

Do not assume code is fast because it looks clever. Do not assume code is slow because it looks simple.

Write tests first. Then measure carefully. Then optimize the specific part that actually matters.