Benchmarking measures how fast code runs.
When you benchmark a function, you are trying to answer questions like:
How long does this operation take?
How many allocations happen?
Which version is faster?
Did this optimization actually help?
Without measurement, performance discussions are mostly guesses.
Correctness First
Before benchmarking, make sure the code is correct.
This order matters:
1. make it correct
2. write tests
3. measure performance
4. optimize carefully
Do not optimize code that is still failing tests.
A fast bug is still a bug.
A Small Example
Suppose we want to benchmark this function:
fn sum(items: []const i32) i64 {
    var total: i64 = 0;
    for (items) |item| {
        total += item;
    }
    return total;
}
A benchmark repeatedly runs the function and measures the elapsed time.
Measuring Time
Use std.time.Timer.
Example:
const std = @import("std");

fn sum(items: []const i32) i64 {
    // The total is i64: summing 100,000 values would overflow an i32.
    var total: i64 = 0;
    for (items) |item| {
        total += item;
    }
    return total;
}

pub fn main() !void {
    var values: [100000]i32 = undefined;
    for (&values, 0..) |*value, i| {
        value.* = @intCast(i);
    }

    var timer = try std.time.Timer.start();
    var result: i64 = 0;

    var iteration: usize = 0;
    while (iteration < 1000) : (iteration += 1) {
        result += sum(values[0..]);
    }

    const elapsed_ns = timer.read();
    std.debug.print("result = {}\n", .{result});
    std.debug.print("elapsed = {} ns\n", .{elapsed_ns});
}
This program:
creates an array
runs sum many times
measures elapsed time
prints the result
Why Repeat the Function Many Times
This would be a weak benchmark: start the timer, call sum once, and read the timer.
const elapsed_ns = timer.read();
A single small call may be too fast to measure accurately.
Instead, run the function many times:
while (iteration < 1000) : (iteration += 1) {
    result += sum(values[0..]);
}
Repeating the work reduces noise.
Preventing Dead Code Elimination
Compilers optimize aggressively.
If the result of a computation is never used, the compiler may remove the entire computation.
Bad benchmark:
while (iteration < 1000) : (iteration += 1) {
    _ = sum(values[0..]);
}
The compiler may realize the result is unused.
A safer pattern is:
result += sum(values[0..]);
Then print the result:
std.debug.print("{}\n", .{result});
Now the computation affects observable output.
Build in Release Mode
Debug builds are slower because safety checks are enabled.
Do not benchmark debug builds.
Bad:
zig build-exe main.zig
Better:
zig build-exe main.zig -O ReleaseFast
or:
zig build-exe main.zig -O ReleaseSafe
Then run the executable.
Performance numbers from debug builds are misleading.
Benchmark the Right Thing
A benchmark should isolate the operation you care about.
Suppose you want to measure sorting.
Bad benchmark:
while (iteration < 1000) : (iteration += 1) {
    const allocator = std.heap.page_allocator;
    const buffer = try allocator.alloc(i32, 100000);
    defer allocator.free(buffer);
    // fill buffer
    // sort buffer
}
This measures:
allocation
initialization
sorting
cleanup
all mixed together.
If you only care about sorting speed, separate the setup from the measurement.
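As a sketch of the separated version, setup and cleanup can happen outside the timed region so only the sort is measured. This assumes std.mem.sort with std.sort.asc as the sorting call and uses an arbitrary deterministic fill; adjust both to whatever you actually want to measure:
const std = @import("std");

pub fn main() !void {
    const allocator = std.heap.page_allocator;

    // Setup: allocate and fill before the timer starts.
    const buffer = try allocator.alloc(i32, 100000);
    defer allocator.free(buffer);
    for (buffer, 0..) |*item, i| {
        item.* = @intCast((i * 7919) % 100000); // deterministic, unsorted values
    }

    // Measurement: only the sort is inside the timed region.
    var timer = try std.time.Timer.start();
    std.mem.sort(i32, buffer, {}, std.sort.asc(i32));
    const elapsed_ns = timer.read();

    // Use the result so the work cannot be optimized away.
    std.debug.print("first = {}, elapsed = {} ns\n", .{ buffer[0], elapsed_ns });
}
Note that a second timed sort of the same buffer would run on already-sorted data, which is why the sort here is measured only once.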
Warm-Up Effects
The first run of a program may behave differently.
Caches may be empty.
Memory pages may not be loaded yet.
Some systems change CPU frequency dynamically.
So it is common to ignore the first few runs.
Simple beginner pattern:
var warmup: usize = 0;
while (warmup < 10) : (warmup += 1) {
    _ = sum(values[0..]);
}
Then start the timer afterward.
Benchmark Inputs Matter
Different inputs produce different performance.
Example:
fn contains(items: []const i32, target: i32) bool {
    for (items) |item| {
        if (item == target) return true;
    }
    return false;
}
Searching for the first item is fast:
target at index 0
Searching for a missing item is slower:
target not present
A good benchmark should describe the input clearly.
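A minimal sketch of timing both inputs, assuming the values array filled with 0..99999 from the earlier examples; timer.lap() reads the elapsed time and then restarts the timer between the two cases:
const std = @import("std");

fn contains(items: []const i32, target: i32) bool {
    for (items) |item| {
        if (item == target) return true;
    }
    return false;
}

pub fn main() !void {
    var values: [100000]i32 = undefined;
    for (&values, 0..) |*value, i| {
        value.* = @intCast(i);
    }

    var timer = try std.time.Timer.start();
    var hits: usize = 0;

    var iteration: usize = 0;
    while (iteration < 1000) : (iteration += 1) {
        if (contains(values[0..], 0)) hits += 1; // best case: target at index 0
    }
    const best_ns = timer.lap();

    iteration = 0;
    while (iteration < 1000) : (iteration += 1) {
        if (contains(values[0..], -1)) hits += 1; // worst case: target not present
    }
    const worst_ns = timer.lap();

    std.debug.print("hits = {}\n", .{hits});
    std.debug.print("best case = {} ns, worst case = {} ns\n", .{ best_ns, worst_ns });
}
The worst case scans the entire array on every call, so the two numbers should differ by a large factor.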
Compare Implementations Carefully
Suppose we have two functions:
fn squareA(x: i32) i32 {
    return x * x;
}

fn squareB(x: i32) i32 {
    // std.math.powi returns an error union (it can overflow),
    // so the error is discarded here for the comparison.
    return std.math.powi(i32, x, 2) catch unreachable;
}
Benchmark both under the same conditions:
same inputs
same build mode
same machine
same iteration count
Otherwise, the comparison is unreliable.
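Here is a minimal sketch of such a comparison, timing both functions with the same inputs and iteration count in one program; the 10,000-value input range is an arbitrary choice:
const std = @import("std");

fn squareA(x: i32) i32 {
    return x * x;
}

fn squareB(x: i32) i32 {
    return std.math.powi(i32, x, 2) catch unreachable;
}

pub fn main() !void {
    var timer = try std.time.Timer.start();

    var result_a: i64 = 0;
    var x: i32 = 0;
    while (x < 10000) : (x += 1) {
        result_a += squareA(x);
    }
    const a_ns = timer.lap();

    var result_b: i64 = 0;
    x = 0;
    while (x < 10000) : (x += 1) {
        result_b += squareB(x);
    }
    const b_ns = timer.lap();

    // Print both results so neither loop can be optimized away.
    std.debug.print("A: {} ns (result {})\n", .{ a_ns, result_a });
    std.debug.print("B: {} ns (result {})\n", .{ b_ns, result_b });
}
Both loops run in the same process, build mode, and machine, so the comparison conditions match.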
Measuring Allocations
Performance is not only about CPU time.
Allocations matter too.
Many unnecessary allocations slow programs and increase memory pressure.
Example:
const std = @import("std");
fn duplicate(
    allocator: std.mem.Allocator,
    text: []const u8,
) ![]u8 {
    return try allocator.dupe(u8, text);
}
Every call allocates memory.
Sometimes allocation is correct and necessary.
Sometimes it is avoidable.
Benchmarking helps you see whether allocation-heavy designs are actually costly.
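As a sketch of measuring allocation cost, the loop below times many calls to duplicate through a GeneralPurposeAllocator and frees each copy; the text and the iteration count are arbitrary choices:
const std = @import("std");

fn duplicate(allocator: std.mem.Allocator, text: []const u8) ![]u8 {
    return try allocator.dupe(u8, text);
}

pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    defer _ = gpa.deinit();
    const allocator = gpa.allocator();

    const text = "hello, benchmark";

    var timer = try std.time.Timer.start();
    var total_len: usize = 0;

    var iteration: usize = 0;
    while (iteration < 10000) : (iteration += 1) {
        const copy = try duplicate(allocator, text);
        total_len += copy.len; // use the result so the work is observable
        allocator.free(copy);
    }

    const elapsed_ns = timer.read();
    std.debug.print("total_len = {}, elapsed = {} ns\n", .{ total_len, elapsed_ns });
}
Only the setup lines choose the allocator, so the same loop can be re-timed with a different allocator to compare designs.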
Benchmarking Small Functions
Very small functions are hard to benchmark accurately.
Example:
fn increment(x: i32) i32 {
    return x + 1;
}
The function may be inlined.
The compiler may optimize heavily.
The timer overhead itself may become significant.
For tiny functions:
run many iterations
use release mode
be skeptical of tiny differences
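A sketch of timing increment over many iterations; even with the result used, the optimizer may inline the call and collapse the loop into a closed form, which is exactly why tiny measurements deserve skepticism:
const std = @import("std");

fn increment(x: i32) i32 {
    return x + 1;
}

pub fn main() !void {
    var timer = try std.time.Timer.start();

    var result: i64 = 0;
    var i: i32 = 0;
    while (i < 1_000_000) : (i += 1) {
        result += increment(i);
    }

    const elapsed_ns = timer.read();
    // A suspiciously small elapsed time usually means the loop was
    // optimized away rather than the function being that fast.
    std.debug.print("result = {}, elapsed = {} ns\n", .{ result, elapsed_ns });
}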
Noise and Variability
Benchmark results are noisy.
Other programs on the machine may interfere.
CPU temperature and frequency may change.
Operating systems schedule many tasks simultaneously.
So this is normal:
run 1: 100 ms
run 2: 103 ms
run 3: 98 ms
Look for meaningful differences, not tiny fluctuations.
A 0.2% change is usually not important in beginner benchmarks.
A 30% change probably is.
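One simple way to see the noise is to repeat the whole measurement several times in one program and print each run; this sketch reuses the sum benchmark from earlier, and the run count of 5 is arbitrary:
const std = @import("std");

fn sum(items: []const i32) i64 {
    var total: i64 = 0;
    for (items) |item| {
        total += item;
    }
    return total;
}

pub fn main() !void {
    var values: [100000]i32 = undefined;
    for (&values, 0..) |*value, i| {
        value.* = @intCast(i);
    }

    var result: i64 = 0;
    var timer = try std.time.Timer.start();

    var run: usize = 0;
    while (run < 5) : (run += 1) {
        timer.reset();
        var iteration: usize = 0;
        while (iteration < 1000) : (iteration += 1) {
            result += sum(values[0..]);
        }
        std.debug.print("run {}: {} ns\n", .{ run, timer.read() });
    }

    std.debug.print("result = {}\n", .{result});
}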
Benchmarking Allocator Strategies
Zig makes allocator choice explicit, which makes allocator benchmarking easier.
Suppose you compare:
GeneralPurposeAllocator
ArenaAllocator
FixedBufferAllocator
You can measure:
allocation speed
memory reuse
cleanup cost
allocation count
This is one reason Zig is good for systems experimentation. Resource management is visible in the code.
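As a small sketch, the program below times two of these strategies in their typical usage patterns: the general-purpose allocator freeing each block individually, and an arena freeing everything at once. The block size and iteration count are arbitrary, so this compares patterns rather than giving definitive numbers:
const std = @import("std");

pub fn main() !void {
    var timer = try std.time.Timer.start();
    var total: usize = 0;

    // GeneralPurposeAllocator: allocate and free each block individually.
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    {
        const allocator = gpa.allocator();
        var i: usize = 0;
        while (i < 10000) : (i += 1) {
            const block = try allocator.alloc(u8, 64);
            total += block.len;
            allocator.free(block);
        }
    }
    _ = gpa.deinit();
    const gpa_ns = timer.lap();

    // ArenaAllocator: allocate block by block, free everything in one deinit.
    var arena = std.heap.ArenaAllocator.init(std.heap.page_allocator);
    {
        const allocator = arena.allocator();
        var i: usize = 0;
        while (i < 10000) : (i += 1) {
            const block = try allocator.alloc(u8, 64);
            total += block.len;
        }
    }
    arena.deinit();
    const arena_ns = timer.lap();

    std.debug.print("total = {} bytes\n", .{total});
    std.debug.print("gpa = {} ns, arena = {} ns\n", .{ gpa_ns, arena_ns });
}
In a Debug build the general-purpose allocator also tracks each allocation for safety checks, which is part of what gets measured; that is another reason to benchmark in release mode.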
Use Tests Together with Benchmarks
Keep correctness checks near performance tests.
Example:
try std.testing.expectEqual(expected, actual);
Then benchmark the correct implementation.
Optimizations sometimes introduce bugs.
Tests protect you while changing performance-sensitive code.
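For example, a minimal sketch of a test kept next to the sum function, run with zig test before any timing:
const std = @import("std");

fn sum(items: []const i32) i64 {
    var total: i64 = 0;
    for (items) |item| {
        total += item;
    }
    return total;
}

test "sum adds all items" {
    const values = [_]i32{ 1, 2, 3, 4 };
    try std.testing.expectEqual(@as(i64, 10), sum(values[0..]));
}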
A Complete Example
Save this as main.zig:
const std = @import("std");

fn sum(items: []const i32) i64 {
    var total: i64 = 0;
    for (items) |item| {
        total += item;
    }
    return total;
}

pub fn main() !void {
    var values: [100000]i32 = undefined;
    for (&values, 0..) |*value, i| {
        value.* = @intCast(i);
    }

    var warmup: usize = 0;
    while (warmup < 10) : (warmup += 1) {
        _ = sum(values[0..]);
    }

    var timer = try std.time.Timer.start();
    var result: i64 = 0;

    var iteration: usize = 0;
    while (iteration < 1000) : (iteration += 1) {
        result += sum(values[0..]);
    }

    const elapsed_ns = timer.read();
    std.debug.print("result = {}\n", .{result});
    std.debug.print("elapsed = {} ns\n", .{elapsed_ns});
}
Build in release mode:
zig build-exe main.zig -O ReleaseFast
Run it:
./main
This benchmark is simple but reasonable:
the work is repeated many times
the result is used
warm-up happens before timing
the build is optimized
the measured section is isolated
The Main Idea
Benchmarking is measurement, not intuition.
Do not assume code is fast because it looks clever. Do not assume code is slow because it looks simple.
Write tests first. Then measure carefully. Then optimize the specific part that actually matters.