# Performance Tradeoffs

A data structure is never just a container.

It is also a set of tradeoffs.

When you choose between `ArrayList`, `HashMap`, `StringHashMap`, linked lists, queues, bit sets, ring buffers, and trees, you are choosing how your program will use memory, CPU time, allocation, cache locality, and complexity.

The right choice depends on what the program does most often.

## Start with the Operations

Before choosing a collection, ask what operations matter.

Does the program mostly append?

Does it search by key?

Does it remove from the middle?

Does it need sorted order?

Does it need stable pointers?

Does it run with a fixed maximum size?

Does it allocate often?

Does it run in a tight loop?

A collection that is good for one operation may be poor for another.

For example, an `ArrayList` is excellent for appending and iteration, but removing from the front can be expensive. A hash map is excellent for key lookup, but it does not preserve order. A ring buffer is excellent for bounded queues, but it cannot grow unless you build resizing into it.
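The front-removal cost is easy to see in code. This is a minimal sketch, assuming a Zig version where `std.ArrayList` carries its allocator (the pre-0.15 managed API, matching the `list.append(value)` style used in this chapter):

```zig
const std = @import("std");

test "removing from the front shifts every remaining item" {
    var list = std.ArrayList(u32).init(std.testing.allocator);
    defer list.deinit();

    try list.appendSlice(&.{ 10, 20, 30, 40 });

    // orderedRemove(0) returns the removed item and shifts the rest
    // left: each call is O(n), not O(1).
    const removed = list.orderedRemove(0);
    try std.testing.expectEqual(@as(u32, 10), removed);
    try std.testing.expectEqualSlices(u32, &.{ 20, 30, 40 }, list.items);

    // swapRemove(0) is O(1), but it does not preserve order:
    // the last item is moved into the vacated slot.
    _ = list.swapRemove(0);
    try std.testing.expectEqualSlices(u32, &.{ 40, 30 }, list.items);
}
```

If order does not matter, `swapRemove` is the cheap escape hatch; if order matters and dequeuing is frequent, a different structure is usually the answer.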

## Big-O Is Only the First Approximation

You will often see data structures described with Big-O notation.

```text
ArrayList append: amortized O(1)
HashMap lookup: average O(1)
Linked list insertion after a known node: O(1)
Binary search tree lookup: O(log n) if balanced
```

This is useful, but incomplete.

Big-O ignores constant factors, memory layout, allocation cost, branch prediction, cache behavior, and implementation details.

For real Zig programs, those details matter.

A linked list may have `O(1)` insertion, but each node may require a heap allocation and a pointer chase. An `ArrayList` may shift items during removal, but it stores memory densely, which is often much faster in practice.

## Cache Locality

Modern CPUs are fast when data is close together.

An `ArrayList` stores items contiguously:

```text
[10][20][30][40][50]
```

When the CPU reads `10`, it often loads nearby values too. Reading `20`, `30`, and `40` can be cheap.

A linked list stores nodes wherever allocation placed them:

```text
[10] -> somewhere else -> somewhere else -> somewhere else
```

Each pointer may jump to a different memory location.

This can be slow even if the algorithm looks good on paper.

For many workloads, a simple dense array beats a clever pointer-heavy structure.

## Allocation Cost

Heap allocation is not free.

This is cheap:

```zig
try list.append(value);
```

most of the time, because `ArrayList` reuses capacity.

This may be expensive:

```zig
const node = try allocator.create(Node);
```

if it happens for every inserted item.

A linked list often performs one allocation per node.

An `ArrayList` performs occasional larger allocations.

A ring buffer with caller-provided storage may perform no allocation after setup.

This matters in parsers, network services, games, embedded systems, and hot loops.
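When the number of items is known ahead of time, you can pay the allocation cost once. A sketch using the managed `std.ArrayList` API:

```zig
const std = @import("std");

test "pre-reserving capacity avoids reallocation in a hot loop" {
    var list = std.ArrayList(u64).init(std.testing.allocator);
    defer list.deinit();

    // One allocation up front...
    try list.ensureTotalCapacity(1000);

    // ...then every append is guaranteed not to allocate,
    // so appendAssumeCapacity cannot fail and needs no `try`.
    var i: u64 = 0;
    while (i < 1000) : (i += 1) {
        list.appendAssumeCapacity(i);
    }
    try std.testing.expectEqual(@as(usize, 1000), list.items.len);
}
```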

## Memory Overhead

Each collection has overhead.

An array stores only the elements:

```text
[value][value][value]
```

A linked list node stores the value plus one or more pointers:

```text
[value][next]
```

A doubly linked list stores even more:

```text
[prev][value][next]
```

A hash map stores keys, values, metadata, empty slots, and extra capacity.

A bit set stores one bit per possible value, which is extremely compact when the domain is small.

Memory overhead matters when there are many items.

## `ArrayList` Tradeoffs

`ArrayList` is often the best default collection.

It is good for:

```text
append-heavy workloads
fast iteration
index access
compact storage
sorting
passing data as slices
```

It is weaker for:

```text
frequent insertion at the front
frequent removal from the middle
stable pointers across growth
fast lookup by key
```

Appending is usually fast because capacity grows ahead of length.

But when the list grows, the internal buffer may move. That means pointers into `list.items` may become invalid after append.

Use `ArrayList` when you want dense, simple, fast storage.
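One way around the invalidation problem is to hold an index instead of a pointer. A sketch, again assuming the managed `std.ArrayList` API:

```zig
const std = @import("std");

test "growth can invalidate pointers, but indexes stay valid" {
    var list = std.ArrayList(u32).init(std.testing.allocator);
    defer list.deinit();

    try list.append(7);
    const index: usize = 0; // hold an index, not a *u32

    // These appends may reallocate the internal buffer, so a saved
    // pointer into list.items could now dangle. The index does not.
    var i: u32 = 0;
    while (i < 100) : (i += 1) try list.append(i);

    try std.testing.expectEqual(@as(u32, 7), list.items[index]);
}
```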

## `HashMap` Tradeoffs

A hash map is good for lookup by key.

It is good for:

```text
id -> object
name -> value
path -> file info
word -> count
token -> token kind
```

It is weaker for:

```text
ordered iteration
small collections
stable iteration order
low memory overhead
range queries
```

Hash maps usually use more memory than arrays. They need empty capacity to stay fast.

They also need hashing and equality checks.

For a tiny collection with 5 or 10 items, a simple `ArrayList` scan may be faster and simpler than a hash map.

Use a hash map when lookup dominates and the collection is large enough to justify it.
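For non-string keys, `std.AutoHashMap` is the usual entry point. A small sketch of the `id -> count` pattern, using `getOrPut` to avoid hashing the key twice on "insert or update":

```zig
const std = @import("std");

test "AutoHashMap: id -> count" {
    var map = std.AutoHashMap(u32, u32).init(std.testing.allocator);
    defer map.deinit();

    try map.put(7, 1);

    // getOrPut finds or inserts in a single hash lookup.
    const gop = try map.getOrPut(7);
    if (gop.found_existing) gop.value_ptr.* += 1;

    try std.testing.expectEqual(@as(u32, 2), map.get(7).?);
}
```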

## `StringHashMap` Tradeoffs

`StringHashMap` is a specialized hash map for string keys.

It compares string bytes, not pointer addresses.

It is good for:

```text
configuration keys
headers
keyword tables
symbol tables
file paths
word counts
```

It is weaker when:

```text
keys are temporary buffers
ownership is unclear
sorted output is required
memory use must be minimal
```

The most important issue is ownership.

If the map stores string literals, cleanup is simple.

If the map stores heap-allocated string copies, you must free those keys yourself.

`deinit()` frees the map table. It does not automatically free every heap allocation referenced by keys or values.
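When keys come from temporary buffers, the usual pattern is to duplicate them into owned memory on insert and free them on cleanup. A sketch, assuming the managed `std.StringHashMap` API:

```zig
const std = @import("std");

test "freeing duplicated keys yourself" {
    const allocator = std.testing.allocator;
    var map = std.StringHashMap(u32).init(allocator);
    defer {
        // deinit() frees only the table, so free each owned key first.
        var it = map.keyIterator();
        while (it.next()) |key| allocator.free(key.*);
        map.deinit();
    }

    // The temporary buffer will not outlive this scope, so the map
    // must own a stable copy of the key bytes.
    var buf: [8]u8 = undefined;
    const temp = try std.fmt.bufPrint(&buf, "key-{d}", .{1});
    const owned = try allocator.dupe(u8, temp);
    try map.put(owned, 42);

    try std.testing.expectEqual(@as(u32, 42), map.get("key-1").?);
}
```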

## Linked List Tradeoffs

Linked lists are useful, but they are often overused.

They are good for:

```text
stable node addresses
splicing nodes
intrusive structures
allocator free lists
scheduler queues
cases where you already have node pointers
```

They are weak for:

```text
fast iteration
index access
compact memory
simple ownership
CPU cache behavior
```

A linked list can remove a known node cheaply, but finding that node may require walking the list.

For beginner programs, use `ArrayList` first unless you have a specific reason to use a linked list.
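The "cheap removal of a known node" point can be sketched with the standard library list. This assumes the pre-0.15 generic `std.SinglyLinkedList(T)`; newer Zig versions replaced it with an intrusive design, so the details vary by version:

```zig
const std = @import("std");

test "O(1) removal when you already hold the node" {
    const L = std.SinglyLinkedList(u32);
    var list = L{};

    var a = L.Node{ .data = 1 };
    var b = L.Node{ .data = 2 };

    list.prepend(&a); // list: 1
    a.insertAfter(&b); // list: 1 -> 2

    // Removing after a node you already hold needs no traversal.
    _ = a.removeNext();
    try std.testing.expectEqual(@as(usize, 1), list.len());
}
```

Note that the nodes here live on the stack; in a real program you would decide who allocates and frees each node, which is exactly the ownership cost the list above warns about.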

## Queue Tradeoffs

A queue can be implemented several ways.

An `ArrayList` with `orderedRemove(0)` is simple, but each removal shifts items.

An `ArrayList` with a head index avoids shifting, but old slots remain until compaction.

A ring buffer reuses fixed storage, but has fixed capacity.

A linked list queue can grow, but usually allocates per node.

| Queue implementation | Strength | Weakness |
|---|---|---|
| `ArrayList` + `orderedRemove(0)` | Very simple | Shifts on every dequeue |
| `ArrayList` + head index | No shifting | Needs cleanup or compaction |
| Ring buffer | Fast and bounded | Fixed capacity |
| Linked list | Stable nodes | Allocation and poor locality |

For most bounded queues, a ring buffer is the cleanest choice.

For unknown growth, an `ArrayList` with a head index is often enough.
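The head-index variant is small enough to sketch inline, assuming the managed `std.ArrayList` API. Dequeue advances the head instead of shifting, and an occasional compaction reclaims the dead front slots:

```zig
const std = @import("std");

test "ArrayList queue with a head index" {
    var items = std.ArrayList(u32).init(std.testing.allocator);
    defer items.deinit();
    var head: usize = 0;

    try items.appendSlice(&.{ 1, 2, 3 });

    // Dequeue is O(1): read the head slot, then advance head.
    const first = items.items[head];
    head += 1;
    const second = items.items[head];
    head += 1;
    try std.testing.expectEqual(@as(u32, 1), first);
    try std.testing.expectEqual(@as(u32, 2), second);

    // Compact once the dead prefix dominates, so memory is reclaimed.
    if (head > items.items.len / 2) {
        std.mem.copyForwards(u32, items.items, items.items[head..]);
        items.shrinkRetainingCapacity(items.items.len - head);
        head = 0;
    }
    try std.testing.expectEqualSlices(u32, &.{3}, items.items);
}
```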

## Bit Set Tradeoffs

A bit set is excellent when values are small integer indexes.

It is good for:

```text
visited flags
permissions
feature flags
sets of enum values
graph algorithms
compiler analysis
```

It is weak for:

```text
string keys
large sparse numeric domains
ordered insertion history
associated values
```

A bit set tells you whether an index is present.

It does not store a value for that index.

If you need:

```text
key -> value
```

use a map.

If you only need:

```text
is key present?
```

and the key is a small integer, a bit set can be much more compact.
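The compactness claim is concrete: `std.StaticBitSet(256)` tracks 256 possible values in 32 bytes of flags, where a hash set would spend far more per entry. A minimal sketch:

```zig
const std = @import("std");

test "StaticBitSet: compact membership for small integer keys" {
    // 256 possible indexes, stored as 256 bits (32 bytes).
    var seen = std.StaticBitSet(256).initEmpty();

    seen.set(3);
    seen.set(200);

    try std.testing.expect(seen.isSet(3));
    try std.testing.expect(!seen.isSet(4));
    try std.testing.expectEqual(@as(usize, 2), seen.count());
}
```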

## Ring Buffer Tradeoffs

A ring buffer is a queue with fixed circular storage.

It is good for:

```text
bounded queues
audio streams
logging buffers
network buffers
embedded systems
producer-consumer handoff
```

It is weak when:

```text
capacity is unknown
old items must never be dropped
random insertion is needed
sorted order is needed
```

Ring buffers are predictable. That is their main strength.

No surprise growth. No per-item allocation. No shifting.

The cost is that you must decide what happens when the buffer is full.

Reject new items, block, grow, or overwrite old items. The right answer depends on the program.
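A minimal hand-rolled sketch, choosing the "reject when full" policy; the other policies (overwrite, block, grow) change only the full branch of `push`. The `Ring` type and its methods are illustrative, not a standard library API:

```zig
const std = @import("std");

// A fixed-capacity ring buffer over caller-visible storage.
fn Ring(comptime T: type, comptime cap: usize) type {
    return struct {
        buf: [cap]T = undefined,
        head: usize = 0, // next slot to read
        len: usize = 0,

        const Self = @This();

        fn push(self: *Self, value: T) error{Full}!void {
            if (self.len == cap) return error.Full; // the policy decision
            self.buf[(self.head + self.len) % cap] = value;
            self.len += 1;
        }

        fn pop(self: *Self) ?T {
            if (self.len == 0) return null;
            const value = self.buf[self.head];
            self.head = (self.head + 1) % cap;
            self.len -= 1;
            return value;
        }
    };
}

test "ring buffer rejects when full" {
    var r = Ring(u8, 2){};
    try r.push(1);
    try r.push(2);
    try std.testing.expectError(error.Full, r.push(3));
    try std.testing.expectEqual(@as(?u8, 1), r.pop());
}
```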

## Tree Tradeoffs

Trees are useful when order and hierarchy matter.

They are good for:

```text
sorted data
range queries
hierarchical data
syntax trees
file trees
indexes
```

They are weak when:

```text
only direct lookup matters
tree balance is not maintained
ownership is complicated
iteration order does not matter
```

A plain binary search tree can become unbalanced.

If values are inserted in sorted order:

```text
1, 2, 3, 4, 5
```

the tree may become a chain:

```text
1
 \
  2
   \
    3
     \
      4
       \
        5
```

Then lookup becomes slow.

Production ordered maps usually use balanced trees or other index structures.

## Small Data Often Wants Simple Code

For small collections, simple code often wins.

Example:

```zig
for (items) |item| {
    if (item.id == target_id) return item;
}
```

This linear scan may be perfectly fine for 10 or 20 items.

A hash map may add complexity, allocation, and ownership issues without improving real performance.

Do not choose a complex structure just because it has better theoretical lookup.

Measure when it matters.

## Stable Pointers

Some collections move their elements.

`ArrayList` may move items when it grows.

Hash maps may move entries when they resize.

Ring buffers may overwrite slots.

Linked lists and heap-allocated trees usually keep node addresses stable until the node is destroyed.

This matters when other parts of your program store pointers to items.

If you need stable addresses, consider:

```text
heap-allocated nodes
object pools
indices instead of pointers
handles with generation counters
```

In Zig, invalid pointers are a serious bug. Design around pointer lifetime explicitly.

## Ownership and Cleanup Cost

Performance is not only speed.

A collection with unclear ownership costs you debugging time.

Ask:

```text
Who owns the elements?
Who frees the keys?
Who frees the values?
Can insertion fail halfway?
Can replacement leak memory?
Can removal leave dangling pointers?
```

Sometimes a slightly slower collection with clear ownership is better than a faster one with fragile lifetime rules.

Zig rewards explicit ownership.

## A Practical Decision Guide

Use `ArrayList` when you need a growable list, fast iteration, index access, or a slice.

Use `HashMap` when you need fast lookup by non-string keys.

Use `StringHashMap` when you need fast lookup by string keys.

Use a bit set when you need compact membership for small integer indexes.

Use a ring buffer when you need a bounded queue.

Use a linked list when you need stable nodes or intrusive structures.

Use a tree when you need hierarchy, sorted order, or range queries.

## Common Beginner Mistakes

The first mistake is choosing by name instead of workload.

The second mistake is using a hash map for tiny data where a slice scan would be simpler.

The third mistake is using a linked list for general storage, then getting poor iteration performance.

The fourth mistake is forgetting that fast lookup can cost memory.

The fifth mistake is ignoring allocation behavior.

The sixth mistake is keeping pointers into containers that may resize.

## A Useful Mental Model

A collection has a shape.

An `ArrayList` is a row.

```text
[ ][ ][ ][ ]
```

A hash map is a lookup table.

```text
key -> value
```

A linked list is a chain.

```text
[ ] -> [ ] -> [ ]
```

A ring buffer is a circle.

```text
0 -> 1 -> 2 -> 3 -> 0
```

A tree is a hierarchy.

```text
      root
     /    \
  child  child
```

The shape determines the cost.

## Summary

Performance tradeoffs are about matching the data structure to the workload.

There is no best collection for every case.

`ArrayList` is the best default for many beginner programs because it is simple, dense, and fast to iterate.

Hash maps are good for lookup.

Bit sets are good for compact membership.

Ring buffers are good for bounded queues.

Linked lists are useful when stable nodes and splicing matter.

Trees are useful for hierarchy and ordered data.

Choose the simplest structure that matches the operations your program actually performs.

