# 5.20 Testing Hash Logic

# 5.20 Testing Hash Logic

## Problem

You need confidence that a hash-based structure is correct, stable, and performs well under realistic inputs.

Hash table bugs are often subtle. A map may pass small examples but fail after resizing. A set may work until keys collide. A deterministic hash may appear stable until another language, platform, or encoding rule is introduced.

## Solution

Test both logical behavior and structural stress cases.

Basic behavior:

```text id="sa43kl"
insert key
lookup key
update key
delete key
lookup deleted key
```

Collision behavior:

```text id="dqn47x"
insert many keys with same bucket
lookup every inserted key
delete some keys
lookup remaining keys
```

Resize behavior:

```text id="zzplte"
insert enough keys to trigger growth
lookup every key after resizing
delete enough keys to trigger cleanup or shrink
lookup every remaining key
```

Equality and hashing contract:

```text id="qr8vq4"
if equal(a, b):
    assert hash(a) == hash(b)
```

## Discussion

Testing hash logic requires more than checking ordinary lookup examples. The implementation has several independent responsibilities: it must compute indexes consistently, preserve entries during resizing, handle collisions, compare keys correctly, and maintain size metadata.

A good test suite separates these concerns. Basic operation tests verify the public interface. Collision tests force the implementation through paths that random inputs may rarely exercise. Resize tests check that rehashing preserves all entries. Contract tests check that equality and hashing agree.

Randomized testing is useful, but it should be paired with adversarial cases. Random keys usually distribute well, so they can hide collision bugs. Constructed keys that deliberately share a hash value expose bucket scanning, probing, deletion, and tombstone behavior.

Differential testing is also effective. Compare the custom table against a trusted reference implementation.

```text id="yk2u82"
for operation in random_operations:
    apply operation to custom_map
    apply operation to reference_map
    assert observable_state(custom_map) == observable_state(reference_map)
```

The observable state includes lookup results, size, and iteration output when iteration order is specified.

## Correctness

Testing should target the main invariants.

Placement invariant:

```text id="k8jx1f"
each key is stored where lookup will search
```

Uniqueness invariant:

```text id="1hm8ad"
no two live entries have equal keys
```

Size invariant:

```text id="rr9rnc"
size equals the number of live entries
```

Probe invariant for open addressing:

```text id="6lcown"
lookup continues through occupied slots and tombstones
```

Resize invariant:

```text id="lf3zsg"
after rehashing, every live entry is reachable
```

Each test should be designed to break one of these invariants if the implementation is wrong.

## Complexity Testing

Performance tests should measure behavior under different load factors and collision patterns.

Useful scenarios:

```text id="ej1xjr"
- random keys
- sequential integer keys
- similar string keys
- long string keys
- composite keys
- deliberate collisions
- insert-heavy workload
- lookup-heavy workload
- delete-heavy workload
```

Measure at least:

```text id="mb6w6n"
- average lookup time
- tail lookup time
- resize cost
- memory usage
- collision count or probe length
```

Average time can hide severe outliers. Probe length distribution is often more informative than total runtime.

## Example

A collision test can use a custom key type whose hash is constant.

```text id="6ow5ec"
BadHashKey:
    value

hash(key):
    return 0

equal(a, b):
    return a.value == b.value
```

Test:

```text id="br5tau"
map = empty map

for i from 0 to 999:
    put(map, BadHashKey(i), i)

for i from 0 to 999:
    assert get(map, BadHashKey(i)) == i

for i from 0 to 499:
    remove(map, BadHashKey(i))

for i from 0 to 499:
    assert get(map, BadHashKey(i)) is not_found

for i from 500 to 999:
    assert get(map, BadHashKey(i)) == i
```

This test forces all entries into the same collision region. It is slow by design, but it reveals correctness errors in collision handling.

## Deterministic Hash Tests

For persistent hashes, add golden tests.

```text id="27v4kc"
assert hash("abc") == 0x...
assert hash(("user", 42)) == 0x...
assert hash(empty_bytes) == 0x...
```

Golden tests catch accidental changes to encoding, byte order, seeds, or algorithm version.

For cross-language systems, generate the same test vectors in every implementation.

## Common Mistakes

Testing only random keys.

Testing insertion without deletion.

Testing deletion without reinsertion.

Ignoring resize boundaries.

Failing to test equal keys with distinct object identities.

Failing to test composite keys.

Using benchmark results without checking correctness first.

Assuming deterministic hashes remain stable without golden tests.

