# 5.5 Rehashing

## Problem

A hash table must occasionally change its capacity or internal layout. When this happens, existing entries cannot remain in their current positions because bucket indexes depend on the table size and probing rules.

You need a method to rebuild the table so that all operations remain correct and performance is restored.

## Solution

Reinsert all existing entries into a new table using the current hash function and the new capacity.

```text id="bq3k7n"
rehash(old_table, new_capacity):
    new_table = empty table with new_capacity

    for each entry in old_table:
        insert(new_table, entry.key, entry.value)

    return new_table
```

Replace the old table with the new one.
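The pseudocode above can be sketched as runnable Python. This is a minimal illustration, assuming a separate-chaining table represented as a list of bucket lists with the index rule `hash(key) % capacity`; it is not a production implementation.

```python
def rehash(old_buckets, new_capacity):
    """Rebuild a separate-chaining table at a new capacity.

    Each bucket is a list of (key, value) pairs; the bucket index
    is hash(key) % capacity. (Illustrative layout, not a library API.)
    """
    new_buckets = [[] for _ in range(new_capacity)]
    for bucket in old_buckets:
        for key, value in bucket:
            # Recompute the index under the NEW capacity.
            new_buckets[hash(key) % new_capacity].append((key, value))
    return new_buckets
```

The caller then discards `old_buckets` and routes all subsequent operations to the returned table.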

## Discussion

Rehashing is the operation that restores the core invariant of a hash table after a structural change. That invariant states that every key must reside in the position determined by the current hash function and current capacity.

There are two common triggers for rehashing. The first is growth, when the load factor exceeds a threshold. The second is cleanup, when deletions have introduced many tombstones in open addressing. In both cases, rehashing rebuilds the table so that entries are redistributed according to the current configuration.

Rehashing is distinct from resizing even though they often occur together. Resizing changes the capacity. Rehashing is the act of recomputing positions. If the capacity changes, rehashing is mandatory. If the capacity stays the same but the table contains many tombstones or has been subject to pathological clustering, rehashing can still be used to restore performance.

In separate chaining, rehashing redistributes entries across buckets. Long chains may become shorter if the new capacity reduces collisions. In open addressing, rehashing removes tombstones and rebuilds probe sequences from scratch. This often leads to a significant improvement in lookup performance.
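For the open-addressing case, the tombstone-dropping rebuild might look like the following sketch, assuming a linear-probing table whose slots hold `None` (empty), a `TOMBSTONE` sentinel (deleted), or a key-value pair; the sentinel and layout are illustrative, not a particular library's API.

```python
TOMBSTONE = object()  # sentinel marking a deleted slot (illustrative)

def rehash_open_addressing(old_slots, new_capacity):
    """Rebuild a linear-probing table, dropping tombstones.

    Slots hold None (empty), TOMBSTONE (deleted), or a (key, value) pair.
    """
    new_slots = [None] * new_capacity
    for slot in old_slots:
        if slot is None or slot is TOMBSTONE:
            continue  # only live entries are reinserted
        key, value = slot
        i = hash(key) % new_capacity
        while new_slots[i] is not None:  # fresh table: no tombstones to skip
            i = (i + 1) % new_capacity
        new_slots[i] = (key, value)
    return new_slots
```

Because the new table starts empty, the insertion loop never has to distinguish tombstones from empty slots, which is exactly why probe sequences come out compacted.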

## Correctness

Correctness depends on recomputing the bucket index for every key using the new table parameters.

Let `k` be a key. In the old table, it was placed according to:

```text id="c93kq2"
hash(k) mod old_capacity
```

In the new table, lookup will compute:

```text id="w3h8y1"
hash(k) mod new_capacity
```

If `k` remains at its old position, lookup may inspect a different bucket or probe sequence and fail to find it. Therefore, every key must be reinserted so that its new position matches the new indexing rule.
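A quick check makes this concrete. Under a capacity change from `4` to `8`, many keys map to a different bucket index, so an entry left in place would be missed by lookup. The helper below is purely illustrative; it uses CPython's built-in `hash`, under which small integers hash to themselves.

```python
def moved(key, old_capacity, new_capacity):
    """True when rehashing assigns this key a different bucket index,
    i.e. a lookup under the new capacity would miss a stale entry."""
    return hash(key) % old_capacity != hash(key) % new_capacity
```

Note that some keys coincidentally keep the same index (for example, `hash(1) % 4 == hash(1) % 8`), but correctness cannot rely on that coincidence: every key must be reinserted.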

For open addressing, reinsertion also reconstructs probe chains. Since insertion follows the probing rule, the resulting layout is consistent with lookup.

## Complexity

Rehashing takes `O(n)` time for `n` entries because each entry is processed once.

With geometric growth, rehashing occurs infrequently. Over a sequence of insertions, the total cost of all rehash operations is `O(n)`, so the amortized cost per insertion remains constant.
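The amortized claim can be checked by counting moves. The sketch below simulates a table that doubles its capacity whenever a hypothetical load-factor threshold (assumed here to be `0.75`) would be exceeded, and tallies how many entries all rehashes move in total.

```python
def total_rehash_moves(n, initial_capacity=8, max_load=0.75):
    """Count entries moved by all rehashes while inserting n keys,
    doubling capacity whenever the load factor would exceed max_load."""
    capacity, size, moves = initial_capacity, 0, 0
    for _ in range(n):
        if (size + 1) / capacity > max_load:
            moves += size      # every live entry is reinserted once
            capacity *= 2
        size += 1
    return moves
```

Because each rehash handles roughly twice as many entries as the previous one, the move counts form a geometric series bounded by a constant multiple of `n`, which is where the constant amortized cost per insertion comes from.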

For cleanup rehashing in open addressing, the cost depends on how often it is triggered. If triggered only when tombstones exceed a fraction of capacity, the amortized overhead remains small.

## Implementation Notes

Always allocate a new table for rehashing. Do not attempt to rearrange entries in place. In-place rehashing is complex and error-prone because insertion and lookup rules interact with partially moved data.

Preserve the hash function and equality relation during rehashing. Changing either without rebuilding the table invalidates existing entries.

For open addressing, skip tombstones during rehashing. Only live entries are reinserted. This compacts the table and removes gaps in probe sequences.

When using randomized or seeded hash functions, ensure that the same seed is used consistently during rehashing unless a full rebuild with a new seed is intended.
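One way to sketch seed consistency is to make the seed an explicit parameter of both hashing and rehashing, as below. The `seeded_hash` helper is an illustration built on Python's `hashlib.blake2b` keyed mode, not a standard hash-table API.

```python
import hashlib

def seeded_hash(key, seed):
    """Deterministic seeded hash for illustration: mixes the seed into
    a keyed BLAKE2b digest of the key's string representation."""
    h = hashlib.blake2b(str(key).encode(), key=seed, digest_size=8)
    return int.from_bytes(h.digest(), "big")

def rehash_seeded(entries, new_capacity, seed):
    """Reinsert entries under the given seed. Subsequent lookups must
    use the same seed, or rebuild fully with a new one."""
    buckets = [[] for _ in range(new_capacity)]
    for key, value in entries:
        buckets[seeded_hash(key, seed) % new_capacity].append((key, value))
    return buckets
```

Passing the seed explicitly makes the failure mode visible: looking up with a different seed computes a different bucket index, which is exactly the "invalidated entries" problem described above.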

## Example

Suppose a table with capacity `4` contains the following entries under separate chaining.

```text id="1h6n7k"
bucket 0: (k1, v1)
bucket 1: (k2, v2), (k3, v3)
bucket 2: empty
bucket 3: (k4, v4)
```

After resizing to capacity `8`, each key is reinserted:

```text id="4e6q1t"
new_index = hash(k) mod 8
```

The new distribution may look like:

```text id="a0x9rf"
bucket 0: (k1, v1)
bucket 1: (k2, v2)
bucket 3: (k3, v3)
bucket 7: (k4, v4)
```

The chains are shorter and the entries more evenly distributed.

## Common Mistakes

- Copying buckets directly into a larger array without recomputing indexes.
- Attempting partial rehashing that leaves some entries under the old indexing rule.
- Forgetting to remove tombstones during rehashing in open addressing.
- Triggering rehashing too often, which increases overhead.
- Changing the hash function without rebuilding the table.

