5.4 Load Factor and Resizing

Control hash table performance by monitoring the load factor and growing the bucket array before collisions accumulate.

Problem

As a hash table fills, collisions increase and operations slow down. If the table never expands, performance degrades toward linear time. If it expands too eagerly, memory is wasted; if it grows by too little each time, frequent resizing dominates the cost of insertion.

You need a policy that keeps operations fast while controlling memory usage.

Solution

Track the load factor and resize the table when it crosses a threshold.

load_factor = size / capacity

Choose a threshold α_max. When load_factor > α_max, allocate a larger table and reinsert all entries.

resize(table):
    new_capacity = grow(table.capacity)
    new_table = empty table with new_capacity

    for each bucket or slot in table:
        for each entry:
            insert(new_table, entry.key, entry.value)

    replace table with new_table

A common growth rule is doubling:

new_capacity = 2 * old_capacity
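
To make the policy concrete, here is a minimal separate-chaining sketch in Python. The class name, the default capacity, and the fixed threshold of 0.75 are illustrative choices, not part of the recipe.

class HashTable:
    def __init__(self, capacity=8, max_load=0.75):
        self.capacity = capacity
        self.max_load = max_load
        self.size = 0
        self.buckets = [[] for _ in range(capacity)]

    def insert(self, key, value):
        bucket = self.buckets[hash(key) % self.capacity]
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)   # overwrite an existing key
                return
        bucket.append((key, value))
        self.size += 1
        if self.size / self.capacity > self.max_load:
            self._resize(2 * self.capacity)   # doubling growth

    def _resize(self, new_capacity):
        # Every entry must be reinserted: its index depends on capacity.
        new_buckets = [[] for _ in range(new_capacity)]
        for bucket in self.buckets:
            for key, value in bucket:
                new_buckets[hash(key) % new_capacity].append((key, value))
        self.buckets = new_buckets
        self.capacity = new_capacity

    def get(self, key):
        for k, v in self.buckets[hash(key) % self.capacity]:
            if k == key:
                return v
        raise KeyError(key)

Note that _resize rebuilds the bucket array rather than copying it, for the reason developed under Correctness below.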

Discussion

The load factor controls the average number of entries per bucket (in chaining) or the expected probe length (in open addressing). It is the primary knob for balancing time and space.

For separate chaining, performance degrades gradually as the load factor increases. Each bucket becomes longer, but operations still examine only one bucket. Values such as α_max ≈ 1.0 or slightly higher are often acceptable.

For open addressing, performance is more sensitive. As the table fills, probe sequences grow nonlinearly. Values such as α_max ≈ 0.6–0.75 are typical. Beyond this range, clustering becomes severe and lookup cost rises quickly.

Resizing restores the distribution of entries. Because the bucket index depends on capacity, changing the capacity changes where each key belongs. Therefore, resizing must reinsert every entry using the new capacity. Copying memory without rehashing breaks correctness.

Resizing is expensive when it occurs, but it happens infrequently. With geometric growth (for example doubling), the total cost of resizing over many insertions is amortized. Each element is moved a small number of times, so the average cost per insertion remains constant.

Correctness

Correctness relies on maintaining the placement invariant.

Before resizing, each key k is stored in a position derived from:

hash(k) mod old_capacity

After resizing, the table uses:

hash(k) mod new_capacity

If entries are not reinserted according to the new capacity, lookup will compute a different index than the one used during insertion. The table would then search the wrong location and fail to find existing keys.

Reinsertion reestablishes the invariant for the new capacity. Once every entry is placed according to the new rule, lookup and deletion remain correct.
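
A short Python check makes the failure concrete. With CPython's integer hashing, hash(42) is 42, so the indices below are 2 and 10; the key and capacities are arbitrary.

key = 42
old_capacity, new_capacity = 8, 16
print(hash(key) % old_capacity)   # 2: the slot used at insertion time
print(hash(key) % new_capacity)   # 10: the slot a lookup probes after resizing
# Unless the entry was reinserted at index 10, the lookup misses it.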

Complexity

Let n be the number of elements.

Insertion without resizing takes expected constant time under a bounded load factor.

A resize operation takes O(n) time because it processes every entry.

With doubling growth, resizing occurs at sizes:

1, 2, 4, 8, ..., n

An individual element can be moved up to O(log n) times, but the total work across all resizes is the geometric sum 1 + 2 + 4 + ... + n < 2n, which is O(n). Spread over the n insertions that triggered it, the amortized cost per insertion remains O(1).
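
The geometric sum can be checked directly. The simulation below uses a full-table trigger instead of a 0.75 threshold to keep the arithmetic clean; the variable names are illustrative.

moves = 0
capacity, size = 1, 0
for _ in range(1_000_000):
    size += 1
    if size > capacity:      # table is over capacity: double it
        moves += size - 1    # every existing entry is reinserted
        capacity *= 2
print(moves / size)          # roughly 1.05: about one move per insertion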

For open addressing, expected lookup cost depends strongly on the load factor. As α approaches 1, expected probe length increases rapidly. Keeping α below a fixed threshold ensures expected constant time.
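
To quantify "rapidly": under the classical uniform-hashing analysis of linear probing (due to Knuth), the expected number of probes in an unsuccessful search is approximately:

(1/2) * (1 + 1 / (1 - α)^2)

At α = 0.5 this is about 2.5 probes; at α = 0.9 it is about 50.5. The sharp growth near α = 1 is why open addressing needs a conservative threshold.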

Implementation Notes

Choose a threshold appropriate to the collision strategy. Chaining tolerates higher thresholds than open addressing.

Use capacities that interact well with the hash function. Powers of two are common, but require good bit mixing. Prime capacities are sometimes used to reduce patterns in weaker hash functions.
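
The reason powers of two demand good mixing is that the index is computed with a bit mask, which discards all but the low bits of the hash. A sketch, where the xor-shift step stands in for whatever finalizer the hash function actually provides:

def index_for(h, capacity):
    # Assumes capacity is a power of two and h is a non-negative integer.
    # h & (capacity - 1) equals h % capacity but reads only the low bits,
    # so fold the high bits in first.
    h ^= h >> 33
    return h & (capacity - 1)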

When shrinking is required, apply a lower threshold. For example, shrink when load_factor < α_min. Avoid frequent oscillation by separating grow and shrink thresholds.

During resizing, allocate the new table first, then reinsert entries. Do not modify the old table in place during reinsertion.

For open addressing, resizing also removes tombstones. This restores short probe sequences.
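
A minimal linear-probing resize, assuming slots hold (key, value) tuples, None marks an empty slot, and a shared TOMBSTONE sentinel marks a deletion. Only live entries are carried over, so the tombstones vanish:

TOMBSTONE = object()

def resize_open(slots, new_capacity):
    new_slots = [None] * new_capacity
    for entry in slots:
        if entry is None or entry is TOMBSTONE:
            continue                        # drop empties and tombstones
        key, value = entry
        i = hash(key) % new_capacity
        while new_slots[i] is not None:     # linear probing for a free slot
            i = (i + 1) % new_capacity
        new_slots[i] = (key, value)
    return new_slots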

Example

Suppose a table starts with capacity 4 and threshold α_max = 0.75.

Insert keys:

size = 1, capacity = 4, load_factor = 0.25
size = 2, capacity = 4, load_factor = 0.50
size = 3, capacity = 4, load_factor = 0.75

Inserting one more key gives:

size = 4, capacity = 4, load_factor = 1.00

This exceeds the threshold. The table resizes to capacity 8, and all entries are reinserted using the new capacity.

After resizing:

size = 4, capacity = 8, load_factor = 0.50

Operations return to expected constant time.
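
Replayed against the HashTable sketch from the Solution section, the same walkthrough looks like this (the keys are arbitrary):

t = HashTable(capacity=4, max_load=0.75)
for key in ["a", "b", "c"]:
    t.insert(key, None)
print(t.size, t.capacity)   # 3 4  -> load factor 0.75, at the threshold, no resize
t.insert("d", None)         # load factor reaches 1.00 and exceeds 0.75
print(t.size, t.capacity)   # 4 8  -> capacity doubled, entries rehashed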

Common Mistakes

Not rehashing entries during resizing. This breaks lookup.

Choosing a threshold too high for open addressing. This causes long probe sequences.

Resizing too frequently by using small growth factors. This increases overhead.

Ignoring memory cost when choosing very low load factors.

Failing to clean up tombstones in open addressing. This silently degrades performance over time.