5.18 Attack Resistance

Problem

Hash tables usually provide expected constant time operations. That expectation depends on the assumption that keys are well distributed across buckets. When inputs are controlled by an adversary, this assumption may fail.

You need a hash table strategy that remains safe when attackers can choose keys deliberately.

Solution

Use a keyed hash function or a collision-resistant table strategy for untrusted inputs.

A keyed hash includes a secret random seed chosen by the process.

hash(key, secret_seed) -> integer

The table uses the seeded hash when choosing buckets.

index = hash(key, secret_seed) mod capacity

Since the seed is unknown to the attacker, it becomes difficult to manufacture many keys that collide in the running process.

Discussion

A collision attack targets the performance assumption of a hash table. If an attacker can produce many distinct keys that map to the same bucket or probe cluster, operations degrade from expected O(1) to O(n). In request parsers, JSON decoders, form handlers, routing tables, or public APIs, this can become a denial-of-service vector.

The simplest defense is randomized hashing. The hash function uses a per-process or per-table secret seed. The same key may hash differently across runs. This prevents attackers from preparing a universal collision set that works everywhere.

Another defense is structural hardening. Some hash table implementations convert long collision chains into balanced trees. This preserves better worst case behavior even under heavy collisions. The trade off is extra complexity and higher constant factors.

Cryptographic hashes are sometimes used for attack resistance, but they are often too expensive for ordinary in-memory tables. A keyed non-cryptographic hash is usually the practical choice for hash table indexing. Cryptographic hashes are appropriate when the hash itself is exposed as a security boundary or persistent identity.

Correctness

Attack resistance does not change the logical hash table contract.

Equal keys must still produce equal hashes under the same seed.

if key1 == key2:
    hash(key1, seed) == hash(key2, seed)

Different process runs may use different seeds, so the same key may produce a different hash across runs. This is acceptable for in-memory tables. It is not acceptable for persistent indexes that require stable hash values.

The table remains correct because insertion, lookup, and deletion all use the same seed during the lifetime of the table.

Complexity

With ordinary random-looking inputs, operations remain expected O(1).

With adversarial inputs and an unseeded weak hash, worst case behavior may become O(n) per operation.

With keyed hashing, the adversary cannot easily predict collisions, so expected constant time is restored under the assumption that the seed remains secret and the hash function mixes well.

With tree-backed collision buckets, worst case lookup within a bucket can be bounded by O(log n) instead of O(n).

Example

Suppose a web service parses request parameters into a map.

params = empty map

for (key, value) in request.parameters:
    put(params, key, value)

If the hash function is predictable, an attacker may send many different parameter names that collide into the same bucket. Each insertion scans a growing bucket, producing quadratic behavior across the whole request.

A keyed hash changes the bucket assignment for each process.

seed = random_secret()
hash("attacker-key", seed)

Without knowing the seed, the attacker cannot reliably predict which keys collide.

Implementation Notes

Use the language runtime’s hardened hash table for public-facing data when available.

Do not expose the hash seed.

Do not use process-randomized hashes for persistent files, database keys, or distributed partitioning unless all participants share the same seed deliberately.

Limit the number of keys accepted from untrusted input. Hash hardening helps, but resource limits are still necessary.

Monitor collision behavior or unusually long buckets in systems that handle hostile traffic.

Common Mistakes

Using a fast but predictable hash for public-facing maps.

Assuming average-case complexity protects against chosen inputs.

Persisting randomized hash values and expecting them to remain stable.

Changing the hash seed during table lifetime without rebuilding the table.

Treating cryptographic hashing and hash table hashing as the same problem.