Skip to content

5.10 Composite Keys

Hash and compare multi-field keys correctly by combining all fields that participate in equality into the hash function.

Problem

You need to use multiple fields together as a single key in a hash table.

Examples include (user_id, timestamp), (x, y) coordinates, (source, destination), or (name, version). Treating each field separately leads to incorrect grouping or lookup. Treating them together requires a consistent definition of equality and hashing.

Solution

Define a composite key type with equality and hashing over all relevant fields.

Key:
    a
    b

Equality compares all fields:

equal(k1, k2):
    return k1.a == k2.a and k1.b == k2.b

Hashing combines the hashes of all fields:

hash(k):
    h1 = hash(k.a)
    h2 = hash(k.b)
    return combine(h1, h2)

Use this key type in maps or sets as a single unit.

Discussion

A composite key defines identity over multiple dimensions. The correctness requirement is strict: the fields used in equality must match exactly the fields used in hashing. If equality uses two fields, hashing must also use both. If hashing ignores a field used in equality, collisions increase but correctness is preserved. If hashing includes a field ignored by equality, correctness is broken.

The combination function should be order-sensitive. The pair (a, b) should usually produce a different hash than (b, a). A simple combination is:

combine(h1, h2):
    return h1 * constant + h2

More robust combinations mix bits to reduce patterns when inputs are structured.

Composite keys are preferable to encoding multiple fields into a single string or integer manually. Direct composition keeps types explicit, avoids parsing, and reduces accidental collisions.

Correctness

Let equality be defined as:

k1 == k2  iff  k1.a == k2.a and k1.b == k2.b

To preserve correctness, the hash function must satisfy:

if k1 == k2 then hash(k1) == hash(k2)

If both fields are equal, their individual hashes are equal, so the combined hash is equal.

If equality ignores a field but hashing includes it, two equal keys may produce different hashes. These keys would be placed in different buckets. Lookup would search only one bucket and could miss the stored entry. This violates correctness.

Complexity

Hashing a composite key takes time proportional to the cost of hashing each field plus the cost of combining them. For fixed-size fields such as integers, this is constant time. For variable-size fields such as strings, the cost depends on their length.

Map operations remain expected O(1) assuming a good combined hash and bounded load factor.

Example

Use a map to store values indexed by grid coordinates.

Key:
    x: integer
    y: integer

Insert and lookup:

put(map, (2, 3), "A")
put(map, (2, 4), "B")

get(map, (2, 3)) returns "A"
get(map, (2, 4)) returns "B"

If hashing used only x, both keys would collide. This is correct but inefficient. If hashing used (x, y) but equality used only x, lookup could fail.

Implementation Notes

Prefer language-provided tuple or struct hashing when available. These implementations are usually well-tested and optimized.

Ensure fields are immutable or treated as immutable after insertion. Changing a field changes the key’s hash and bucket position, making the entry difficult or impossible to find.

For nested composite keys, apply the same rule recursively. Each component must satisfy the equality-hash contract.

For large composite keys, consider storing a precomputed hash if the key is used frequently and is immutable.

Common Mistakes

Hashing only part of the key when equality uses all fields.

Hashing extra fields that equality ignores.

Using unordered combination when order matters.

Serializing fields into a string without clear separators, which can cause accidental collisions.

Mutating fields of a key after insertion into a hash table.