# Double Hashing Search

Double hashing reduces clustering by using a second hash function to determine the step size between probes. Each key generates its own probe sequence, which distributes entries more uniformly than linear or quadratic probing.

A search follows the same probe sequence that insertion used, visiting slots in identical order until it finds the key or hits an empty slot.

## Problem

Given a hash table using double hashing and a key $k$, return the associated value if it exists.

## Model

* Table $T$ has size $m$
* Primary hash function $h_1(k)$
* Secondary hash function $h_2(k)$

Probe sequence:

$$
i_t = \big(h_1(k) + t \cdot h_2(k)\big) \bmod m
$$

with $t = 0, 1, 2, \dots$

Constraint:

$$
h_2(k) \neq 0 \quad \text{and} \quad \gcd(h_2(k), m) = 1
$$

so that every probe sequence visits all $m$ slots before repeating. In practice $m$ is usually chosen prime, which makes any nonzero step coprime with the table size.
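The effect of the constraint can be seen directly: when $\gcd(h_2(k), m) = 1$ the sequence visits every slot exactly once in $m$ probes, and when it is violated only a subset of slots is reachable. A minimal sketch (the function name is illustrative):

```python
def probe_sequence(h1, h2, m):
    """Return the first m probe indices i_t = (h1 + t*h2) mod m."""
    return [(h1 + t * h2) % m for t in range(m)]

# gcd(3, 7) = 1: all 7 slots are visited exactly once
print(probe_sequence(3, 3, 7))   # [3, 6, 2, 5, 1, 4, 0]

# gcd(2, 8) = 2: only the even slots are ever reachable
print(probe_sequence(0, 2, 8))   # [0, 2, 4, 6, 0, 2, 4, 6]
```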

## Algorithm

```text id="k3v8qn"
double_hashing_search(T, k):
    i0 = h1(k)
    step = h2(k)
    for t from 0 to m - 1:
        i = (i0 + t * step) mod m
        if T[i] is empty:
            return NOT_FOUND
        if T[i].key == k:
            return T[i].value
    return NOT_FOUND
```

## Example

Let:

* $m = 7$
* $h_1(k) = k \bmod 7$
* $h_2(k) = 1 + (k \bmod 5)$

Insert:

$$
[10, 17, 24]
$$

For $k = 10$:

* $h_1(10) = 3$
* $h_2(10) = 1 + (10 \bmod 5) = 1$

For $k = 17$:

* $h_1(17) = 3$
* $h_2(17) = 1 + (17 \bmod 5) = 3$

Probe sequence for 17:

* index 3 → occupied
* index 6 → placed

Search for $k = 17$:

* check index 3 → no
* check index 6 → match
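The arithmetic above can be checked mechanically; this sketch reuses the example's hash functions (the helper name `probes` is illustrative):

```python
m = 7
h1 = lambda k: k % 7          # primary hash from the example
h2 = lambda k: 1 + (k % 5)    # secondary hash from the example

def probes(k, count):
    """First `count` probe indices for key k."""
    return [(h1(k) + t * h2(k)) % m for t in range(count)]

print(probes(10, 2))  # [3, 4] — 10 lands at index 3 on the first probe
print(probes(17, 2))  # [3, 6] — 17 collides at 3, then lands at 6
```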

## Correctness

Each key's probe sequence is fully determined by $h_1$ and $h_2$, so insertion and lookup traverse the same slots in the same order; any inserted key will be found.

If the search reaches an empty slot, the key was never inserted — provided slots are never naively emptied. Deletions must leave tombstone markers so that probe chains stay intact.
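The early-exit rule is only safe if slots are never naively emptied. A small sketch (hash functions copied from the example above; `get` is a standalone stand-in for the lookup routine) showing how a naive deletion breaks a later lookup:

```python
m = 7
h1 = lambda k: k % 7
h2 = lambda k: 1 + (k % 5)

def get(table, k):
    for t in range(m):
        i = (h1(k) + t * h2(k)) % m
        if table[i] is None:
            return None          # empty slot: assume key was never inserted
        if table[i][0] == k:
            return table[i][1]
    return None

table = [None] * m
table[3] = (10, "a")   # 10: h1 = 3, placed directly
table[6] = (17, "b")   # 17: index 3 occupied, step 3 -> index 6

assert get(table, 17) == "b"
table[3] = None                  # naive delete of 10 empties index 3
assert get(table, 17) is None    # lookup for 17 now stops early, wrongly
```

The standard remedy is a tombstone marker: a deleted slot that lookups skip over but insertions may reuse.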

## Complexity

Let $\alpha = n/m$ be the load factor.

| case    | time             |
| ------- | ---------------- |
| average | $O(1)$ expected  |
| worst   | $O(n)$           |

Double hashing avoids both the primary clustering of linear probing and the secondary clustering of quadratic probing, so performance stays stable at higher load factors than other open addressing methods.
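Under the standard uniform-hashing analysis, an unsuccessful search is expected to touch about $1/(1-\alpha)$ slots. A quick computation of this bound:

```python
# Expected probes for an unsuccessful search under uniform hashing: 1 / (1 - alpha)
for alpha in (0.25, 0.5, 0.75, 0.9):
    print(f"alpha={alpha:.2f}  expected probes ~ {1 / (1 - alpha):.2f}")
```

The bound degrades sharply as $\alpha \to 1$, which is why open-addressed tables are typically resized well before they fill.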

## Space

$$
O(m)
$$

## When to Use

Double hashing is appropriate when:

* clustering must be minimized
* high load factors are expected
* uniform distribution is important

It offers better performance than linear and quadratic probing in most practical scenarios.

## Implementation

```python id="q2m7pl"
class HashTable:
    def __init__(self, m=11):
        # m should be prime: then every step in 1..m-1 is coprime
        # with m, and each probe sequence covers the whole table
        self.m = m
        self.table = [None] * m

    def _h1(self, key):
        return hash(key) % self.m

    def _h2(self, key):
        # step in 1..m-1, never zero
        return 1 + (hash(key) % (self.m - 1))

    def put(self, key, value):
        i0 = self._h1(key)
        step = self._h2(key)
        for t in range(self.m):
            i = (i0 + t * step) % self.m
            if self.table[i] is None or self.table[i][0] == key:
                self.table[i] = (key, value)
                return
        raise RuntimeError("hash table is full")

    def get(self, key):
        i0 = self._h1(key)
        step = self._h2(key)
        for t in range(self.m):
            i = (i0 + t * step) % self.m
            if self.table[i] is None:
                return None  # empty slot: key was never inserted
            k, v = self.table[i]
            if k == key:
                return v
        return None
```

```go id="x7v4pn"
type Entry[K comparable, V any] struct {
	Key   K
	Value V
	Used  bool
}

type HashTable[K comparable, V any] struct {
	Table []Entry[K, V]
	M     int
	Hash  func(K) uint64 // user-supplied hash; Go has no built-in generic hash
}

func (h *HashTable[K, V]) h1(key K) int {
	return int(h.Hash(key) % uint64(h.M))
}

func (h *HashTable[K, V]) h2(key K) int {
	// step in 1..M-1, never zero; M should be prime for full coverage
	return 1 + int(h.Hash(key)%uint64(h.M-1))
}

func (h *HashTable[K, V]) Get(key K) (V, bool) {
	i0 := h.h1(key)
	step := h.h2(key)

	for t := 0; t < h.M; t++ {
		i := (i0 + t*step) % h.M
		if !h.Table[i].Used {
			break // empty slot: key was never inserted
		}
		if h.Table[i].Key == key {
			return h.Table[i].Value, true
		}
	}

	var zero V
	return zero, false
}
```

