# Cuckoo Filter Lookup

# Cuckoo Filter Lookup

A cuckoo filter is a compact approximate membership structure based on cuckoo hashing. Instead of storing full keys, it stores short fingerprints in buckets. Lookup checks a small fixed number of candidate buckets, so queries are fast and predictable.

Like a Bloom filter, a cuckoo filter may return false positives. Unlike a standard Bloom filter, it supports deletion when fingerprints are stored with enough care.

## Problem

Given a cuckoo filter and a query key $k$, determine whether $k$ may belong to the represented set.

The answer is one of:

* definitely not present
* possibly present

## Model

For a key $k$, compute:

$$
f = \operatorname{fingerprint}(k)
$$

The first candidate bucket is:

$$
i_1 = h(k)
$$

The second candidate bucket is computed from the first bucket and the fingerprint:

$$
i_2 = i_1 \oplus h(f)
$$

A key is represented by storing its fingerprint $f$ in either bucket $i_1$ or bucket $i_2$.

## Algorithm

```text id="5w2q8n"
cuckoo_filter_lookup(F, k):
    f = fingerprint(k)
    i1 = h(k)
    i2 = i1 xor h(f)

    if bucket F[i1] contains f:
        return POSSIBLY_PRESENT

    if bucket F[i2] contains f:
        return POSSIBLY_PRESENT

    return DEFINITELY_NOT_PRESENT
```

## Example

Suppose:

* fingerprint of key is $f = 13$
* first bucket is $i_1 = 6$
* hash of fingerprint is $h(f) = 10$

Then:

$$
i_2 = 6 \oplus 10 = 12
$$

Lookup checks only buckets 6 and 12.

| step | bucket | action                    |
| ---- | -----: | ------------------------- |
| 1    |      6 | search for fingerprint 13 |
| 2    |     12 | search for fingerprint 13 |

If either bucket contains fingerprint 13, the answer is possibly present. If neither bucket contains it, the key is definitely absent.

## Correctness

Insertion stores the fingerprint of each inserted key in one of its two candidate buckets. Lookup recomputes exactly those two buckets from the queried key and fingerprint.

If the key was inserted and not deleted, its fingerprint must appear in one of the two candidate buckets. Therefore lookup will return possibly present.

If neither bucket contains the fingerprint, the key could not have been inserted under the cuckoo filter invariant, so it is definitely absent.

False positives occur when a different key stores the same fingerprint in one of the candidate buckets.

## Complexity

Let $b$ be the number of fingerprint slots per bucket.

| operation | time   |
| --------- | ------ |
| lookup    | $O(b)$ |

Since $b$ is a small fixed constant, lookup runs in constant time.

## Space

$$
O(nf)
$$

where $f$ is the number of bits per fingerprint. Practical implementations also store bucket layout metadata and usually keep the load factor below a safe threshold.

## When to Use

Cuckoo filters are appropriate when:

* approximate membership is acceptable
* deletion support is needed
* query latency should be predictable
* memory efficiency matters
* a small false positive rate is acceptable

They are common in caches, storage systems, databases, network systems, and deduplication pipelines.

## Implementation

```python id="gp9k6a"
class CuckooFilter:
    def __init__(self, m=16, bucket_size=4, fingerprint_bits=8):
        self.m = m
        self.bucket_size = bucket_size
        self.mask = (1 << fingerprint_bits) - 1
        self.buckets = [[] for _ in range(m)]

    def _fingerprint(self, key):
        f = hash(("fp", key)) & self.mask
        if f == 0:
            f = 1
        return f

    def _index1(self, key):
        return hash(("h1", key)) % self.m

    def _index2(self, i1, f):
        return (i1 ^ (hash(("h2", f)) % self.m)) % self.m

    def contains(self, key):
        f = self._fingerprint(key)
        i1 = self._index1(key)
        i2 = self._index2(i1, f)

        return f in self.buckets[i1] or f in self.buckets[i2]
```

```go id="9wrz4k"
type CuckooFilter struct {
	Buckets [][]uint64
	M       uint64
	Mask    uint64
}

func (f *CuckooFilter) hashString(s string, seed uint64) uint64 {
	var h uint64 = 1469598103934665603 + seed
	for i := 0; i < len(s); i++ {
		h ^= uint64(s[i])
		h *= 1099511628211
	}
	return h
}

func (f *CuckooFilter) fingerprint(key string) uint64 {
	fp := f.hashString(key, 1) & f.Mask
	if fp == 0 {
		fp = 1
	}
	return fp
}

func (f *CuckooFilter) index1(key string) uint64 {
	return f.hashString(key, 2) % f.M
}

func (f *CuckooFilter) index2(i1 uint64, fp uint64) uint64 {
	return (i1 ^ (fp * 0x9e3779b97f4a7c15)) % f.M
}

func (f *CuckooFilter) Contains(key string) bool {
	fp := f.fingerprint(key)
	i1 := f.index1(key)
	i2 := f.index2(i1, fp)

	for _, x := range f.Buckets[i1] {
		if x == fp {
			return true
		}
	}

	for _, x := range f.Buckets[i2] {
		if x == fp {
			return true
		}
	}

	return false
}
```

