Test approximate membership by storing compact fingerprints in cuckoo hash table buckets.
A cuckoo filter is a compact approximate membership structure based on cuckoo hashing. Instead of storing full keys, it stores short fingerprints in buckets. Lookup checks a small fixed number of candidate buckets, so queries are fast and predictable.
Like a Bloom filter, a cuckoo filter may return false positives. Unlike a standard Bloom filter, it supports deletion when fingerprints are stored with enough care.
Problem
Given a cuckoo filter and a query key , determine whether may belong to the represented set.
The answer is one of:
- definitely not present
- possibly present
Model
For a key , compute:
The first candidate bucket is:
The second candidate bucket is computed from the first bucket and the fingerprint:
A key is represented by storing its fingerprint in either bucket or bucket .
Algorithm
cuckoo_filter_lookup(F, k):
f = fingerprint(k)
i1 = h(k)
i2 = i1 xor h(f)
if bucket F[i1] contains f:
return POSSIBLY_PRESENT
if bucket F[i2] contains f:
return POSSIBLY_PRESENT
return DEFINITELY_NOT_PRESENTExample
Suppose:
- fingerprint of key is
- first bucket is
- hash of fingerprint is
Then:
Lookup checks only buckets 6 and 12.
| step | bucket | action |
|---|---|---|
| 1 | 6 | search for fingerprint 13 |
| 2 | 12 | search for fingerprint 13 |
If either bucket contains fingerprint 13, the answer is possibly present. If neither bucket contains it, the key is definitely absent.
Correctness
Insertion stores the fingerprint of each inserted key in one of its two candidate buckets. Lookup recomputes exactly those two buckets from the queried key and fingerprint.
If the key was inserted and not deleted, its fingerprint must appear in one of the two candidate buckets. Therefore lookup will return possibly present.
If neither bucket contains the fingerprint, the key could not have been inserted under the cuckoo filter invariant, so it is definitely absent.
False positives occur when a different key stores the same fingerprint in one of the candidate buckets.
Complexity
Let be the number of fingerprint slots per bucket.
| operation | time |
|---|---|
| lookup |
Since is a small fixed constant, lookup runs in constant time.
Space
where is the number of bits per fingerprint. Practical implementations also store bucket layout metadata and usually keep the load factor below a safe threshold.
When to Use
Cuckoo filters are appropriate when:
- approximate membership is acceptable
- deletion support is needed
- query latency should be predictable
- memory efficiency matters
- a small false positive rate is acceptable
They are common in caches, storage systems, databases, network systems, and deduplication pipelines.
Implementation
class CuckooFilter:
def __init__(self, m=16, bucket_size=4, fingerprint_bits=8):
self.m = m
self.bucket_size = bucket_size
self.mask = (1 << fingerprint_bits) - 1
self.buckets = [[] for _ in range(m)]
def _fingerprint(self, key):
f = hash(("fp", key)) & self.mask
if f == 0:
f = 1
return f
def _index1(self, key):
return hash(("h1", key)) % self.m
def _index2(self, i1, f):
return (i1 ^ (hash(("h2", f)) % self.m)) % self.m
def contains(self, key):
f = self._fingerprint(key)
i1 = self._index1(key)
i2 = self._index2(i1, f)
return f in self.buckets[i1] or f in self.buckets[i2]type CuckooFilter struct {
Buckets [][]uint64
M uint64
Mask uint64
}
func (f *CuckooFilter) hashString(s string, seed uint64) uint64 {
var h uint64 = 1469598103934665603 + seed
for i := 0; i < len(s); i++ {
h ^= uint64(s[i])
h *= 1099511628211
}
return h
}
func (f *CuckooFilter) fingerprint(key string) uint64 {
fp := f.hashString(key, 1) & f.Mask
if fp == 0 {
fp = 1
}
return fp
}
func (f *CuckooFilter) index1(key string) uint64 {
return f.hashString(key, 2) % f.M
}
func (f *CuckooFilter) index2(i1 uint64, fp uint64) uint64 {
return (i1 ^ (fp * 0x9e3779b97f4a7c15)) % f.M
}
func (f *CuckooFilter) Contains(key string) bool {
fp := f.fingerprint(key)
i1 := f.index1(key)
i2 := f.index2(i1, fp)
for _, x := range f.Buckets[i1] {
if x == fp {
return true
}
}
for _, x := range f.Buckets[i2] {
if x == fp {
return true
}
}
return false
}