Skip to content

Cuckoo Filter Lookup

Test approximate membership by storing compact fingerprints in cuckoo hash table buckets.

A cuckoo filter is a compact approximate membership structure based on cuckoo hashing. Instead of storing full keys, it stores short fingerprints in buckets. Lookup checks a small fixed number of candidate buckets, so queries are fast and predictable.

Like a Bloom filter, a cuckoo filter may return false positives. Unlike a standard Bloom filter, it supports deletion when fingerprints are stored with enough care.

Problem

Given a cuckoo filter and a query key kk, determine whether kk may belong to the represented set.

The answer is one of:

  • definitely not present
  • possibly present

Model

For a key kk, compute:

f=fingerprint(k) f = \operatorname{fingerprint}(k)

The first candidate bucket is:

i1=h(k) i_1 = h(k)

The second candidate bucket is computed from the first bucket and the fingerprint:

i2=i1h(f) i_2 = i_1 \oplus h(f)

A key is represented by storing its fingerprint ff in either bucket i1i_1 or bucket i2i_2.

Algorithm

cuckoo_filter_lookup(F, k):
    f = fingerprint(k)
    i1 = h(k)
    i2 = i1 xor h(f)

    if bucket F[i1] contains f:
        return POSSIBLY_PRESENT

    if bucket F[i2] contains f:
        return POSSIBLY_PRESENT

    return DEFINITELY_NOT_PRESENT

Example

Suppose:

  • fingerprint of key is f=13f = 13
  • first bucket is i1=6i_1 = 6
  • hash of fingerprint is h(f)=10h(f) = 10

Then:

i2=610=12 i_2 = 6 \oplus 10 = 12

Lookup checks only buckets 6 and 12.

stepbucketaction
16search for fingerprint 13
212search for fingerprint 13

If either bucket contains fingerprint 13, the answer is possibly present. If neither bucket contains it, the key is definitely absent.

Correctness

Insertion stores the fingerprint of each inserted key in one of its two candidate buckets. Lookup recomputes exactly those two buckets from the queried key and fingerprint.

If the key was inserted and not deleted, its fingerprint must appear in one of the two candidate buckets. Therefore lookup will return possibly present.

If neither bucket contains the fingerprint, the key could not have been inserted under the cuckoo filter invariant, so it is definitely absent.

False positives occur when a different key stores the same fingerprint in one of the candidate buckets.

Complexity

Let bb be the number of fingerprint slots per bucket.

operationtime
lookupO(b)O(b)

Since bb is a small fixed constant, lookup runs in constant time.

Space

O(nf) O(nf)

where ff is the number of bits per fingerprint. Practical implementations also store bucket layout metadata and usually keep the load factor below a safe threshold.

When to Use

Cuckoo filters are appropriate when:

  • approximate membership is acceptable
  • deletion support is needed
  • query latency should be predictable
  • memory efficiency matters
  • a small false positive rate is acceptable

They are common in caches, storage systems, databases, network systems, and deduplication pipelines.

Implementation

class CuckooFilter:
    def __init__(self, m=16, bucket_size=4, fingerprint_bits=8):
        self.m = m
        self.bucket_size = bucket_size
        self.mask = (1 << fingerprint_bits) - 1
        self.buckets = [[] for _ in range(m)]

    def _fingerprint(self, key):
        f = hash(("fp", key)) & self.mask
        if f == 0:
            f = 1
        return f

    def _index1(self, key):
        return hash(("h1", key)) % self.m

    def _index2(self, i1, f):
        return (i1 ^ (hash(("h2", f)) % self.m)) % self.m

    def contains(self, key):
        f = self._fingerprint(key)
        i1 = self._index1(key)
        i2 = self._index2(i1, f)

        return f in self.buckets[i1] or f in self.buckets[i2]
type CuckooFilter struct {
	Buckets [][]uint64
	M       uint64
	Mask    uint64
}

func (f *CuckooFilter) hashString(s string, seed uint64) uint64 {
	var h uint64 = 1469598103934665603 + seed
	for i := 0; i < len(s); i++ {
		h ^= uint64(s[i])
		h *= 1099511628211
	}
	return h
}

func (f *CuckooFilter) fingerprint(key string) uint64 {
	fp := f.hashString(key, 1) & f.Mask
	if fp == 0 {
		fp = 1
	}
	return fp
}

func (f *CuckooFilter) index1(key string) uint64 {
	return f.hashString(key, 2) % f.M
}

func (f *CuckooFilter) index2(i1 uint64, fp uint64) uint64 {
	return (i1 ^ (fp * 0x9e3779b97f4a7c15)) % f.M
}

func (f *CuckooFilter) Contains(key string) bool {
	fp := f.fingerprint(key)
	i1 := f.index1(key)
	i2 := f.index2(i1, fp)

	for _, x := range f.Buckets[i1] {
		if x == fp {
			return true
		}
	}

	for _, x := range f.Buckets[i2] {
		if x == fp {
			return true
		}
	}

	return false
}