Skip to content

XOR Filter Lookup

Test approximate membership with a static fingerprint filter built from three hash locations and XOR constraints.

An XOR filter is a compact static approximate membership structure. It stores short fingerprints in an array. A query key is tested by hashing it to three array positions, XORing the three stored fingerprints, and comparing the result with the key fingerprint.

Like Bloom filters, XOR filters may return false positives. They do not return false negatives after a successful construction. They are read only after construction, so they are best suited to static sets.

Problem

Given a fixed set SS and a query key kk, determine whether kk may belong to SS.

The answer is one of:

  • definitely not present
  • possibly present

Model

For each key kk, compute:

f=fingerprint(k) f = \operatorname{fingerprint}(k)

and three array positions:

h1(k),h2(k),h3(k) h_1(k),\quad h_2(k),\quad h_3(k)

The filter array FF is built so that every stored key satisfies:

F[h1(k)]F[h2(k)]F[h3(k)]=f F[h_1(k)] \oplus F[h_2(k)] \oplus F[h_3(k)] = f

where \oplus denotes bitwise XOR.

Algorithm

xor_filter_lookup(F, k):
    f = fingerprint(k)

    i1 = h1(k)
    i2 = h2(k)
    i3 = h3(k)

    g = F[i1] xor F[i2] xor F[i3]

    if g == f:
        return POSSIBLY_PRESENT

    return DEFINITELY_NOT_PRESENT

Example

Suppose a key has:

f=9 f = 9

and hash positions:

h1(k)=2,h2(k)=5,h3(k)=8 h_1(k)=2,\quad h_2(k)=5,\quad h_3(k)=8

The lookup reads:

positionstored fingerprint
24
511
86

Then:

4116=9 4 \oplus 11 \oplus 6 = 9

Since the result equals the query fingerprint, the answer is possibly present.

If the XOR result differed from 9, the key would be definitely absent.

Correctness

Construction assigns array fingerprints so that every inserted key satisfies the XOR equation for its three hash locations. Therefore any inserted key will reproduce its fingerprint during lookup.

If the computed XOR value differs from the query fingerprint, the key cannot be in the set.

False positives remain possible because a key outside the set may accidentally produce the same fingerprint value.

Complexity

operationtime
lookupO(1)O(1)

Lookup uses three table reads, several hash computations, and one fingerprint comparison.

Space

O(n) O(n)

The constant factor depends mainly on the fingerprint size and the construction slack. XOR filters are often more compact than Bloom filters for similar false positive rates.

When to Use

XOR filters are appropriate when:

  • the key set is static
  • compact approximate membership is needed
  • lookup speed matters
  • deletions are not required
  • construction can be done offline

They are useful in storage engines, static indexes, crawlers, read only dictionaries, and network filters.

Implementation

class XorFilter:
    def __init__(self, table, size, fingerprint_bits=8):
        self.table = table
        self.size = size
        self.mask = (1 << fingerprint_bits) - 1

    def _fingerprint(self, key):
        fp = hash(("fp", key)) & self.mask
        if fp == 0:
            fp = 1
        return fp

    def _h1(self, key):
        return hash(("h1", key)) % self.size

    def _h2(self, key):
        return hash(("h2", key)) % self.size

    def _h3(self, key):
        return hash(("h3", key)) % self.size

    def contains(self, key):
        fp = self._fingerprint(key)
        x = self.table[self._h1(key)]
        x ^= self.table[self._h2(key)]
        x ^= self.table[self._h3(key)]
        return x == fp
type XorFilter struct {
	Table []uint64
	Size  uint64
	Mask  uint64
}

func (f *XorFilter) hash(key string, seed uint64) uint64 {
	var h uint64 = 1469598103934665603 + seed
	for i := 0; i < len(key); i++ {
		h ^= uint64(key[i])
		h *= 1099511628211
	}
	return h
}

func (f *XorFilter) fingerprint(key string) uint64 {
	fp := f.hash(key, 1) & f.Mask
	if fp == 0 {
		fp = 1
	}
	return fp
}

func (f *XorFilter) Contains(key string) bool {
	fp := f.fingerprint(key)

	i1 := f.hash(key, 2) % f.Size
	i2 := f.hash(key, 3) % f.Size
	i3 := f.hash(key, 4) % f.Size

	x := f.Table[i1] ^ f.Table[i2] ^ f.Table[i3]
	return x == fp
}