Test approximate membership with a static fingerprint filter built from three hash locations and XOR constraints.
An XOR filter is a compact static approximate membership structure. It stores short fingerprints in an array. A query key is tested by hashing it to three array positions, XORing the three stored fingerprints, and comparing the result with the key fingerprint.
Like Bloom filters, XOR filters may return false positives. They do not return false negatives after a successful construction. They are read only after construction, so they are best suited to static sets.
Problem
Given a fixed set and a query key , determine whether may belong to .
The answer is one of:
- definitely not present
- possibly present
Model
For each key , compute:
and three array positions:
The filter array is built so that every stored key satisfies:
where denotes bitwise XOR.
Algorithm
xor_filter_lookup(F, k):
f = fingerprint(k)
i1 = h1(k)
i2 = h2(k)
i3 = h3(k)
g = F[i1] xor F[i2] xor F[i3]
if g == f:
return POSSIBLY_PRESENT
return DEFINITELY_NOT_PRESENTExample
Suppose a key has:
and hash positions:
The lookup reads:
| position | stored fingerprint |
|---|---|
| 2 | 4 |
| 5 | 11 |
| 8 | 6 |
Then:
Since the result equals the query fingerprint, the answer is possibly present.
If the XOR result differed from 9, the key would be definitely absent.
Correctness
Construction assigns array fingerprints so that every inserted key satisfies the XOR equation for its three hash locations. Therefore any inserted key will reproduce its fingerprint during lookup.
If the computed XOR value differs from the query fingerprint, the key cannot be in the set.
False positives remain possible because a key outside the set may accidentally produce the same fingerprint value.
Complexity
| operation | time |
|---|---|
| lookup |
Lookup uses three table reads, several hash computations, and one fingerprint comparison.
Space
The constant factor depends mainly on the fingerprint size and the construction slack. XOR filters are often more compact than Bloom filters for similar false positive rates.
When to Use
XOR filters are appropriate when:
- the key set is static
- compact approximate membership is needed
- lookup speed matters
- deletions are not required
- construction can be done offline
They are useful in storage engines, static indexes, crawlers, read only dictionaries, and network filters.
Implementation
class XorFilter:
def __init__(self, table, size, fingerprint_bits=8):
self.table = table
self.size = size
self.mask = (1 << fingerprint_bits) - 1
def _fingerprint(self, key):
fp = hash(("fp", key)) & self.mask
if fp == 0:
fp = 1
return fp
def _h1(self, key):
return hash(("h1", key)) % self.size
def _h2(self, key):
return hash(("h2", key)) % self.size
def _h3(self, key):
return hash(("h3", key)) % self.size
def contains(self, key):
fp = self._fingerprint(key)
x = self.table[self._h1(key)]
x ^= self.table[self._h2(key)]
x ^= self.table[self._h3(key)]
return x == fptype XorFilter struct {
Table []uint64
Size uint64
Mask uint64
}
func (f *XorFilter) hash(key string, seed uint64) uint64 {
var h uint64 = 1469598103934665603 + seed
for i := 0; i < len(key); i++ {
h ^= uint64(key[i])
h *= 1099511628211
}
return h
}
func (f *XorFilter) fingerprint(key string) uint64 {
fp := f.hash(key, 1) & f.Mask
if fp == 0 {
fp = 1
}
return fp
}
func (f *XorFilter) Contains(key string) bool {
fp := f.fingerprint(key)
i1 := f.hash(key, 2) % f.Size
i2 := f.hash(key, 3) % f.Size
i3 := f.hash(key, 4) % f.Size
x := f.Table[i1] ^ f.Table[i2] ^ f.Table[i3]
return x == fp
}