Test approximate membership using a compact XOR-based filter with improved construction and cache locality.
A binary fuse filter is a refined XOR-based approximate membership structure. It improves over standard XOR filters in construction success rate, speed, and memory layout. It remains static after construction and supports fast constant-time lookup with low false positive rates.
You use it when you want a very compact, cache-friendly alternative to Bloom or XOR filters for static datasets.
Problem
Given a fixed set and a query key , determine whether may belong to .
The result is:
- definitely not present
- possibly present
Model
Each key produces:
- a fingerprint
- three positions in the table
The filter array is constructed so that for every key:
The difference from a standard XOR filter lies in how indices are partitioned and constructed, which improves reliability and memory packing.
Algorithm
binary_fuse_lookup(F, k):
f = fingerprint(k)
i1 = h1(k)
i2 = h2(k)
i3 = h3(k)
g = F[i1] xor F[i2] xor F[i3]
if g == f:
return POSSIBLY_PRESENT
return DEFINITELY_NOT_PRESENTExample
Suppose:
and:
Stored values:
| index | value |
|---|---|
| 1 | 2 |
| 4 | 5 |
| 6 | 0 |
Compute:
The result matches the fingerprint, so the key is possibly present.
Correctness
During construction, each key is assigned values in the table so that the XOR of its three positions equals its fingerprint. Lookup recomputes these positions and checks the same relation.
If the key was inserted, the equality holds. If the equality fails, the key cannot be in the set.
False positives occur when a non-member key produces the same fingerprint relation.
Complexity
| operation | time |
|---|---|
| lookup |
Lookup uses a fixed number of hash computations, table accesses, and XOR operations.
Space
Binary fuse filters are highly space efficient. They often use fewer bits per key than Bloom filters for the same false positive rate.
When to Use
Binary fuse filters are appropriate when:
- the dataset is static
- minimal memory usage is required
- fast lookups are critical
- construction can be done offline
- false positives are acceptable
They are widely used in search engines, storage systems, and large scale indexing pipelines.
Implementation
class BinaryFuseFilter:
def __init__(self, table, size, fingerprint_bits=8):
self.table = table
self.size = size
self.mask = (1 << fingerprint_bits) - 1
def _fingerprint(self, key):
fp = hash(("fp", key)) & self.mask
if fp == 0:
fp = 1
return fp
def _h1(self, key):
return hash(("h1", key)) % self.size
def _h2(self, key):
return hash(("h2", key)) % self.size
def _h3(self, key):
return hash(("h3", key)) % self.size
def contains(self, key):
fp = self._fingerprint(key)
x = self.table[self._h1(key)]
x ^= self.table[self._h2(key)]
x ^= self.table[self._h3(key)]
return x == fptype BinaryFuseFilter struct {
Table []uint64
Size uint64
Mask uint64
}
func (f *BinaryFuseFilter) hash(key string, seed uint64) uint64 {
var h uint64 = 1469598103934665603 + seed
for i := 0; i < len(key); i++ {
h ^= uint64(key[i])
h *= 1099511628211
}
return h
}
func (f *BinaryFuseFilter) fingerprint(key string) uint64 {
fp := f.hash(key, 1) & f.Mask
if fp == 0 {
fp = 1
}
return fp
}
func (f *BinaryFuseFilter) Contains(key string) bool {
fp := f.fingerprint(key)
i1 := f.hash(key, 2) % f.Size
i2 := f.hash(key, 3) % f.Size
i3 := f.hash(key, 4) % f.Size
x := f.Table[i1] ^ f.Table[i2] ^ f.Table[i3]
return x == fp
}