Skip to content

Binary Fuse Filter Lookup

Test approximate membership using a compact XOR-based filter with improved construction and cache locality.

A binary fuse filter is a refined XOR-based approximate membership structure. It improves over standard XOR filters in construction success rate, speed, and memory layout. It remains static after construction and supports fast constant-time lookup with low false positive rates.

You use it when you want a very compact, cache-friendly alternative to Bloom or XOR filters for static datasets.

Problem

Given a fixed set SS and a query key kk, determine whether kk may belong to SS.

The result is:

  • definitely not present
  • possibly present

Model

Each key kk produces:

  • a fingerprint
f=fingerprint(k) f = \operatorname{fingerprint}(k)
  • three positions in the table
h1(k),h2(k),h3(k) h_1(k),\quad h_2(k),\quad h_3(k)

The filter array FF is constructed so that for every key:

F[h1(k)]F[h2(k)]F[h3(k)]=f F[h_1(k)] \oplus F[h_2(k)] \oplus F[h_3(k)] = f

The difference from a standard XOR filter lies in how indices are partitioned and constructed, which improves reliability and memory packing.

Algorithm

binary_fuse_lookup(F, k):
    f = fingerprint(k)

    i1 = h1(k)
    i2 = h2(k)
    i3 = h3(k)

    g = F[i1] xor F[i2] xor F[i3]

    if g == f:
        return POSSIBLY_PRESENT

    return DEFINITELY_NOT_PRESENT

Example

Suppose:

f=7 f = 7

and:

h1(k)=1,h2(k)=4,h3(k)=6 h_1(k)=1,\quad h_2(k)=4,\quad h_3(k)=6

Stored values:

indexvalue
12
45
60

Compute:

250=7 2 \oplus 5 \oplus 0 = 7

The result matches the fingerprint, so the key is possibly present.

Correctness

During construction, each key is assigned values in the table so that the XOR of its three positions equals its fingerprint. Lookup recomputes these positions and checks the same relation.

If the key was inserted, the equality holds. If the equality fails, the key cannot be in the set.

False positives occur when a non-member key produces the same fingerprint relation.

Complexity

operationtime
lookupO(1)O(1)

Lookup uses a fixed number of hash computations, table accesses, and XOR operations.

Space

O(n) O(n)

Binary fuse filters are highly space efficient. They often use fewer bits per key than Bloom filters for the same false positive rate.

When to Use

Binary fuse filters are appropriate when:

  • the dataset is static
  • minimal memory usage is required
  • fast lookups are critical
  • construction can be done offline
  • false positives are acceptable

They are widely used in search engines, storage systems, and large scale indexing pipelines.

Implementation

class BinaryFuseFilter:
    def __init__(self, table, size, fingerprint_bits=8):
        self.table = table
        self.size = size
        self.mask = (1 << fingerprint_bits) - 1

    def _fingerprint(self, key):
        fp = hash(("fp", key)) & self.mask
        if fp == 0:
            fp = 1
        return fp

    def _h1(self, key):
        return hash(("h1", key)) % self.size

    def _h2(self, key):
        return hash(("h2", key)) % self.size

    def _h3(self, key):
        return hash(("h3", key)) % self.size

    def contains(self, key):
        fp = self._fingerprint(key)
        x = self.table[self._h1(key)]
        x ^= self.table[self._h2(key)]
        x ^= self.table[self._h3(key)]
        return x == fp
type BinaryFuseFilter struct {
	Table []uint64
	Size  uint64
	Mask  uint64
}

func (f *BinaryFuseFilter) hash(key string, seed uint64) uint64 {
	var h uint64 = 1469598103934665603 + seed
	for i := 0; i < len(key); i++ {
		h ^= uint64(key[i])
		h *= 1099511628211
	}
	return h
}

func (f *BinaryFuseFilter) fingerprint(key string) uint64 {
	fp := f.hash(key, 1) & f.Mask
	if fp == 0 {
		fp = 1
	}
	return fp
}

func (f *BinaryFuseFilter) Contains(key string) bool {
	fp := f.fingerprint(key)

	i1 := f.hash(key, 2) % f.Size
	i2 := f.hash(key, 3) % f.Size
	i3 := f.hash(key, 4) % f.Size

	x := f.Table[i1] ^ f.Table[i2] ^ f.Table[i3]
	return x == fp
}