# Binary Fuse Filter Lookup

# Binary Fuse Filter Lookup

A binary fuse filter is a refined XOR-based approximate membership structure. It improves over standard XOR filters in construction success rate, speed, and memory layout. It remains static after construction and supports fast constant-time lookup with low false positive rates.

You use it when you want a very compact, cache-friendly alternative to Bloom or XOR filters for static datasets.

## Problem

Given a fixed set $S$ and a query key $k$, determine whether $k$ may belong to $S$.

The result is:

* definitely not present
* possibly present

## Model

Each key $k$ produces:

* a fingerprint

$$
f = \operatorname{fingerprint}(k)
$$

* three positions in the table

$$
h_1(k),\quad h_2(k),\quad h_3(k)
$$

The filter array $F$ is constructed so that for every key:

$$
F[h_1(k)] \oplus F[h_2(k)] \oplus F[h_3(k)] = f
$$

The difference from a standard XOR filter lies in how indices are partitioned and constructed, which improves reliability and memory packing.

## Algorithm

```text id="0t3mnl"
binary_fuse_lookup(F, k):
    f = fingerprint(k)

    i1 = h1(k)
    i2 = h2(k)
    i3 = h3(k)

    g = F[i1] xor F[i2] xor F[i3]

    if g == f:
        return POSSIBLY_PRESENT

    return DEFINITELY_NOT_PRESENT
```

## Example

Suppose:

$$
f = 7
$$

and:

$$
h_1(k)=1,\quad h_2(k)=4,\quad h_3(k)=6
$$

Stored values:

| index | value |
| ----: | ----: |
|     1 |     2 |
|     4 |     5 |
|     6 |     0 |

Compute:

$$
2 \oplus 5 \oplus 0 = 7
$$

The result matches the fingerprint, so the key is possibly present.

## Correctness

During construction, each key is assigned values in the table so that the XOR of its three positions equals its fingerprint. Lookup recomputes these positions and checks the same relation.

If the key was inserted, the equality holds. If the equality fails, the key cannot be in the set.

False positives occur when a non-member key produces the same fingerprint relation.

## Complexity

| operation | time   |
| --------- | ------ |
| lookup    | $O(1)$ |

Lookup uses a fixed number of hash computations, table accesses, and XOR operations.

## Space

$$
O(n)
$$

Binary fuse filters are highly space efficient. They often use fewer bits per key than Bloom filters for the same false positive rate.

## When to Use

Binary fuse filters are appropriate when:

* the dataset is static
* minimal memory usage is required
* fast lookups are critical
* construction can be done offline
* false positives are acceptable

They are widely used in search engines, storage systems, and large scale indexing pipelines.

## Implementation

```python id="f9d2ps"
class BinaryFuseFilter:
    def __init__(self, table, size, fingerprint_bits=8):
        self.table = table
        self.size = size
        self.mask = (1 << fingerprint_bits) - 1

    def _fingerprint(self, key):
        fp = hash(("fp", key)) & self.mask
        if fp == 0:
            fp = 1
        return fp

    def _h1(self, key):
        return hash(("h1", key)) % self.size

    def _h2(self, key):
        return hash(("h2", key)) % self.size

    def _h3(self, key):
        return hash(("h3", key)) % self.size

    def contains(self, key):
        fp = self._fingerprint(key)
        x = self.table[self._h1(key)]
        x ^= self.table[self._h2(key)]
        x ^= self.table[self._h3(key)]
        return x == fp
```

```go id="8tv3sl"
type BinaryFuseFilter struct {
	Table []uint64
	Size  uint64
	Mask  uint64
}

func (f *BinaryFuseFilter) hash(key string, seed uint64) uint64 {
	var h uint64 = 1469598103934665603 + seed
	for i := 0; i < len(key); i++ {
		h ^= uint64(key[i])
		h *= 1099511628211
	}
	return h
}

func (f *BinaryFuseFilter) fingerprint(key string) uint64 {
	fp := f.hash(key, 1) & f.Mask
	if fp == 0 {
		fp = 1
	}
	return fp
}

func (f *BinaryFuseFilter) Contains(key string) bool {
	fp := f.fingerprint(key)

	i1 := f.hash(key, 2) % f.Size
	i2 := f.hash(key, 3) % f.Size
	i3 := f.hash(key, 4) % f.Size

	x := f.Table[i1] ^ f.Table[i2] ^ f.Table[i3]
	return x == fp
}
```

