# XOR Filter Lookup

# XOR Filter Lookup

An XOR filter is a compact static approximate membership structure. It stores short fingerprints in an array. A query key is tested by hashing it to three array positions, XORing the three stored fingerprints, and comparing the result with the key fingerprint.

Like Bloom filters, XOR filters may return false positives. They do not return false negatives after a successful construction. They are read only after construction, so they are best suited to static sets.

## Problem

Given a fixed set $S$ and a query key $k$, determine whether $k$ may belong to $S$.

The answer is one of:

* definitely not present
* possibly present

## Model

For each key $k$, compute:

$$
f = \operatorname{fingerprint}(k)
$$

and three array positions:

$$
h_1(k),\quad h_2(k),\quad h_3(k)
$$

The filter array $F$ is built so that every stored key satisfies:

$$
F[h_1(k)] \oplus F[h_2(k)] \oplus F[h_3(k)] = f
$$

where $\oplus$ denotes bitwise XOR.

## Algorithm

```text id="j26afq"
xor_filter_lookup(F, k):
    f = fingerprint(k)

    i1 = h1(k)
    i2 = h2(k)
    i3 = h3(k)

    g = F[i1] xor F[i2] xor F[i3]

    if g == f:
        return POSSIBLY_PRESENT

    return DEFINITELY_NOT_PRESENT
```

## Example

Suppose a key has:

$$
f = 9
$$

and hash positions:

$$
h_1(k)=2,\quad h_2(k)=5,\quad h_3(k)=8
$$

The lookup reads:

| position | stored fingerprint |
| -------: | -----------------: |
|        2 |                  4 |
|        5 |                 11 |
|        8 |                  6 |

Then:

$$
4 \oplus 11 \oplus 6 = 9
$$

Since the result equals the query fingerprint, the answer is possibly present.

If the XOR result differed from 9, the key would be definitely absent.

## Correctness

Construction assigns array fingerprints so that every inserted key satisfies the XOR equation for its three hash locations. Therefore any inserted key will reproduce its fingerprint during lookup.

If the computed XOR value differs from the query fingerprint, the key cannot be in the set.

False positives remain possible because a key outside the set may accidentally produce the same fingerprint value.

## Complexity

| operation | time   |
| --------- | ------ |
| lookup    | $O(1)$ |

Lookup uses three table reads, several hash computations, and one fingerprint comparison.

## Space

$$
O(n)
$$

The constant factor depends mainly on the fingerprint size and the construction slack. XOR filters are often more compact than Bloom filters for similar false positive rates.

## When to Use

XOR filters are appropriate when:

* the key set is static
* compact approximate membership is needed
* lookup speed matters
* deletions are not required
* construction can be done offline

They are useful in storage engines, static indexes, crawlers, read only dictionaries, and network filters.

## Implementation

```python id="sad2ia"
class XorFilter:
    def __init__(self, table, size, fingerprint_bits=8):
        self.table = table
        self.size = size
        self.mask = (1 << fingerprint_bits) - 1

    def _fingerprint(self, key):
        fp = hash(("fp", key)) & self.mask
        if fp == 0:
            fp = 1
        return fp

    def _h1(self, key):
        return hash(("h1", key)) % self.size

    def _h2(self, key):
        return hash(("h2", key)) % self.size

    def _h3(self, key):
        return hash(("h3", key)) % self.size

    def contains(self, key):
        fp = self._fingerprint(key)
        x = self.table[self._h1(key)]
        x ^= self.table[self._h2(key)]
        x ^= self.table[self._h3(key)]
        return x == fp
```

```go id="gepp3x"
type XorFilter struct {
	Table []uint64
	Size  uint64
	Mask  uint64
}

func (f *XorFilter) hash(key string, seed uint64) uint64 {
	var h uint64 = 1469598103934665603 + seed
	for i := 0; i < len(key); i++ {
		h ^= uint64(key[i])
		h *= 1099511628211
	}
	return h
}

func (f *XorFilter) fingerprint(key string) uint64 {
	fp := f.hash(key, 1) & f.Mask
	if fp == 0 {
		fp = 1
	}
	return fp
}

func (f *XorFilter) Contains(key string) bool {
	fp := f.fingerprint(key)

	i1 := f.hash(key, 2) % f.Size
	i2 := f.hash(key, 3) % f.Size
	i3 := f.hash(key, 4) % f.Size

	x := f.Table[i1] ^ f.Table[i2] ^ f.Table[i3]
	return x == fp
}
```

