Test set membership with a compact probabilistic bit array that allows false positives but no false negatives.
A Bloom filter is a compact probabilistic structure for membership testing. It can answer that a key is definitely absent or possibly present. It may report false positives, but it does not report false negatives when implemented correctly.
You use it when memory is limited and exact storage of all keys is too expensive.
Problem
Given a set and a query key , determine whether may belong to .
The result is one of:
- definitely not present
- possibly present
Model
A Bloom filter consists of:
- bit array of length
- hash functions
Each hash function maps a key to a bit position:
To insert a key, set all corresponding bits to 1.
To query a key, check all corresponding bits.
Algorithm
bloom_filter_lookup(B, k):
for each hash function h_j:
i = h_j(k)
if B[i] == 0:
return DEFINITELY_NOT_PRESENT
return POSSIBLY_PRESENTExample
Let the bit array have length , and suppose a key uses three hash positions:
If the bit array contains:
| index | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|---|
| bit | 0 | 1 | 1 | 0 | 0 | 1 | 0 | 1 | 1 | 0 |
then all bits 2, 5, and 8 are set, so the answer is possibly present.
If any one of these bits were 0, the key would be definitely absent.
Correctness
When a key is inserted, every bit selected by its hash functions is set to 1. Therefore, a later lookup for the same key will find all those bits set.
If lookup finds a 0 bit, the key could not have been inserted, because insertion would have set that bit. Therefore the key is definitely absent.
If all bits are 1, the key may have been inserted, but the bits may also have been set by other keys. Therefore the result is only possibly present.
False Positive Rate
For inserted keys, bits, and hash functions, the approximate false positive probability is:
The optimal number of hash functions is approximately:
Complexity
| operation | time |
|---|---|
| lookup |
When is a small fixed constant, lookup is constant time.
Space
A Bloom filter is often much smaller than storing the full key set.
When to Use
Bloom filters are appropriate when:
- memory is limited
- false positives are acceptable
- false negatives are unacceptable
- membership tests happen before expensive lookups
- deletions are not required
They are common in databases, caches, web crawlers, storage engines, and distributed systems.
Implementation
class BloomFilter:
def __init__(self, m=1024, r=3):
self.m = m
self.r = r
self.bits = 0
def _hashes(self, key):
for seed in range(self.r):
yield hash((seed, key)) % self.m
def add(self, key):
for i in self._hashes(key):
self.bits |= 1 << i
def contains(self, key):
for i in self._hashes(key):
if ((self.bits >> i) & 1) == 0:
return False
return Truetype BloomFilter struct {
Bits []uint64
M uint64
R uint64
}
func (b *BloomFilter) hash(key string, seed uint64) uint64 {
var h uint64 = 1469598103934665603 + seed
for i := 0; i < len(key); i++ {
h ^= uint64(key[i])
h *= 1099511628211
}
return h % b.M
}
func (b *BloomFilter) Contains(key string) bool {
for seed := uint64(0); seed < b.R; seed++ {
i := b.hash(key, seed)
word := i / 64
bit := i % 64
if (b.Bits[word]&(uint64(1)<<bit)) == 0 {
return false
}
}
return true
}