Misra Gries is a deterministic streaming algorithm for finding frequent items with bounded memory. It keeps at most counters and returns a small candidate set that contains every item whose frequency is greater than .
The algorithm may return extra candidates. A second pass gives exact frequencies and removes false positives.
Problem
Given a stream
and an integer , find all items whose frequency is greater than
Algorithm
Maintain a map from item to counter.
misra_gries(stream, k):
counters = empty map
for x in stream:
if x in counters:
counters[x] = counters[x] + 1
else if size(counters) < k - 1:
counters[x] = 1
else:
for each item y in counters:
counters[y] = counters[y] - 1
if counters[y] == 0:
remove y from counters
return keys(counters)The returned keys are candidates. To get the exact heavy hitters, scan the stream again and count only those candidates.
Key Idea
When the table is full and a new item is not tracked, the algorithm subtracts one count from every tracked item. This cancels a group of distinct items: the new untracked item and the tracked items.
An item that appears more than times cannot be fully canceled by these groups, so it must remain as a candidate.
Example
Let
and
The algorithm keeps at most two counters. Since appears times and
must be returned as a candidate.
Correctness
Each full-table decrement can remove one occurrence from at most distinct items. Think of this as deleting one balanced group.
Suppose an item appears more than times. Since each deletion group contains at most one copy of , fewer than groups can delete all copies of . Therefore some copy of survives in the counter structure.
Thus every item with frequency greater than appears in the final candidate set.
Complexity
| measure | cost |
|---|---|
| counters | at most |
| memory | |
| update, simple version | worst case |
| total time | simple version |
| verification pass |
The common implementation is simple and effective when is small.
When to Use
Use Misra Gries when:
- the input is a stream
- exact counting for all items is too large
- deterministic guarantees are preferred
- false positives are acceptable before verification
For approximate frequency estimates on arbitrary keys, use Count Min Sketch.
Implementation
def misra_gries(stream, k):
counters = {}
for x in stream:
if x in counters:
counters[x] += 1
elif len(counters) < k - 1:
counters[x] = 1
else:
remove = []
for y in counters:
counters[y] -= 1
if counters[y] == 0:
remove.append(y)
for y in remove:
del counters[y]
return list(counters.keys())def misra_gries_exact(stream, k):
candidates = misra_gries(stream, k)
counts = {x: 0 for x in candidates}
for x in stream:
if x in counts:
counts[x] += 1
n = len(stream)
return [x for x in candidates if counts[x] > n // k]func MisraGries[T comparable](stream []T, k int) []T {
counters := map[T]int{}
for _, x := range stream {
if _, ok := counters[x]; ok {
counters[x]++
continue
}
if len(counters) < k-1 {
counters[x] = 1
continue
}
var remove []T
for y := range counters {
counters[y]--
if counters[y] == 0 {
remove = append(remove, y)
}
}
for _, y := range remove {
delete(counters, y)
}
}
out := make([]T, 0, len(counters))
for x := range counters {
out = append(out, x)
}
return out
}