Skip to content

Misra Gries

Find frequent items in a stream by maintaining a bounded set of counters.

Misra Gries is a deterministic streaming algorithm for finding frequent items with bounded memory. It keeps at most k1k - 1 counters and returns a small candidate set that contains every item whose frequency is greater than n/kn/k.

The algorithm may return extra candidates. A second pass gives exact frequencies and removes false positives.

Problem

Given a stream

x1,x2,,xn x_1, x_2, \dots, x_n

and an integer k2k \ge 2, find all items whose frequency is greater than

n/k n/k

Algorithm

Maintain a map from item to counter.

misra_gries(stream, k):
    counters = empty map

    for x in stream:
        if x in counters:
            counters[x] = counters[x] + 1
        else if size(counters) < k - 1:
            counters[x] = 1
        else:
            for each item y in counters:
                counters[y] = counters[y] - 1
                if counters[y] == 0:
                    remove y from counters

    return keys(counters)

The returned keys are candidates. To get the exact heavy hitters, scan the stream again and count only those candidates.

Key Idea

When the table is full and a new item is not tracked, the algorithm subtracts one count from every tracked item. This cancels a group of kk distinct items: the new untracked item and the k1k - 1 tracked items.

An item that appears more than n/kn/k times cannot be fully canceled by these groups, so it must remain as a candidate.

Example

Let

stream=[a,b,a,c,a,d,b,a] stream = [a, b, a, c, a, d, b, a]

and

k=3 k = 3

The algorithm keeps at most two counters. Since aa appears 44 times and

4>8/3 4 > 8/3

aa must be returned as a candidate.

Correctness

Each full-table decrement can remove one occurrence from at most kk distinct items. Think of this as deleting one balanced group.

Suppose an item xx appears more than n/kn/k times. Since each deletion group contains at most one copy of xx, fewer than count(x)\text{count}(x) groups can delete all copies of xx. Therefore some copy of xx survives in the counter structure.

Thus every item with frequency greater than n/kn/k appears in the final candidate set.

Complexity

measurecost
countersat most k1k - 1
memoryO(k)O(k)
update, simple versionO(k)O(k) worst case
total timeO(nk)O(nk) simple version
verification passO(n+k)O(n + k)

The common implementation is simple and effective when kk is small.

When to Use

Use Misra Gries when:

  • the input is a stream
  • exact counting for all items is too large
  • deterministic guarantees are preferred
  • false positives are acceptable before verification

For approximate frequency estimates on arbitrary keys, use Count Min Sketch.

Implementation

def misra_gries(stream, k):
    counters = {}

    for x in stream:
        if x in counters:
            counters[x] += 1
        elif len(counters) < k - 1:
            counters[x] = 1
        else:
            remove = []

            for y in counters:
                counters[y] -= 1
                if counters[y] == 0:
                    remove.append(y)

            for y in remove:
                del counters[y]

    return list(counters.keys())
def misra_gries_exact(stream, k):
    candidates = misra_gries(stream, k)

    counts = {x: 0 for x in candidates}
    for x in stream:
        if x in counts:
            counts[x] += 1

    n = len(stream)
    return [x for x in candidates if counts[x] > n // k]
func MisraGries[T comparable](stream []T, k int) []T {
	counters := map[T]int{}

	for _, x := range stream {
		if _, ok := counters[x]; ok {
			counters[x]++
			continue
		}

		if len(counters) < k-1 {
			counters[x] = 1
			continue
		}

		var remove []T
		for y := range counters {
			counters[y]--
			if counters[y] == 0 {
				remove = append(remove, y)
			}
		}

		for _, y := range remove {
			delete(counters, y)
		}
	}

	out := make([]T, 0, len(counters))
	for x := range counters {
		out = append(out, x)
	}

	return out
}