# Misra Gries

# Misra Gries

Misra Gries is a deterministic streaming algorithm for finding frequent items with bounded memory. It keeps at most $k - 1$ counters and returns a small candidate set that contains every item whose frequency is greater than $n/k$.

The algorithm may return extra candidates. A second pass gives exact frequencies and removes false positives.

## Problem

Given a stream

$$
x_1, x_2, \dots, x_n
$$

and an integer $k \ge 2$, find all items whose frequency is greater than

$$
n/k
$$

## Algorithm

Maintain a map from item to counter.

```id="mg1"
misra_gries(stream, k):
    counters = empty map

    for x in stream:
        if x in counters:
            counters[x] = counters[x] + 1
        else if size(counters) < k - 1:
            counters[x] = 1
        else:
            for each item y in counters:
                counters[y] = counters[y] - 1
                if counters[y] == 0:
                    remove y from counters

    return keys(counters)
```

The returned keys are candidates. To get the exact heavy hitters, scan the stream again and count only those candidates.

## Key Idea

When the table is full and a new item is not tracked, the algorithm subtracts one count from every tracked item. This cancels a group of $k$ distinct items: the new untracked item and the $k - 1$ tracked items.

An item that appears more than $n/k$ times cannot be fully canceled by these groups, so it must remain as a candidate.

## Example

Let

$$
stream = [a, b, a, c, a, d, b, a]
$$

and

$$
k = 3
$$

The algorithm keeps at most two counters. Since $a$ appears $4$ times and

$$
4 > 8/3
$$

$a$ must be returned as a candidate.

## Correctness

Each full-table decrement can remove one occurrence from at most $k$ distinct items. Think of this as deleting one balanced group.

Suppose an item $x$ appears more than $n/k$ times. Since each deletion group contains at most one copy of $x$, fewer than $\text{count}(x)$ groups can delete all copies of $x$. Therefore some copy of $x$ survives in the counter structure.

Thus every item with frequency greater than $n/k$ appears in the final candidate set.

## Complexity

| measure                |                   cost |
| ---------------------- | ---------------------: |
| counters               |        at most $k - 1$ |
| memory                 |                 $O(k)$ |
| update, simple version |      $O(k)$ worst case |
| total time             | $O(nk)$ simple version |
| verification pass      |             $O(n + k)$ |

The common implementation is simple and effective when $k$ is small.

## When to Use

Use Misra Gries when:

* the input is a stream
* exact counting for all items is too large
* deterministic guarantees are preferred
* false positives are acceptable before verification

For approximate frequency estimates on arbitrary keys, use Count Min Sketch.

## Implementation

```id="mg2"
def misra_gries(stream, k):
    counters = {}

    for x in stream:
        if x in counters:
            counters[x] += 1
        elif len(counters) < k - 1:
            counters[x] = 1
        else:
            remove = []

            for y in counters:
                counters[y] -= 1
                if counters[y] == 0:
                    remove.append(y)

            for y in remove:
                del counters[y]

    return list(counters.keys())
```

```id="mg3"
def misra_gries_exact(stream, k):
    candidates = misra_gries(stream, k)

    counts = {x: 0 for x in candidates}
    for x in stream:
        if x in counts:
            counts[x] += 1

    n = len(stream)
    return [x for x in candidates if counts[x] > n // k]
```

```id="mg4"
func MisraGries[T comparable](stream []T, k int) []T {
	counters := map[T]int{}

	for _, x := range stream {
		if _, ok := counters[x]; ok {
			counters[x]++
			continue
		}

		if len(counters) < k-1 {
			counters[x] = 1
			continue
		}

		var remove []T
		for y := range counters {
			counters[y]--
			if counters[y] == 0 {
				remove = append(remove, y)
			}
		}

		for _, y := range remove {
			delete(counters, y)
		}
	}

	out := make([]T, 0, len(counters))
	for x := range counters {
		out = append(out, x)
	}

	return out
}
```

