Approximate frequent items in a stream with tighter error bounds than Misra Gries.
Space Saving tracks frequent items in a stream using a fixed number of counters, similar to Misra Gries, but with better accuracy in practice. It maintains estimates for item frequencies and guarantees bounded error.
Unlike Misra Gries, it does not decrement all counters. Instead, it replaces the smallest counter when needed.
Problem
Given a stream
maintain approximate counts for the most frequent items using limited memory.
Algorithm
Maintain at most counters, each storing a value and its estimated count.
space_saving(stream, k):
counters = map from item to count
for x in stream:
if x in counters:
counters[x] = counters[x] + 1
else if size(counters) < k:
counters[x] = 1
else:
y = item with minimum count
remove y
counters[x] = count(y) + 1
return countersThe replaced item’s count becomes the new item’s starting estimate.
Key Idea
When a new unseen item arrives and all counters are full, the algorithm evicts the least frequent tracked item and assigns its count to the new item, incremented by one.
This reflects that the new item could have appeared before but was not tracked. The stored count is therefore an overestimate, but the error is bounded.
Error Guarantee
For each tracked item , let:
- be the estimated frequency
- be the true frequency
Then:
where:
The error is at most the smallest counter value at the time of insertion.
Example
Let
with .
The algorithm maintains two counters and continuously replaces the least frequent item. The item accumulates the largest count and remains tracked.
Correctness
Every increment reflects a true occurrence. When replacing an item, the algorithm assigns the minimum possible count that the new item could have, given that it was not tracked before.
Thus estimates never underestimate the true frequency. The overestimation is bounded by the number of times the minimum counter was replaced.
Complexity
| measure | cost |
|---|---|
| memory | |
| update with heap | |
| update naive | |
| total time |
Using a min heap keyed by count gives efficient updates.
When to Use
Use Space Saving when:
- you need approximate heavy hitters
- memory must be strictly bounded
- tighter accuracy than Misra Gries is desired
- real-time updates are required
It is widely used in monitoring systems, telemetry, and network traffic analysis.
Implementation
import heapq
class SpaceSaving:
def __init__(self, k):
self.k = k
self.counts = {}
self.heap = [] # (count, item)
def add(self, x):
if x in self.counts:
self.counts[x] += 1
heapq.heappush(self.heap, (self.counts[x], x))
return
if len(self.counts) < self.k:
self.counts[x] = 1
heapq.heappush(self.heap, (1, x))
return
# remove smallest valid entry
while self.heap:
count, y = heapq.heappop(self.heap)
if self.counts.get(y) == count:
break
del self.counts[y]
self.counts[x] = count + 1
heapq.heappush(self.heap, (count + 1, x))
def items(self):
return dict(self.counts)type SpaceSaving struct {
k int
counts map[string]int
}
func NewSpaceSaving(k int) *SpaceSaving {
return &SpaceSaving{
k: k,
counts: map[string]int{},
}
}
func (s *SpaceSaving) Add(x string) {
if c, ok := s.counts[x]; ok {
s.counts[x] = c + 1
return
}
if len(s.counts) < s.k {
s.counts[x] = 1
return
}
// find minimum
var minKey string
minVal := int(^uint(0) >> 1)
for y, c := range s.counts {
if c < minVal {
minVal = c
minKey = y
}
}
delete(s.counts, minKey)
s.counts[x] = minVal + 1
}