# Space Saving Algorithm

# Space Saving Algorithm

Space Saving tracks frequent items in a stream using a fixed number of counters, similar to Misra Gries, but with better accuracy in practice. It maintains estimates for item frequencies and guarantees bounded error.

Unlike Misra Gries, it does not decrement all counters. Instead, it replaces the smallest counter when needed.

## Problem

Given a stream

$$
x_1, x_2, \dots, x_n
$$

maintain approximate counts for the most frequent items using limited memory.

## Algorithm

Maintain at most $k$ counters, each storing a value and its estimated count.

```id="ss1"
space_saving(stream, k):
    counters = map from item to count

    for x in stream:
        if x in counters:
            counters[x] = counters[x] + 1
        else if size(counters) < k:
            counters[x] = 1
        else:
            y = item with minimum count
            remove y
            counters[x] = count(y) + 1

    return counters
```

The replaced item’s count becomes the new item's starting estimate.

## Key Idea

When a new unseen item arrives and all counters are full, the algorithm evicts the least frequent tracked item and assigns its count to the new item, incremented by one.

This reflects that the new item could have appeared before but was not tracked. The stored count is therefore an overestimate, but the error is bounded.

## Error Guarantee

For each tracked item $x$, let:

* $\hat{f}(x)$ be the estimated frequency
* $f(x)$ be the true frequency

Then:

$$
f(x) \le \hat{f}(x) \le f(x) + \epsilon n
$$

where:

$$
\epsilon = 1/k
$$

The error is at most the smallest counter value at the time of insertion.

## Example

Let

$$
stream = [a, b, a, c, a, d, b, a]
$$

with $k = 2$.

The algorithm maintains two counters and continuously replaces the least frequent item. The item $a$ accumulates the largest count and remains tracked.

## Correctness

Every increment reflects a true occurrence. When replacing an item, the algorithm assigns the minimum possible count that the new item could have, given that it was not tracked before.

Thus estimates never underestimate the true frequency. The overestimation is bounded by the number of times the minimum counter was replaced.

## Complexity

| measure          |          cost |
| ---------------- | ------------: |
| memory           |        $O(k)$ |
| update with heap |   $O(\log k)$ |
| update naive     |        $O(k)$ |
| total time       | $O(n \log k)$ |

Using a min heap keyed by count gives efficient updates.

## When to Use

Use Space Saving when:

* you need approximate heavy hitters
* memory must be strictly bounded
* tighter accuracy than Misra Gries is desired
* real-time updates are required

It is widely used in monitoring systems, telemetry, and network traffic analysis.

## Implementation

```id="ss2"
import heapq

class SpaceSaving:
    def __init__(self, k):
        self.k = k
        self.counts = {}
        self.heap = []  # (count, item)

    def add(self, x):
        if x in self.counts:
            self.counts[x] += 1
            heapq.heappush(self.heap, (self.counts[x], x))
            return

        if len(self.counts) < self.k:
            self.counts[x] = 1
            heapq.heappush(self.heap, (1, x))
            return

        # remove smallest valid entry
        while self.heap:
            count, y = heapq.heappop(self.heap)
            if self.counts.get(y) == count:
                break

        del self.counts[y]
        self.counts[x] = count + 1
        heapq.heappush(self.heap, (count + 1, x))

    def items(self):
        return dict(self.counts)
```

```id="ss3"
type SpaceSaving struct {
	k      int
	counts map[string]int
}

func NewSpaceSaving(k int) *SpaceSaving {
	return &SpaceSaving{
		k:      k,
		counts: map[string]int{},
	}
}

func (s *SpaceSaving) Add(x string) {
	if c, ok := s.counts[x]; ok {
		s.counts[x] = c + 1
		return
	}

	if len(s.counts) < s.k {
		s.counts[x] = 1
		return
	}

	// find minimum
	var minKey string
	minVal := int(^uint(0) >> 1)

	for y, c := range s.counts {
		if c < minVal {
			minVal = c
			minKey = y
		}
	}

	delete(s.counts, minKey)
	s.counts[x] = minVal + 1
}
```

