Median Maintenance keeps track of the median as values arrive one by one. It uses two heaps to maintain a balanced partition of the data.
- A max heap stores the lower half
- A min heap stores the upper half
The median is derived from the roots of these heaps.
Problem
Given a stream
after each insertion, report the median of all values seen so far.
Algorithm
Maintain two heaps:
- : max heap for smaller half
- : min heap for larger half
Keep sizes balanced so that:
median_maintenance():
L = max heap
R = min heap
for each x:
if L is empty or x <= max(L):
push x into L
else:
push x into R
rebalance L and R
if sizes equal:
median = (max(L) + min(R)) / 2
else:
median = root of larger heapRebalancing
Ensure the size condition holds:
rebalance():
if size(L) > size(R) + 1:
move max(L) to R
else if size(R) > size(L) + 1:
move min(R) to LExample
Stream:
Step by step medians:
| step | values seen | median |
|---|---|---|
| 1 | [7] | 7 |
| 2 | [2, 7] | 4.5 |
| 3 | [2, 7, 9] | 7 |
| 4 | [2, 4, 7, 9] | 5.5 |
| 5 | [2, 3, 4, 7, 9] | 4 |
Correctness
The two heaps partition the data such that:
- every element in is less than or equal to every element in
- the sizes differ by at most one
Thus the median lies at the boundary between the two heaps. If the total number of elements is odd, the median is the root of the larger heap. If even, it is the average of both roots.
Complexity
| operation | cost |
|---|---|
| insertion | |
| median query | |
| memory |
Each insertion requires heap operations and possible rebalancing.
When to Use
Use Median Maintenance when:
- values arrive as a stream
- the median must be updated online
- exact results are required
For approximate medians with less memory, use quantile sketches such as Greenwald Khanna.
Implementation
import heapq
class MedianMaintenance:
def __init__(self):
self.low = [] # max heap (store negatives)
self.high = [] # min heap
def add(self, x):
if not self.low or x <= -self.low[0]:
heapq.heappush(self.low, -x)
else:
heapq.heappush(self.high, x)
if len(self.low) > len(self.high) + 1:
heapq.heappush(self.high, -heapq.heappop(self.low))
elif len(self.high) > len(self.low) + 1:
heapq.heappush(self.low, -heapq.heappop(self.high))
def median(self):
if len(self.low) == len(self.high):
return (-self.low[0] + self.high[0]) / 2
elif len(self.low) > len(self.high):
return -self.low[0]
else:
return self.high[0]package main
import "container/heap"
type MedianMaintenance struct {
low *MaxHeap
high *MinHeap
}
func NewMedianMaintenance() *MedianMaintenance {
l := &MaxHeap{}
h := &MinHeap{}
heap.Init(l)
heap.Init(h)
return &MedianMaintenance{
low: l,
high: h,
}
}
func (m *MedianMaintenance) Add(x int) {
if m.low.Len() == 0 || x <= (*m.low)[0] {
heap.Push(m.low, x)
} else {
heap.Push(m.high, x)
}
if m.low.Len() > m.high.Len()+1 {
heap.Push(m.high, heap.Pop(m.low))
} else if m.high.Len() > m.low.Len()+1 {
heap.Push(m.low, heap.Pop(m.high))
}
}
func (m *MedianMaintenance) Median() float64 {
if m.low.Len() == m.high.Len() {
return float64((*m.low)[0]+(*m.high)[0]) / 2
} else if m.low.Len() > m.high.Len() {
return float64((*m.low)[0])
}
return float64((*m.high)[0])
}