# Median Maintenance by Two Heaps

# Median Maintenance by Two Heaps

Median Maintenance keeps track of the median as values arrive one by one. It uses two heaps to maintain a balanced partition of the data.

* A max heap stores the lower half
* A min heap stores the upper half

The median is derived from the roots of these heaps.

## Problem

Given a stream

$$
x_1, x_2, \dots, x_n
$$

after each insertion, report the median of all values seen so far.

## Algorithm

Maintain two heaps:

* $L$: max heap for smaller half
* $R$: min heap for larger half

Keep sizes balanced so that:

$$
|,|L| - |R|,| \le 1
$$

```id="mm1"
median_maintenance():
    L = max heap
    R = min heap

    for each x:
        if L is empty or x <= max(L):
            push x into L
        else:
            push x into R

        rebalance L and R

        if sizes equal:
            median = (max(L) + min(R)) / 2
        else:
            median = root of larger heap
```

## Rebalancing

Ensure the size condition holds:

```id="mm2"
rebalance():
    if size(L) > size(R) + 1:
        move max(L) to R
    else if size(R) > size(L) + 1:
        move min(R) to L
```

## Example

Stream:

$$
[7, 2, 9, 4, 3]
$$

Step by step medians:

| step | values seen     | median |
| ---- | --------------- | ------ |
| 1    | [7]             | 7      |
| 2    | [2, 7]          | 4.5    |
| 3    | [2, 7, 9]       | 7      |
| 4    | [2, 4, 7, 9]    | 5.5    |
| 5    | [2, 3, 4, 7, 9] | 4      |

## Correctness

The two heaps partition the data such that:

* every element in $L$ is less than or equal to every element in $R$
* the sizes differ by at most one

Thus the median lies at the boundary between the two heaps. If the total number of elements is odd, the median is the root of the larger heap. If even, it is the average of both roots.

## Complexity

| operation    |        cost |
| ------------ | ----------: |
| insertion    | $O(\log n)$ |
| median query |      $O(1)$ |
| memory       |      $O(n)$ |

Each insertion requires heap operations and possible rebalancing.

## When to Use

Use Median Maintenance when:

* values arrive as a stream
* the median must be updated online
* exact results are required

For approximate medians with less memory, use quantile sketches such as Greenwald Khanna.

## Implementation

```id="mm3"
import heapq

class MedianMaintenance:
    def __init__(self):
        self.low = []   # max heap (store negatives)
        self.high = []  # min heap

    def add(self, x):
        if not self.low or x <= -self.low[0]:
            heapq.heappush(self.low, -x)
        else:
            heapq.heappush(self.high, x)

        if len(self.low) > len(self.high) + 1:
            heapq.heappush(self.high, -heapq.heappop(self.low))
        elif len(self.high) > len(self.low) + 1:
            heapq.heappush(self.low, -heapq.heappop(self.high))

    def median(self):
        if len(self.low) == len(self.high):
            return (-self.low[0] + self.high[0]) / 2
        elif len(self.low) > len(self.high):
            return -self.low[0]
        else:
            return self.high[0]
```

```id="mm4"
package main

import "container/heap"

type MedianMaintenance struct {
	low  *MaxHeap
	high *MinHeap
}

func NewMedianMaintenance() *MedianMaintenance {
	l := &MaxHeap{}
	h := &MinHeap{}
	heap.Init(l)
	heap.Init(h)

	return &MedianMaintenance{
		low:  l,
		high: h,
	}
}

func (m *MedianMaintenance) Add(x int) {
	if m.low.Len() == 0 || x <= (*m.low)[0] {
		heap.Push(m.low, x)
	} else {
		heap.Push(m.high, x)
	}

	if m.low.Len() > m.high.Len()+1 {
		heap.Push(m.high, heap.Pop(m.low))
	} else if m.high.Len() > m.low.Len()+1 {
		heap.Push(m.low, heap.Pop(m.high))
	}
}

func (m *MedianMaintenance) Median() float64 {
	if m.low.Len() == m.high.Len() {
		return float64((*m.low)[0]+(*m.high)[0]) / 2
	} else if m.low.Len() > m.high.Len() {
		return float64((*m.low)[0])
	}
	return float64((*m.high)[0])
}
```

