A two-heap data structure for adding numbers from a stream and returning the current median in constant time.
Problem Restatement
We need to design a class called MedianFinder.
It supports two operations:
addNum(num)
findMedian()addNum(num) adds one integer from the data stream.
findMedian() returns the median of all numbers added so far.
The median is the middle value in a sorted list. If the list has an even number of values, the median is the average of the two middle values. The answer is accepted if it is within 10^-5 of the actual answer.
Input and Output
| Operation | Meaning |
|---|---|
MedianFinder() | Initializes the object |
addNum(num) | Adds one integer |
findMedian() | Returns the median of all inserted numbers |
Class shape:
class MedianFinder:
def __init__(self):
...
def addNum(self, num: int) -> None:
...
def findMedian(self) -> float:
...Examples
For these calls:
medianFinder = MedianFinder()
medianFinder.addNum(1)
medianFinder.addNum(2)
medianFinder.findMedian()
medianFinder.addNum(3)
medianFinder.findMedian()The internal sorted values change like this:
| Operation | Sorted Values | Median |
|---|---|---|
addNum(1) | [1] | |
addNum(2) | [1, 2] | |
findMedian() | [1, 2] | 1.5 |
addNum(3) | [1, 2, 3] | |
findMedian() | [1, 2, 3] | 2.0 |
So the outputs for the two median calls are:
1.5
2.0First Thought: Store and Sort
A simple solution stores every number in a list.
When findMedian() is called, sort the list and read the middle.
class MedianFinder:
def __init__(self):
self.nums = []
def addNum(self, num: int) -> None:
self.nums.append(num)
def findMedian(self) -> float:
nums = sorted(self.nums)
n = len(nums)
if n % 2 == 1:
return float(nums[n // 2])
return (nums[n // 2 - 1] + nums[n // 2]) / 2This is correct, but sorting every time is expensive.
If there are many calls to findMedian(), the repeated sorting dominates the runtime.
Problem With Sorting Every Time
Sorting costs:
O(n log n)per median query.
But the median only depends on the middle boundary of the sorted values.
We do not need the entire sorted list on every query.
We need a structure that keeps the lower half and upper half separated.
Key Insight
Use two heaps.
| Heap | Stores | Top Gives |
|---|---|---|
small | Smaller half of numbers | Largest number in smaller half |
large | Larger half of numbers | Smallest number in larger half |
Python only has a min-heap, so we store negative values in small to simulate a max-heap.
The heaps maintain two rules:
- Every value in
smallis less than or equal to every value inlarge. - The heap sizes differ by at most one.
Then the median is easy:
| Heap Sizes | Median |
|---|---|
| Same size | Average of both heap tops |
small has one more | Top of small |
large has one more | Top of large |
Algorithm
To add a number:
- Push it into
small. - Move the largest value from
smallintolarge. - If
largebecomes larger thansmall, move the smallest value fromlargeback tosmall.
This keeps small either the same size as large, or one element larger.
For findMedian():
If both heaps have the same size:
(-small[0] + large[0]) / 2Otherwise:
-small[0]because small stores negative values.
Correctness
The algorithm keeps the smaller half in small and the larger half in large.
When a number is added, it is first inserted into small. Then the largest value from small is moved to large. This guarantees that no value left in small is greater than the smallest candidate in large.
If large becomes larger than small, the smallest value from large is moved back to small. This restores the size rule while preserving ordering, because the moved value is the smallest value in the larger half.
After every insertion, the two heaps split all inserted values around the median. small contains the lower half, and large contains the upper half.
If the total count is odd, small has one extra value, and its top is the median.
If the total count is even, the median is the average of the largest value in the lower half and the smallest value in the upper half.
Therefore findMedian() returns the correct median.
Complexity
| Operation | Time | Space | Why |
|---|---|---|---|
addNum | O(log n) | O(n) | Heap push and pop |
findMedian | O(1) | O(n) | Reads heap tops |
| Constructor | O(1) | O(1) | Initializes two heaps |
Implementation
import heapq
class MedianFinder:
def __init__(self):
self.small = []
self.large = []
def addNum(self, num: int) -> None:
heapq.heappush(self.small, -num)
largest_small = -heapq.heappop(self.small)
heapq.heappush(self.large, largest_small)
if len(self.large) > len(self.small):
smallest_large = heapq.heappop(self.large)
heapq.heappush(self.small, -smallest_large)
def findMedian(self) -> float:
if len(self.small) > len(self.large):
return float(-self.small[0])
return (-self.small[0] + self.large[0]) / 2Code Explanation
We use two heaps:
self.small = []
self.large = []small is a max-heap simulated with negative values.
large is a normal min-heap.
When adding a number, we push it into small:
heapq.heappush(self.small, -num)Then we move the largest value from small into large:
largest_small = -heapq.heappop(self.small)
heapq.heappush(self.large, largest_small)If large has more elements, we move one back:
if len(self.large) > len(self.small):
smallest_large = heapq.heappop(self.large)
heapq.heappush(self.small, -smallest_large)This keeps small at least as large as large, but never more than one element larger.
For the median:
if len(self.small) > len(self.large):
return float(-self.small[0])If there is an odd number of values, the top of small is the middle.
Otherwise:
return (-self.small[0] + self.large[0]) / 2we average the two middle values.
Testing
def test_median_finder():
mf = MedianFinder()
mf.addNum(1)
mf.addNum(2)
assert mf.findMedian() == 1.5
mf.addNum(3)
assert mf.findMedian() == 2.0
mf = MedianFinder()
mf.addNum(-1)
assert mf.findMedian() == -1.0
mf.addNum(-2)
assert mf.findMedian() == -1.5
mf.addNum(-3)
assert mf.findMedian() == -2.0
mf = MedianFinder()
for x in [5, 15, 1, 3]:
mf.addNum(x)
assert mf.findMedian() == 4.0
print("all tests passed")
test_median_finder()Test meaning:
| Test | Why |
|---|---|
1, 2 | Even count, average needed |
1, 2, 3 | Odd count, middle value |
| Negative numbers | Confirms max-heap negation works |
5, 15, 1, 3 | Unsorted insertion order |