Skip to content

LeetCode 295: Find Median from Data Stream

A two-heap data structure for adding numbers from a stream and returning the current median in constant time.

Problem Restatement

We need to design a class called MedianFinder.

It supports two operations:

addNum(num)
findMedian()

addNum(num) adds one integer from the data stream.

findMedian() returns the median of all numbers added so far.

The median is the middle value in a sorted list. If the list has an even number of values, the median is the average of the two middle values. The answer is accepted if it is within 10^-5 of the actual answer.

Input and Output

OperationMeaning
MedianFinder()Initializes the object
addNum(num)Adds one integer
findMedian()Returns the median of all inserted numbers

Class shape:

class MedianFinder:
    def __init__(self):
        ...

    def addNum(self, num: int) -> None:
        ...

    def findMedian(self) -> float:
        ...

Examples

For these calls:

medianFinder = MedianFinder()
medianFinder.addNum(1)
medianFinder.addNum(2)
medianFinder.findMedian()
medianFinder.addNum(3)
medianFinder.findMedian()

The internal sorted values change like this:

OperationSorted ValuesMedian
addNum(1)[1]
addNum(2)[1, 2]
findMedian()[1, 2]1.5
addNum(3)[1, 2, 3]
findMedian()[1, 2, 3]2.0

So the outputs for the two median calls are:

1.5
2.0

First Thought: Store and Sort

A simple solution stores every number in a list.

When findMedian() is called, sort the list and read the middle.

class MedianFinder:
    def __init__(self):
        self.nums = []

    def addNum(self, num: int) -> None:
        self.nums.append(num)

    def findMedian(self) -> float:
        nums = sorted(self.nums)
        n = len(nums)

        if n % 2 == 1:
            return float(nums[n // 2])

        return (nums[n // 2 - 1] + nums[n // 2]) / 2

This is correct, but sorting every time is expensive.

If there are many calls to findMedian(), the repeated sorting dominates the runtime.

Problem With Sorting Every Time

Sorting costs:

O(n log n)

per median query.

But the median only depends on the middle boundary of the sorted values.

We do not need the entire sorted list on every query.

We need a structure that keeps the lower half and upper half separated.

Key Insight

Use two heaps.

HeapStoresTop Gives
smallSmaller half of numbersLargest number in smaller half
largeLarger half of numbersSmallest number in larger half

Python only has a min-heap, so we store negative values in small to simulate a max-heap.

The heaps maintain two rules:

  1. Every value in small is less than or equal to every value in large.
  2. The heap sizes differ by at most one.

Then the median is easy:

Heap SizesMedian
Same sizeAverage of both heap tops
small has one moreTop of small
large has one moreTop of large

Algorithm

To add a number:

  1. Push it into small.
  2. Move the largest value from small into large.
  3. If large becomes larger than small, move the smallest value from large back to small.

This keeps small either the same size as large, or one element larger.

For findMedian():

If both heaps have the same size:

(-small[0] + large[0]) / 2

Otherwise:

-small[0]

because small stores negative values.

Correctness

The algorithm keeps the smaller half in small and the larger half in large.

When a number is added, it is first inserted into small. Then the largest value from small is moved to large. This guarantees that no value left in small is greater than the smallest candidate in large.

If large becomes larger than small, the smallest value from large is moved back to small. This restores the size rule while preserving ordering, because the moved value is the smallest value in the larger half.

After every insertion, the two heaps split all inserted values around the median. small contains the lower half, and large contains the upper half.

If the total count is odd, small has one extra value, and its top is the median.

If the total count is even, the median is the average of the largest value in the lower half and the smallest value in the upper half.

Therefore findMedian() returns the correct median.

Complexity

OperationTimeSpaceWhy
addNumO(log n)O(n)Heap push and pop
findMedianO(1)O(n)Reads heap tops
ConstructorO(1)O(1)Initializes two heaps

Implementation

import heapq

class MedianFinder:
    def __init__(self):
        self.small = []
        self.large = []

    def addNum(self, num: int) -> None:
        heapq.heappush(self.small, -num)

        largest_small = -heapq.heappop(self.small)
        heapq.heappush(self.large, largest_small)

        if len(self.large) > len(self.small):
            smallest_large = heapq.heappop(self.large)
            heapq.heappush(self.small, -smallest_large)

    def findMedian(self) -> float:
        if len(self.small) > len(self.large):
            return float(-self.small[0])

        return (-self.small[0] + self.large[0]) / 2

Code Explanation

We use two heaps:

self.small = []
self.large = []

small is a max-heap simulated with negative values.

large is a normal min-heap.

When adding a number, we push it into small:

heapq.heappush(self.small, -num)

Then we move the largest value from small into large:

largest_small = -heapq.heappop(self.small)
heapq.heappush(self.large, largest_small)

If large has more elements, we move one back:

if len(self.large) > len(self.small):
    smallest_large = heapq.heappop(self.large)
    heapq.heappush(self.small, -smallest_large)

This keeps small at least as large as large, but never more than one element larger.

For the median:

if len(self.small) > len(self.large):
    return float(-self.small[0])

If there is an odd number of values, the top of small is the middle.

Otherwise:

return (-self.small[0] + self.large[0]) / 2

we average the two middle values.

Testing

def test_median_finder():
    mf = MedianFinder()

    mf.addNum(1)
    mf.addNum(2)
    assert mf.findMedian() == 1.5

    mf.addNum(3)
    assert mf.findMedian() == 2.0

    mf = MedianFinder()
    mf.addNum(-1)
    assert mf.findMedian() == -1.0

    mf.addNum(-2)
    assert mf.findMedian() == -1.5

    mf.addNum(-3)
    assert mf.findMedian() == -2.0

    mf = MedianFinder()
    for x in [5, 15, 1, 3]:
        mf.addNum(x)

    assert mf.findMedian() == 4.0

    print("all tests passed")

test_median_finder()

Test meaning:

TestWhy
1, 2Even count, average needed
1, 2, 3Odd count, middle value
Negative numbersConfirms max-heap negation works
5, 15, 1, 3Unsorted insertion order