# LeetCode 295: Find Median from Data Stream

## Problem Restatement

We need to design a class called `MedianFinder`.

It supports two operations:

```python
addNum(num)
findMedian()
```

`addNum(num)` adds one integer from the data stream.

`findMedian()` returns the median of all numbers added so far.

The median is the middle value in a sorted list. If the list has an even number of values, the median is the average of the two middle values. The answer is accepted if it is within `10^-5` of the actual answer.

## Input and Output

| Operation | Meaning |
|---|---|
| `MedianFinder()` | Initializes the object |
| `addNum(num)` | Adds one integer |
| `findMedian()` | Returns the median of all inserted numbers |

Class shape:

```python
class MedianFinder:
    def __init__(self):
        ...

    def addNum(self, num: int) -> None:
        ...

    def findMedian(self) -> float:
        ...
```

## Examples

For these calls:

```python
medianFinder = MedianFinder()
medianFinder.addNum(1)
medianFinder.addNum(2)
medianFinder.findMedian()
medianFinder.addNum(3)
medianFinder.findMedian()
```

The internal sorted values change like this:

| Operation | Sorted Values | Median |
|---|---|---:|
| `addNum(1)` | `[1]` | |
| `addNum(2)` | `[1, 2]` | |
| `findMedian()` | `[1, 2]` | `1.5` |
| `addNum(3)` | `[1, 2, 3]` | |
| `findMedian()` | `[1, 2, 3]` | `2.0` |

So the outputs for the two median calls are:

```python
1.5
2.0
```

## First Thought: Store and Sort

A simple solution stores every number in a list.

When `findMedian()` is called, sort the list and read the middle.

```python
class MedianFinder:
    def __init__(self):
        self.nums = []

    def addNum(self, num: int) -> None:
        self.nums.append(num)

    def findMedian(self) -> float:
        nums = sorted(self.nums)
        n = len(nums)

        if n % 2 == 1:
            return float(nums[n // 2])

        return (nums[n // 2 - 1] + nums[n // 2]) / 2
```

This is correct, but sorting every time is expensive.

If there are many calls to `findMedian()`, the repeated sorting dominates the runtime.

## Problem With Sorting Every Time

Sorting costs:

```python
O(n log n)
```

per median query.

But the median only depends on the middle boundary of the sorted values.

We do not need the entire sorted list on every query.

We need a structure that keeps the lower half and upper half separated.

## Key Insight

Use two heaps.

| Heap | Stores | Top Gives |
|---|---|---|
| `small` | Smaller half of numbers | Largest number in smaller half |
| `large` | Larger half of numbers | Smallest number in larger half |

Python only has a min-heap, so we store negative values in `small` to simulate a max-heap.

The heaps maintain two rules:

1. Every value in `small` is less than or equal to every value in `large`.
2. The heap sizes differ by at most one.

Then the median is easy:

| Heap Sizes | Median |
|---|---|
| Same size | Average of both heap tops |
| `small` has one more | Top of `small` |
| `large` has one more | Top of `large` |

## Algorithm

To add a number:

1. Push it into `small`.
2. Move the largest value from `small` into `large`.
3. If `large` becomes larger than `small`, move the smallest value from `large` back to `small`.

This keeps `small` either the same size as `large`, or one element larger.

For `findMedian()`:

If both heaps have the same size:

```python
(-small[0] + large[0]) / 2
```

Otherwise:

```python
-small[0]
```

because `small` stores negative values.

## Correctness

The algorithm keeps the smaller half in `small` and the larger half in `large`.

When a number is added, it is first inserted into `small`. Then the largest value from `small` is moved to `large`. This guarantees that no value left in `small` is greater than the smallest candidate in `large`.

If `large` becomes larger than `small`, the smallest value from `large` is moved back to `small`. This restores the size rule while preserving ordering, because the moved value is the smallest value in the larger half.

After every insertion, the two heaps split all inserted values around the median. `small` contains the lower half, and `large` contains the upper half.

If the total count is odd, `small` has one extra value, and its top is the median.

If the total count is even, the median is the average of the largest value in the lower half and the smallest value in the upper half.

Therefore `findMedian()` returns the correct median.

## Complexity

| Operation | Time | Space | Why |
|---|---:|---:|---|
| `addNum` | `O(log n)` | `O(n)` | Heap push and pop |
| `findMedian` | `O(1)` | `O(n)` | Reads heap tops |
| Constructor | `O(1)` | `O(1)` | Initializes two heaps |

## Implementation

```python
import heapq

class MedianFinder:
    def __init__(self):
        self.small = []
        self.large = []

    def addNum(self, num: int) -> None:
        heapq.heappush(self.small, -num)

        largest_small = -heapq.heappop(self.small)
        heapq.heappush(self.large, largest_small)

        if len(self.large) > len(self.small):
            smallest_large = heapq.heappop(self.large)
            heapq.heappush(self.small, -smallest_large)

    def findMedian(self) -> float:
        if len(self.small) > len(self.large):
            return float(-self.small[0])

        return (-self.small[0] + self.large[0]) / 2
```

## Code Explanation

We use two heaps:

```python
self.small = []
self.large = []
```

`small` is a max-heap simulated with negative values.

`large` is a normal min-heap.

When adding a number, we push it into `small`:

```python
heapq.heappush(self.small, -num)
```

Then we move the largest value from `small` into `large`:

```python
largest_small = -heapq.heappop(self.small)
heapq.heappush(self.large, largest_small)
```

If `large` has more elements, we move one back:

```python
if len(self.large) > len(self.small):
    smallest_large = heapq.heappop(self.large)
    heapq.heappush(self.small, -smallest_large)
```

This keeps `small` at least as large as `large`, but never more than one element larger.

For the median:

```python
if len(self.small) > len(self.large):
    return float(-self.small[0])
```

If there is an odd number of values, the top of `small` is the middle.

Otherwise:

```python
return (-self.small[0] + self.large[0]) / 2
```

we average the two middle values.

## Testing

```python
def test_median_finder():
    mf = MedianFinder()

    mf.addNum(1)
    mf.addNum(2)
    assert mf.findMedian() == 1.5

    mf.addNum(3)
    assert mf.findMedian() == 2.0

    mf = MedianFinder()
    mf.addNum(-1)
    assert mf.findMedian() == -1.0

    mf.addNum(-2)
    assert mf.findMedian() == -1.5

    mf.addNum(-3)
    assert mf.findMedian() == -2.0

    mf = MedianFinder()
    for x in [5, 15, 1, 3]:
        mf.addNum(x)

    assert mf.findMedian() == 4.0

    print("all tests passed")

test_median_finder()
```

Test meaning:

| Test | Why |
|---|---|
| `1, 2` | Even count, average needed |
| `1, 2, 3` | Odd count, middle value |
| Negative numbers | Confirms max-heap negation works |
| `5, 15, 1, 3` | Unsorted insertion order |

