# LeetCode 571: Find Median Given Frequency of Numbers

## Problem Restatement

We are given a `Numbers` table.

| Column | Type | Meaning |
|---|---|---|
| `num` | int | The number value |
| `frequency` | int | How many times this number appears |

Each row stores a number and its frequency.

Conceptually, the table represents a decompressed list of numbers. For example, if `num = 0` and `frequency = 7`, then `0` appears seven times in the full list.

We need to report the median of all numbers after decompression.

The result column must be named:

```sql
median
```

The LeetCode statement asks to round the median to one decimal point.

## Input and Output

| Item | Meaning |
|---|---|
| Input | `Numbers` table |
| Output | Median of the decompressed numbers |
| Output column | `median` |
| Avoid | Expanding frequency rows into many rows |

Expected output:

```sql
median
```

## Example

Input:

| num | frequency |
|---:|---:|
| 0 | 7 |
| 1 | 1 |
| 2 | 3 |
| 3 | 1 |

The decompressed list is:

```text
0, 0, 0, 0, 0, 0, 0, 1, 2, 2, 2, 3
```

There are `12` values.

The middle positions are `6` and `7` if we use 1-based indexing.

Both values are `0`.

So the median is:

```text
0.0
```

Output:

| median |
|---:|
| 0.0 |

## First Thought: Decompress the Table

A direct idea is to expand every row into `frequency` rows.

For example:

| num | frequency |
|---:|---:|
| 2 | 3 |

would become:

| value |
|---:|
| 2 |
| 2 |
| 2 |

Then we could sort the expanded rows and find the middle value.

This is easy to understand, but it is not a good database solution. Frequencies can be large, so the expanded table may be much larger than the original table.

We should compute the median from the frequency distribution directly.

## Key Insight

After sorting by `num`, each row covers a range of positions in the decompressed list.

For the example:

| num | frequency | Position range |
|---:|---:|---|
| 0 | 7 | `1` through `7` |
| 1 | 1 | `8` through `8` |
| 2 | 3 | `9` through `11` |
| 3 | 1 | `12` through `12` |

The median position or positions must fall inside one or two of these ranges.

So the task becomes:

1. Compute cumulative frequency by ascending `num`.
2. Compute the total frequency.
3. Select the row or rows whose cumulative range contains the median position.
4. Average their `num` values.

## Cumulative Frequency

For each row, define:

```sql
running_frequency = SUM(frequency) OVER (ORDER BY num)
```

This is the last decompressed position occupied by the current `num`.

The first position occupied by the current row is:

```sql
running_frequency - frequency + 1
```

So the row covers:

```sql
running_frequency - frequency + 1
through
running_frequency
```

## Median Position Rule

Let:

```sql
total = SUM(frequency) OVER ()
```

For an odd total, there is one median position.

For an even total, there are two middle positions.

A compact SQL trick is to select rows where:

```sql
total / 2.0 BETWEEN running_frequency - frequency AND running_frequency
```

This works because the boundary expression identifies the row or rows crossing the middle of the distribution.

Another common equivalent approach computes cumulative sums from both directions and keeps rows that overlap the center. The Doocs solution uses ascending and descending cumulative frequencies for this idea.

## Algorithm

1. Sort rows by `num`.
2. Compute:
   1. Running cumulative frequency.
   2. Total frequency.
3. Keep rows whose frequency range contains the middle of the decompressed data.
4. Return the average of those `num` values.
5. Round to one decimal point.

## Correctness

Sorting by `num` gives the same order as the decompressed sorted list.

For each row, the cumulative frequency tells us the final position of that number in the decompressed sorted list. Subtracting `frequency` gives the position just before that number's block begins.

The median is determined only by the middle position or two middle positions in this sorted decompressed list.

The query keeps exactly the number row or rows whose cumulative frequency interval crosses the middle. If the total count is odd, one row is selected. If the total count is even and the two middle values are different, two rows are selected. If the two middle values are the same, one row may still represent both middle positions.

Averaging the selected `num` values therefore gives the median. Rounding formats it as required.

## Complexity

Let `n` be the number of rows in `Numbers`.

| Metric | Value | Why |
|---|---|---|
| Time | `O(n log n)` | The rows are sorted by `num` for the window function |
| Space | `O(n)` | The database may materialize window results |

This avoids creating `SUM(frequency)` decompressed rows.

## Implementation

```sql
WITH NumbersMetadata AS (
    SELECT
        num,
        frequency,
        SUM(frequency) OVER (
            ORDER BY num
        ) AS running_frequency,
        SUM(frequency) OVER () AS total_frequency
    FROM Numbers
)
SELECT
    ROUND(AVG(num), 1) AS median
FROM NumbersMetadata
WHERE total_frequency / 2.0
      BETWEEN running_frequency - frequency
          AND running_frequency;
```

## Code Explanation

The CTE computes metadata for every number:

```sql
WITH NumbersMetadata AS (...)
```

The running frequency gives the right edge of the number's interval in the decompressed sorted list:

```sql
SUM(frequency) OVER (
    ORDER BY num
) AS running_frequency
```

The total frequency gives the decompressed list length:

```sql
SUM(frequency) OVER () AS total_frequency
```

The `WHERE` clause keeps the row or rows that contain the median boundary:

```sql
WHERE total_frequency / 2.0
      BETWEEN running_frequency - frequency
          AND running_frequency
```

Finally, the query averages the selected values and rounds the result:

```sql
ROUND(AVG(num), 1) AS median
```

## Alternative Two-Direction Window Solution

This version computes cumulative frequency from both sides.

```sql
WITH t AS (
    SELECT
        num,
        frequency,
        SUM(frequency) OVER (ORDER BY num ASC) AS left_count,
        SUM(frequency) OVER (ORDER BY num DESC) AS right_count,
        SUM(frequency) OVER () AS total_count
    FROM Numbers
)
SELECT
    ROUND(AVG(num), 1) AS median
FROM t
WHERE left_count >= total_count / 2
  AND right_count >= total_count / 2;
```

The condition keeps numbers that lie across the center of the decompressed sorted distribution.

## Testing

Create sample data:

```sql
CREATE TABLE Numbers (
    num INT PRIMARY KEY,
    frequency INT
);

INSERT INTO Numbers (num, frequency) VALUES
(0, 7),
(1, 1),
(2, 3),
(3, 1);
```

Expected result:

| median |
|---:|
| 0.0 |

Additional cases:

| Numbers table | Expected median | Why |
|---|---:|---|
| `(1, 1), (2, 1), (3, 1)` | `2.0` | Odd total count |
| `(1, 1), (2, 1)` | `1.5` | Even count, two different middle values |
| `(1, 2), (2, 2)` | `1.5` | Middle positions split across two values |
| `(5, 10)` | `5.0` | Only one distinct number |
| `(1, 1), (10, 1), (100, 1), (1000, 1)` | `55.0` | Average of `10` and `100` |