Skip to content

LeetCode 571: Find Median Given Frequency of Numbers

A clear explanation of Find Median Given Frequency of Numbers using cumulative frequency and SQL window functions.

Problem Restatement

We are given a Numbers table.

ColumnTypeMeaning
numintThe number value
frequencyintHow many times this number appears

Each row stores a number and its frequency.

Conceptually, the table represents a decompressed list of numbers. For example, if num = 0 and frequency = 7, then 0 appears seven times in the full list.

We need to report the median of all numbers after decompression.

The result column must be named:

median

The LeetCode statement asks to round the median to one decimal point.

Input and Output

ItemMeaning
InputNumbers table
OutputMedian of the decompressed numbers
Output columnmedian
AvoidExpanding frequency rows into many rows

Expected output:

median

Example

Input:

numfrequency
07
11
23
31

The decompressed list is:

0, 0, 0, 0, 0, 0, 0, 1, 2, 2, 2, 3

There are 12 values.

The middle positions are 6 and 7 if we use 1-based indexing.

Both values are 0.

So the median is:

0.0

Output:

median
0.0

First Thought: Decompress the Table

A direct idea is to expand every row into frequency rows.

For example:

numfrequency
23

would become:

value
2
2
2

Then we could sort the expanded rows and find the middle value.

This is easy to understand, but it is not a good database solution. Frequencies can be large, so the expanded table may be much larger than the original table.

We should compute the median from the frequency distribution directly.

Key Insight

After sorting by num, each row covers a range of positions in the decompressed list.

For the example:

numfrequencyPosition range
071 through 7
118 through 8
239 through 11
3112 through 12

The median position or positions must fall inside one or two of these ranges.

So the task becomes:

  1. Compute cumulative frequency by ascending num.
  2. Compute the total frequency.
  3. Select the row or rows whose cumulative range contains the median position.
  4. Average their num values.

Cumulative Frequency

For each row, define:

running_frequency = SUM(frequency) OVER (ORDER BY num)

This is the last decompressed position occupied by the current num.

The first position occupied by the current row is:

running_frequency - frequency + 1

So the row covers:

running_frequency - frequency + 1
through
running_frequency

Median Position Rule

Let:

total = SUM(frequency) OVER ()

For an odd total, there is one median position.

For an even total, there are two middle positions.

A compact SQL trick is to select rows where:

total / 2.0 BETWEEN running_frequency - frequency AND running_frequency

This works because the boundary expression identifies the row or rows crossing the middle of the distribution.

Another common equivalent approach computes cumulative sums from both directions and keeps rows that overlap the center. The Doocs solution uses ascending and descending cumulative frequencies for this idea.

Algorithm

  1. Sort rows by num.
  2. Compute:
    1. Running cumulative frequency.
    2. Total frequency.
  3. Keep rows whose frequency range contains the middle of the decompressed data.
  4. Return the average of those num values.
  5. Round to one decimal point.

Correctness

Sorting by num gives the same order as the decompressed sorted list.

For each row, the cumulative frequency tells us the final position of that number in the decompressed sorted list. Subtracting frequency gives the position just before that number’s block begins.

The median is determined only by the middle position or two middle positions in this sorted decompressed list.

The query keeps exactly the number row or rows whose cumulative frequency interval crosses the middle. If the total count is odd, one row is selected. If the total count is even and the two middle values are different, two rows are selected. If the two middle values are the same, one row may still represent both middle positions.

Averaging the selected num values therefore gives the median. Rounding formats it as required.

Complexity

Let n be the number of rows in Numbers.

MetricValueWhy
TimeO(n log n)The rows are sorted by num for the window function
SpaceO(n)The database may materialize window results

This avoids creating SUM(frequency) decompressed rows.

Implementation

WITH NumbersMetadata AS (
    SELECT
        num,
        frequency,
        SUM(frequency) OVER (
            ORDER BY num
        ) AS running_frequency,
        SUM(frequency) OVER () AS total_frequency
    FROM Numbers
)
SELECT
    ROUND(AVG(num), 1) AS median
FROM NumbersMetadata
WHERE total_frequency / 2.0
      BETWEEN running_frequency - frequency
          AND running_frequency;

Code Explanation

The CTE computes metadata for every number:

WITH NumbersMetadata AS (...)

The running frequency gives the right edge of the number’s interval in the decompressed sorted list:

SUM(frequency) OVER (
    ORDER BY num
) AS running_frequency

The total frequency gives the decompressed list length:

SUM(frequency) OVER () AS total_frequency

The WHERE clause keeps the row or rows that contain the median boundary:

WHERE total_frequency / 2.0
      BETWEEN running_frequency - frequency
          AND running_frequency

Finally, the query averages the selected values and rounds the result:

ROUND(AVG(num), 1) AS median

Alternative Two-Direction Window Solution

This version computes cumulative frequency from both sides.

WITH t AS (
    SELECT
        num,
        frequency,
        SUM(frequency) OVER (ORDER BY num ASC) AS left_count,
        SUM(frequency) OVER (ORDER BY num DESC) AS right_count,
        SUM(frequency) OVER () AS total_count
    FROM Numbers
)
SELECT
    ROUND(AVG(num), 1) AS median
FROM t
WHERE left_count >= total_count / 2
  AND right_count >= total_count / 2;

The condition keeps numbers that lie across the center of the decompressed sorted distribution.

Testing

Create sample data:

CREATE TABLE Numbers (
    num INT PRIMARY KEY,
    frequency INT
);

INSERT INTO Numbers (num, frequency) VALUES
(0, 7),
(1, 1),
(2, 3),
(3, 1);

Expected result:

median
0.0

Additional cases:

Numbers tableExpected medianWhy
(1, 1), (2, 1), (3, 1)2.0Odd total count
(1, 1), (2, 1)1.5Even count, two different middle values
(1, 2), (2, 2)1.5Middle positions split across two values
(5, 10)5.0Only one distinct number
(1, 1), (10, 1), (100, 1), (1000, 1)55.0Average of 10 and 100