Skip to content

LeetCode 601: Human Traffic of Stadium

A SQL guide for finding stadium records that belong to runs of at least three consecutive ids where each row has at least 100 people.

Problem Restatement

We are given a table called Stadium.

Each row records one stadium visit day. The table has three columns:

ColumnTypeMeaning
idintVisit id
visit_datedateDate of the visit
peopleintNumber of people who visited

We need to return all rows that are part of a group of at least three consecutive id values where every row has people >= 100.

The result should be ordered by visit_date.

The official statement asks for records with three or more consecutive ids, and each record in that consecutive group must have people >= 100.

Input and Output

Input table:

Stadium

Output columns:

id, visit_date, people

The output should include only rows that belong to a valid consecutive group.

Example

Input:

idvisit_datepeople
12017-01-0110
22017-01-02109
32017-01-03150
42017-01-0499
52017-01-05145
62017-01-061455
72017-01-07199
82017-01-08188

Rows with people >= 100 are:

idvisit_datepeople
22017-01-02109
32017-01-03150
52017-01-05145
62017-01-061455
72017-01-07199
82017-01-08188

The consecutive high-traffic groups are:

GroupidsLength
First2, 32
Second5, 6, 7, 84

Only the second group has length at least 3.

So the output is:

idvisit_datepeople
52017-01-05145
62017-01-061455
72017-01-07199
82017-01-08188

First Thought: Self Join

One direct way is to check whether each row is part of a valid window of three rows.

A row can be valid in three ways:

CaseMeaning
It is the first row of a valid tripleid, id + 1, id + 2
It is the middle row of a valid tripleid - 1, id, id + 1
It is the last row of a valid tripleid - 2, id - 1, id

This can be solved with several self joins, but the query becomes verbose. It also repeats the same condition many times.

A cleaner solution is to first keep only high-traffic rows, then group consecutive ids.

Key Insight

After filtering to rows where:

people >= 100

we need to find runs of consecutive id values.

For a consecutive sequence, this expression stays constant:

id - ROW_NUMBER() OVER (ORDER BY id)

Example:

idrow_numberid - row_number
514
624
734
844

All four rows have the same value, so they belong to the same consecutive group.

If there is a gap, the value changes.

Example:

idrow_numberid - row_number
211
321
532
642

The gap between 3 and 5 creates a new group.

Algorithm

First, filter the table to rows where people >= 100.

Then assign each remaining row a group key:

id - ROW_NUMBER() OVER (ORDER BY id)

Rows with the same group key form one consecutive run.

Next, count how many rows are in each group.

Finally, return only rows whose group has at least three rows.

SQL Solution

WITH high_traffic AS (
    SELECT
        id,
        visit_date,
        people,
        id - ROW_NUMBER() OVER (ORDER BY id) AS group_id
    FROM Stadium
    WHERE people >= 100
),
valid_groups AS (
    SELECT group_id
    FROM high_traffic
    GROUP BY group_id
    HAVING COUNT(*) >= 3
)
SELECT
    id,
    visit_date,
    people
FROM high_traffic
WHERE group_id IN (
    SELECT group_id
    FROM valid_groups
)
ORDER BY visit_date;

Code Explanation

The first CTE keeps only rows that can possibly be part of the answer:

WITH high_traffic AS (
    SELECT
        id,
        visit_date,
        people,
        id - ROW_NUMBER() OVER (ORDER BY id) AS group_id
    FROM Stadium
    WHERE people >= 100
)

Rows with fewer than 100 people cannot be in any valid group, so we remove them before grouping.

The expression:

id - ROW_NUMBER() OVER (ORDER BY id)

creates the same group_id for rows whose ids are consecutive.

The second CTE keeps only groups with at least three rows:

valid_groups AS (
    SELECT group_id
    FROM high_traffic
    GROUP BY group_id
    HAVING COUNT(*) >= 3
)

The final query returns all rows from those valid groups:

SELECT
    id,
    visit_date,
    people
FROM high_traffic
WHERE group_id IN (
    SELECT group_id
    FROM valid_groups
)
ORDER BY visit_date;

The result is ordered by visit_date, as required.

Correctness

After filtering, every row in high_traffic satisfies people >= 100.

For any consecutive run of ids, the value id - ROW_NUMBER() is constant because both id and ROW_NUMBER() increase by 1 at each step.

When a gap appears in the ids, id increases by more than 1, but ROW_NUMBER() still increases by exactly 1. Therefore, id - ROW_NUMBER() changes, creating a new group.

So each group_id represents exactly one consecutive run among rows where people >= 100.

The valid_groups CTE keeps only groups whose size is at least 3.

Therefore, the final query returns exactly the rows that belong to a consecutive high-traffic group of length at least 3.

Complexity

Let n be the number of rows in Stadium.

MetricValueWhy
TimeO(n log n)The window function orders rows by id
SpaceO(n)The CTE may store filtered rows and group ids

In practice, if id is indexed, the database can often process the ordering more efficiently.