A SQL guide for finding stadium records that belong to runs of at least three consecutive ids where each row has at least 100 people.
Problem Restatement
We are given a table called Stadium.
Each row records one stadium visit day. The table has three columns:
| Column | Type | Meaning |
|---|---|---|
id | int | Visit id |
visit_date | date | Date of the visit |
people | int | Number of people who visited |
We need to return all rows that are part of a group of at least three consecutive id values where every row has people >= 100.
The result should be ordered by visit_date.
The official statement asks for records with three or more consecutive ids, and each record in that consecutive group must have people >= 100.
Input and Output
Input table:
StadiumOutput columns:
id, visit_date, peopleThe output should include only rows that belong to a valid consecutive group.
Example
Input:
| id | visit_date | people |
|---|---|---|
| 1 | 2017-01-01 | 10 |
| 2 | 2017-01-02 | 109 |
| 3 | 2017-01-03 | 150 |
| 4 | 2017-01-04 | 99 |
| 5 | 2017-01-05 | 145 |
| 6 | 2017-01-06 | 1455 |
| 7 | 2017-01-07 | 199 |
| 8 | 2017-01-08 | 188 |
Rows with people >= 100 are:
| id | visit_date | people |
|---|---|---|
| 2 | 2017-01-02 | 109 |
| 3 | 2017-01-03 | 150 |
| 5 | 2017-01-05 | 145 |
| 6 | 2017-01-06 | 1455 |
| 7 | 2017-01-07 | 199 |
| 8 | 2017-01-08 | 188 |
The consecutive high-traffic groups are:
| Group | ids | Length |
|---|---|---|
| First | 2, 3 | 2 |
| Second | 5, 6, 7, 8 | 4 |
Only the second group has length at least 3.
So the output is:
| id | visit_date | people |
|---|---|---|
| 5 | 2017-01-05 | 145 |
| 6 | 2017-01-06 | 1455 |
| 7 | 2017-01-07 | 199 |
| 8 | 2017-01-08 | 188 |
First Thought: Self Join
One direct way is to check whether each row is part of a valid window of three rows.
A row can be valid in three ways:
| Case | Meaning |
|---|---|
| It is the first row of a valid triple | id, id + 1, id + 2 |
| It is the middle row of a valid triple | id - 1, id, id + 1 |
| It is the last row of a valid triple | id - 2, id - 1, id |
This can be solved with several self joins, but the query becomes verbose. It also repeats the same condition many times.
A cleaner solution is to first keep only high-traffic rows, then group consecutive ids.
Key Insight
After filtering to rows where:
people >= 100we need to find runs of consecutive id values.
For a consecutive sequence, this expression stays constant:
id - ROW_NUMBER() OVER (ORDER BY id)Example:
| id | row_number | id - row_number |
|---|---|---|
| 5 | 1 | 4 |
| 6 | 2 | 4 |
| 7 | 3 | 4 |
| 8 | 4 | 4 |
All four rows have the same value, so they belong to the same consecutive group.
If there is a gap, the value changes.
Example:
| id | row_number | id - row_number |
|---|---|---|
| 2 | 1 | 1 |
| 3 | 2 | 1 |
| 5 | 3 | 2 |
| 6 | 4 | 2 |
The gap between 3 and 5 creates a new group.
Algorithm
First, filter the table to rows where people >= 100.
Then assign each remaining row a group key:
id - ROW_NUMBER() OVER (ORDER BY id)Rows with the same group key form one consecutive run.
Next, count how many rows are in each group.
Finally, return only rows whose group has at least three rows.
SQL Solution
WITH high_traffic AS (
SELECT
id,
visit_date,
people,
id - ROW_NUMBER() OVER (ORDER BY id) AS group_id
FROM Stadium
WHERE people >= 100
),
valid_groups AS (
SELECT group_id
FROM high_traffic
GROUP BY group_id
HAVING COUNT(*) >= 3
)
SELECT
id,
visit_date,
people
FROM high_traffic
WHERE group_id IN (
SELECT group_id
FROM valid_groups
)
ORDER BY visit_date;Code Explanation
The first CTE keeps only rows that can possibly be part of the answer:
WITH high_traffic AS (
SELECT
id,
visit_date,
people,
id - ROW_NUMBER() OVER (ORDER BY id) AS group_id
FROM Stadium
WHERE people >= 100
)Rows with fewer than 100 people cannot be in any valid group, so we remove them before grouping.
The expression:
id - ROW_NUMBER() OVER (ORDER BY id)creates the same group_id for rows whose ids are consecutive.
The second CTE keeps only groups with at least three rows:
valid_groups AS (
SELECT group_id
FROM high_traffic
GROUP BY group_id
HAVING COUNT(*) >= 3
)The final query returns all rows from those valid groups:
SELECT
id,
visit_date,
people
FROM high_traffic
WHERE group_id IN (
SELECT group_id
FROM valid_groups
)
ORDER BY visit_date;The result is ordered by visit_date, as required.
Correctness
After filtering, every row in high_traffic satisfies people >= 100.
For any consecutive run of ids, the value id - ROW_NUMBER() is constant because both id and ROW_NUMBER() increase by 1 at each step.
When a gap appears in the ids, id increases by more than 1, but ROW_NUMBER() still increases by exactly 1. Therefore, id - ROW_NUMBER() changes, creating a new group.
So each group_id represents exactly one consecutive run among rows where people >= 100.
The valid_groups CTE keeps only groups whose size is at least 3.
Therefore, the final query returns exactly the rows that belong to a consecutive high-traffic group of length at least 3.
Complexity
Let n be the number of rows in Stadium.
| Metric | Value | Why |
|---|---|---|
| Time | O(n log n) | The window function orders rows by id |
| Space | O(n) | The CTE may store filtered rows and group ids |
In practice, if id is indexed, the database can often process the ordering more efficiently.