# LeetCode 601: Human Traffic of Stadium

## Problem Restatement

We are given a table called `Stadium`.

Each row records one stadium visit day. The table has three columns:

| Column | Type | Meaning |
|---|---|---|
| `id` | int | Visit id |
| `visit_date` | date | Date of the visit |
| `people` | int | Number of people who visited |

We need to return all rows that are part of a group of at least three consecutive `id` values where every row has `people >= 100`.

The result should be ordered by `visit_date`.

The official statement asks for records with three or more consecutive ids, and each record in that consecutive group must have `people >= 100`.

## Input and Output

Input table:

```sql
Stadium
```

Output columns:

```sql
id, visit_date, people
```

The output should include only rows that belong to a valid consecutive group.

## Example

Input:

| id | visit_date | people |
|---:|---|---:|
| 1 | 2017-01-01 | 10 |
| 2 | 2017-01-02 | 109 |
| 3 | 2017-01-03 | 150 |
| 4 | 2017-01-04 | 99 |
| 5 | 2017-01-05 | 145 |
| 6 | 2017-01-06 | 1455 |
| 7 | 2017-01-07 | 199 |
| 8 | 2017-01-08 | 188 |

Rows with `people >= 100` are:

| id | visit_date | people |
|---:|---|---:|
| 2 | 2017-01-02 | 109 |
| 3 | 2017-01-03 | 150 |
| 5 | 2017-01-05 | 145 |
| 6 | 2017-01-06 | 1455 |
| 7 | 2017-01-07 | 199 |
| 8 | 2017-01-08 | 188 |

The consecutive high-traffic groups are:

| Group | ids | Length |
|---|---|---:|
| First | `2, 3` | 2 |
| Second | `5, 6, 7, 8` | 4 |

Only the second group has length at least `3`.

So the output is:

| id | visit_date | people |
|---:|---|---:|
| 5 | 2017-01-05 | 145 |
| 6 | 2017-01-06 | 1455 |
| 7 | 2017-01-07 | 199 |
| 8 | 2017-01-08 | 188 |

## First Thought: Self Join

One direct way is to check whether each row is part of a valid window of three rows.

A row can be valid in three ways:

| Case | Meaning |
|---|---|
| It is the first row of a valid triple | `id, id + 1, id + 2` |
| It is the middle row of a valid triple | `id - 1, id, id + 1` |
| It is the last row of a valid triple | `id - 2, id - 1, id` |

This can be solved with several self joins, but the query becomes verbose. It also repeats the same condition many times.

A cleaner solution is to first keep only high-traffic rows, then group consecutive ids.

## Key Insight

After filtering to rows where:

```sql
people >= 100
```

we need to find runs of consecutive `id` values.

For a consecutive sequence, this expression stays constant:

```sql
id - ROW_NUMBER() OVER (ORDER BY id)
```

Example:

| id | row_number | id - row_number |
|---:|---:|---:|
| 5 | 1 | 4 |
| 6 | 2 | 4 |
| 7 | 3 | 4 |
| 8 | 4 | 4 |

All four rows have the same value, so they belong to the same consecutive group.

If there is a gap, the value changes.

Example:

| id | row_number | id - row_number |
|---:|---:|---:|
| 2 | 1 | 1 |
| 3 | 2 | 1 |
| 5 | 3 | 2 |
| 6 | 4 | 2 |

The gap between `3` and `5` creates a new group.

## Algorithm

First, filter the table to rows where `people >= 100`.

Then assign each remaining row a group key:

```sql
id - ROW_NUMBER() OVER (ORDER BY id)
```

Rows with the same group key form one consecutive run.

Next, count how many rows are in each group.

Finally, return only rows whose group has at least three rows.

## SQL Solution

```sql
WITH high_traffic AS (
    SELECT
        id,
        visit_date,
        people,
        id - ROW_NUMBER() OVER (ORDER BY id) AS group_id
    FROM Stadium
    WHERE people >= 100
),
valid_groups AS (
    SELECT group_id
    FROM high_traffic
    GROUP BY group_id
    HAVING COUNT(*) >= 3
)
SELECT
    id,
    visit_date,
    people
FROM high_traffic
WHERE group_id IN (
    SELECT group_id
    FROM valid_groups
)
ORDER BY visit_date;
```

## Code Explanation

The first CTE keeps only rows that can possibly be part of the answer:

```sql
WITH high_traffic AS (
    SELECT
        id,
        visit_date,
        people,
        id - ROW_NUMBER() OVER (ORDER BY id) AS group_id
    FROM Stadium
    WHERE people >= 100
)
```

Rows with fewer than `100` people cannot be in any valid group, so we remove them before grouping.

The expression:

```sql
id - ROW_NUMBER() OVER (ORDER BY id)
```

creates the same `group_id` for rows whose ids are consecutive.

The second CTE keeps only groups with at least three rows:

```sql
valid_groups AS (
    SELECT group_id
    FROM high_traffic
    GROUP BY group_id
    HAVING COUNT(*) >= 3
)
```

The final query returns all rows from those valid groups:

```sql
SELECT
    id,
    visit_date,
    people
FROM high_traffic
WHERE group_id IN (
    SELECT group_id
    FROM valid_groups
)
ORDER BY visit_date;
```

The result is ordered by `visit_date`, as required.

## Correctness

After filtering, every row in `high_traffic` satisfies `people >= 100`.

For any consecutive run of ids, the value `id - ROW_NUMBER()` is constant because both `id` and `ROW_NUMBER()` increase by `1` at each step.

When a gap appears in the ids, `id` increases by more than `1`, but `ROW_NUMBER()` still increases by exactly `1`. Therefore, `id - ROW_NUMBER()` changes, creating a new group.

So each `group_id` represents exactly one consecutive run among rows where `people >= 100`.

The `valid_groups` CTE keeps only groups whose size is at least `3`.

Therefore, the final query returns exactly the rows that belong to a consecutive high-traffic group of length at least `3`.

## Complexity

Let `n` be the number of rows in `Stadium`.

| Metric | Value | Why |
|---|---:|---|
| Time | `O(n log n)` | The window function orders rows by `id` |
| Space | `O(n)` | The CTE may store filtered rows and group ids |

In practice, if `id` is indexed, the database can often process the ordering more efficiently.

