A clear explanation of computing cumulative games played per player and date using SQL window functions.
Problem Restatement
We are given a table named Activity.
Each row records one player’s activity on one date. It contains the player id, device id, event date, and number of games played.
We need to report, for each player and each date, the total number of games the player has played so far up to that date.
The output should contain:
player_id
event_date
games_played_so_farThe result may be returned in any order. The usual readable order is by player_id and event_date.
Table Schema
| Column | Type | Meaning |
|---|---|---|
player_id | int | The player |
device_id | int | The device used |
event_date | date | The login date |
games_played | int | Games played on that date |
The primary key is:
(player_id, event_date)So each player has at most one row per date.
Examples
Input:
Activity
+-----------+-----------+------------+--------------+
| player_id | device_id | event_date | games_played |
+-----------+-----------+------------+--------------+
| 1 | 2 | 2016-03-01 | 5 |
| 1 | 2 | 2016-05-02 | 6 |
| 1 | 3 | 2017-06-25 | 1 |
| 3 | 1 | 2016-03-02 | 0 |
| 3 | 4 | 2018-07-03 | 5 |
+-----------+-----------+------------+--------------+For player 1:
| Date | Games on Date | Games So Far |
|---|---|---|
2016-03-01 | 5 | 5 |
2016-05-02 | 6 | 5 + 6 = 11 |
2017-06-25 | 1 | 5 + 6 + 1 = 12 |
For player 3:
| Date | Games on Date | Games So Far |
|---|---|---|
2016-03-02 | 0 | 0 |
2018-07-03 | 5 | 0 + 5 = 5 |
Output:
+-----------+------------+---------------------+
| player_id | event_date | games_played_so_far |
+-----------+------------+---------------------+
| 1 | 2016-03-01 | 5 |
| 1 | 2016-05-02 | 11 |
| 1 | 2017-06-25 | 12 |
| 3 | 2016-03-02 | 0 |
| 3 | 2018-07-03 | 5 |
+-----------+------------+---------------------+First Thought: Group by Player
A first idea might be:
SELECT
player_id,
SUM(games_played)
FROM Activity
GROUP BY player_id;But this gives one row per player.
The problem needs one row per player and date.
For each date, we need a running total, not the final total.
So we need a cumulative sum ordered by date inside each player group.
Key Insight
This is a running total problem.
For each row, we want:
sum of games_played for the same player where event_date is less than or equal to the current event_dateSQL window functions express this directly.
We partition rows by player_id, sort each player’s rows by event_date, and compute a cumulative SUM.
Algorithm
Use:
SUM(games_played) OVER (...)with:
PARTITION BY player_idto reset the running total for each player.
Then use:
ORDER BY event_dateto accumulate games in chronological order.
The expression returns the total games played by that player from their first recorded date through the current row’s date.
SQL Solution
SELECT
player_id,
event_date,
SUM(games_played) OVER (
PARTITION BY player_id
ORDER BY event_date
) AS games_played_so_far
FROM Activity;SQL Explanation
This part chooses the output columns:
SELECT
player_id,
event_date,The window sum computes a running total:
SUM(games_played) OVER (
PARTITION BY player_id
ORDER BY event_date
)PARTITION BY player_id means each player is handled independently.
So player 1’s games do not mix with player 3’s games.
ORDER BY event_date means rows are processed from earliest date to latest date for each player.
For each row, the cumulative sum includes the current row and all earlier rows from the same player.
The alias names the computed column:
AS games_played_so_farCorrectness
For any output row with player p and date d, the window partition contains exactly the rows from Activity where player_id = p.
Inside that partition, rows are ordered by event_date.
The window SUM accumulates games_played from the first row in the partition through the current row. Since the current row has date d, this sum includes exactly the games played by player p on dates up to and including d.
That is exactly the required value of games_played_so_far.
Because the query returns one row for every row in Activity, it reports the cumulative total for each player and each date.
Complexity
Let n be the number of rows in Activity.
| Metric | Value | Why |
|---|---|---|
| Time | O(n log n) | The database typically sorts rows by player_id and event_date for the window |
| Space | O(n) | The database may materialize or sort the window input |
Actual execution depends on the database engine and indexes.
An index on:
(player_id, event_date)is ideal for this query because it matches the partition and ordering keys.
Alternative: Self Join
If window functions are unavailable, we can use a self join.
SELECT
a1.player_id,
a1.event_date,
SUM(a2.games_played) AS games_played_so_far
FROM Activity AS a1
JOIN Activity AS a2
ON a1.player_id = a2.player_id
AND a2.event_date <= a1.event_date
GROUP BY
a1.player_id,
a1.event_date;For each row a1, the join finds all earlier or same-date rows a2 for the same player.
Then:
SUM(a2.games_played)computes the running total up to a1.event_date.
The window function version is usually clearer and more efficient.
Testing
Given:
CREATE TABLE Activity (
player_id INT,
device_id INT,
event_date DATE,
games_played INT,
PRIMARY KEY (player_id, event_date)
);
INSERT INTO Activity (player_id, device_id, event_date, games_played) VALUES
(1, 2, '2016-03-01', 5),
(1, 2, '2016-05-02', 6),
(1, 3, '2017-06-25', 1),
(3, 1, '2016-03-02', 0),
(3, 4, '2018-07-03', 5);Query:
SELECT
player_id,
event_date,
SUM(games_played) OVER (
PARTITION BY player_id
ORDER BY event_date
) AS games_played_so_far
FROM Activity
ORDER BY player_id, event_date;Expected result:
+-----------+------------+---------------------+
| player_id | event_date | games_played_so_far |
+-----------+------------+---------------------+
| 1 | 2016-03-01 | 5 |
| 1 | 2016-05-02 | 11 |
| 1 | 2017-06-25 | 12 |
| 3 | 2016-03-02 | 0 |
| 3 | 2018-07-03 | 5 |
+-----------+------------+---------------------+| Test | Why |
|---|---|
| Multiple rows for one player | Checks cumulative sum |
| Multiple players | Checks partition reset |
games_played = 0 | Confirms zero still contributes correctly |
| Ordered output | Makes manual verification easier |