A clear explanation of calculating the fraction of players who logged in again the day after their first login.
Problem Restatement
We are given a table named Activity.
Each row records one player’s login activity on one date.
We need to report the fraction of players who logged in again on the day immediately after their first login date.
The final answer must be rounded to 2 decimal places.
The result has one column:
fractionThe task is to count players with day-two retention, then divide by the total number of distinct players. The official statement asks for the fraction of players who logged in again the day after their first login date, rounded to 2 decimal places.
Table Schema
| Column | Type | Meaning |
|---|---|---|
player_id | int | The player |
device_id | int | The device used |
event_date | date | The login date |
games_played | int | Games played before logout |
The primary key is:
(player_id, event_date)So a player has at most one row per date.
Examples
Input:
Activity
+-----------+-----------+------------+--------------+
| player_id | device_id | event_date | games_played |
+-----------+-----------+------------+--------------+
| 1 | 2 | 2016-03-01 | 5 |
| 1 | 2 | 2016-03-02 | 6 |
| 2 | 3 | 2017-06-25 | 1 |
| 3 | 1 | 2016-03-02 | 0 |
| 3 | 4 | 2018-07-03 | 5 |
+-----------+-----------+------------+--------------+First login date per player:
| Player | First Login | Next Day Exists? |
|---|---|---|
1 | 2016-03-01 | Yes, 2016-03-02 |
2 | 2017-06-25 | No |
3 | 2016-03-02 | No |
There are 3 total players.
Only 1 player logged in again on the next day.
So the fraction is:
1 / 3 = 0.33Output:
+----------+
| fraction |
+----------+
| 0.33 |
+----------+First Thought: Count All Consecutive Logins
A first attempt might look for any two consecutive login dates for a player.
But the problem is more specific.
We only care about the day after the player’s first login.
For example, if a player first logged in on 2016-03-01, skipped 2016-03-02, but logged in on 2016-03-03, that player does not count.
So we must first identify each player’s first login date.
Key Insight
The query has two parts.
First, find each player’s first login date:
SELECT
player_id,
MIN(event_date) AS first_login
FROM Activity
GROUP BY player_idSecond, check whether the same player has an activity row on:
first_login + INTERVAL 1 DAYIf such a row exists, the player counts in the numerator.
The denominator is the number of total players.
Algorithm
- Build a derived table
first_logins. - Join
first_loginsback toActivity. - The join condition checks:
- same
player_id Activity.event_date = first_login + 1 day
- same
- Count how many players matched.
- Divide by the total number of players.
- Round to
2decimal places.
SQL Solution
SELECT
ROUND(COUNT(a.player_id) / COUNT(f.player_id), 2) AS fraction
FROM (
SELECT
player_id,
MIN(event_date) AS first_login
FROM Activity
GROUP BY player_id
) AS f
LEFT JOIN Activity AS a
ON f.player_id = a.player_id
AND a.event_date = DATE_ADD(f.first_login, INTERVAL 1 DAY);SQL Explanation
The subquery finds one row per player:
SELECT
player_id,
MIN(event_date) AS first_login
FROM Activity
GROUP BY player_idFor the sample, it produces:
| player_id | first_login |
|---|---|
1 | 2016-03-01 |
2 | 2017-06-25 |
3 | 2016-03-02 |
Then we left join back to Activity:
LEFT JOIN Activity AS a
ON f.player_id = a.player_id
AND a.event_date = DATE_ADD(f.first_login, INTERVAL 1 DAY)This keeps every player from f.
If the player logged in the next day, a.player_id is not null.
If not, the joined columns from a are null.
Then:
COUNT(a.player_id)counts players who returned the next day.
And:
COUNT(f.player_id)counts all players.
Finally:
ROUND(..., 2)rounds the fraction to two decimal places.
Correctness
The derived table f contains exactly one row for each player, and its first_login value is the earliest login date for that player.
The left join matches a row from Activity only when the same player has an event exactly one day after first_login.
Because (player_id, event_date) is the primary key, each player can match at most one next-day row.
Therefore, COUNT(a.player_id) is exactly the number of players who logged in again on the day after their first login.
COUNT(f.player_id) is exactly the total number of players.
The query divides these two values and rounds the result to two decimal places, which is exactly the required output.
Complexity
Let n be the number of rows in Activity.
| Metric | Value | Why |
|---|---|---|
| Time | O(n log n) typical | Grouping and joining may require sorting or hashing |
| Space | O(p) | The derived table stores one row per player |
p is the number of distinct players.
With an index on:
(player_id, event_date)the join can be efficient because it looks up a specific player-date pair.
Alternative Solution with IN
SELECT
ROUND(
COUNT(*) / (SELECT COUNT(DISTINCT player_id) FROM Activity),
2
) AS fraction
FROM Activity
WHERE (player_id, event_date) IN (
SELECT
player_id,
DATE_ADD(MIN(event_date), INTERVAL 1 DAY)
FROM Activity
GROUP BY player_id
);The subquery creates the required second-login date for each player.
The outer query counts how many of those player-date pairs actually exist in Activity.
Testing
CREATE TABLE Activity (
player_id INT,
device_id INT,
event_date DATE,
games_played INT,
PRIMARY KEY (player_id, event_date)
);
INSERT INTO Activity (player_id, device_id, event_date, games_played) VALUES
(1, 2, '2016-03-01', 5),
(1, 2, '2016-03-02', 6),
(2, 3, '2017-06-25', 1),
(3, 1, '2016-03-02', 0),
(3, 4, '2018-07-03', 5);Query:
SELECT
ROUND(COUNT(a.player_id) / COUNT(f.player_id), 2) AS fraction
FROM (
SELECT
player_id,
MIN(event_date) AS first_login
FROM Activity
GROUP BY player_id
) AS f
LEFT JOIN Activity AS a
ON f.player_id = a.player_id
AND a.event_date = DATE_ADD(f.first_login, INTERVAL 1 DAY);Expected output:
+----------+
| fraction |
+----------+
| 0.33 |
+----------+| Test | Why |
|---|---|
| Player returns next day | Counts in numerator |
| Player returns much later | Does not count |
| Player has only one login | Does not count |
| Multiple players | Confirms denominator uses total players |