A clear explanation of the Trips and Users SQL problem using joins, filtering, grouping, and conditional aggregation.
Problem Restatement
We are given two tables: Trips and Users.
Trips stores taxi trip requests.
Users stores user information, including whether a user is banned.
We need to find the cancellation rate for each day from:
'2013-10-01'to:
'2013-10-03'Only trips where both the client and driver are unbanned should be counted.
The cancellation rate is:
cancelled requests / total requestsCancelled means the trip status is either:
'cancelled_by_driver'or:
'cancelled_by_client'The result should show each day and its cancellation rate rounded to two decimal places. The problem states both client and driver must be unbanned, and the date range is from 2013-10-01 to 2013-10-03.
Tables
Trips
| Column | Meaning |
|---|---|
id | Trip id |
client_id | User id of the client |
driver_id | User id of the driver |
city_id | City id |
status | Trip status |
request_at | Request date |
Possible status values include:
'completed'
'cancelled_by_driver'
'cancelled_by_client'Users
| Column | Meaning |
|---|---|
users_id | User id |
banned | Whether this user is banned |
role | User role |
Possible banned values are:
'Yes'
'No'Output
The result table should contain:
| Column | Meaning |
|---|---|
Day | Request date |
Cancellation Rate | Cancelled requests divided by total requests, rounded to 2 decimals |
Example output shape:
| Day | Cancellation Rate |
|---|---|
| 2013-10-01 | 0.33 |
| 2013-10-02 | 0.00 |
| 2013-10-03 | 0.50 |
First Thought: Count All Trips Per Day
A first attempt might group trips by day:
SELECT
request_at AS Day,
COUNT(*) AS total_requests
FROM Trips
GROUP BY request_at;This gives total requests per day.
But it is incomplete because the problem says banned users must be excluded.
A trip should be counted only when both users are unbanned:
- The client is not banned.
- The driver is not banned.
So we need to join Users twice.
Key Insight
The Trips table has two user references:
| Trips column | Meaning | Join target |
|---|---|---|
client_id | Client user id | Users.users_id |
driver_id | Driver user id | Users.users_id |
Because both client and driver come from the same Users table, we join Users two times with different aliases.
JOIN Users AS c
ON t.client_id = c.users_id
JOIN Users AS d
ON t.driver_id = d.users_idThen we filter:
c.banned = 'No'
AND d.banned = 'No'Now each remaining row is a valid request with unbanned users.
Counting Cancelled Requests
After filtering valid trips, cancellation rate per day is:
cancelled_count / total_countWe can count cancelled trips using conditional aggregation.
In MySQL, this works:
SUM(t.status != 'completed')Because t.status != 'completed' evaluates to 1 for cancelled trips and 0 for completed trips.
A more portable version uses CASE:
SUM(
CASE
WHEN t.status IN ('cancelled_by_driver', 'cancelled_by_client') THEN 1
ELSE 0
END
)Then divide by:
COUNT(*)Finally round to two decimal places:
ROUND(cancelled_count / total_count, 2)Algorithm
- Start from
Trips. - Join
Usersas clients. - Join
Usersas drivers. - Keep only rows where both users are unbanned.
- Keep only rows from
2013-10-01through2013-10-03. - Group by
request_at. - Compute:
- cancelled trip count
- total trip count
- cancellation rate
- Round the rate to two decimals.
Correctness
After joining Trips with Users as c, each row contains the client information for that trip.
After joining Trips with Users as d, each row also contains the driver information for that trip.
The filter:
c.banned = 'No'
AND d.banned = 'No'keeps exactly the trips where both users are unbanned.
The date filter keeps exactly the requested days.
For each remaining day group, COUNT(*) gives the total number of valid requests on that day.
The conditional sum counts exactly the valid requests whose status is cancelled by the client or cancelled by the driver.
Therefore the division computes the required cancellation rate for that day.
Rounding to two decimals gives the required output format.
Complexity
Let T be the number of rows in Trips, and U be the number of rows in Users.
| Metric | Value | Why |
|---|---|---|
| Time | Depends on indexes | The database joins Trips to Users twice, filters dates, then groups by day |
| Space | Depends on execution plan | The database may allocate memory for joins and aggregation |
Useful indexes would include:
Users(users_id)
Trips(client_id)
Trips(driver_id)
Trips(request_at)Users(users_id) is usually already indexed because it is the primary key.
Implementation
SELECT
t.request_at AS Day,
ROUND(
SUM(
CASE
WHEN t.status IN ('cancelled_by_driver', 'cancelled_by_client') THEN 1
ELSE 0
END
) / COUNT(*),
2
) AS `Cancellation Rate`
FROM Trips AS t
JOIN Users AS c
ON t.client_id = c.users_id
JOIN Users AS d
ON t.driver_id = d.users_id
WHERE c.banned = 'No'
AND d.banned = 'No'
AND t.request_at BETWEEN '2013-10-01' AND '2013-10-03'
GROUP BY t.request_at;MySQL Short Version
SELECT
t.request_at AS Day,
ROUND(SUM(t.status != 'completed') / COUNT(*), 2) AS `Cancellation Rate`
FROM Trips AS t
JOIN Users AS c
ON t.client_id = c.users_id
JOIN Users AS d
ON t.driver_id = d.users_id
WHERE c.banned = 'No'
AND d.banned = 'No'
AND t.request_at BETWEEN '2013-10-01' AND '2013-10-03'
GROUP BY t.request_at;This works in MySQL because boolean expressions can be summed as 1 or 0.
Code Explanation
This part selects the request date:
t.request_at AS DayThe first join gets the client row:
JOIN Users AS c
ON t.client_id = c.users_idThe second join gets the driver row:
JOIN Users AS d
ON t.driver_id = d.users_idThe banned-user filter is:
WHERE c.banned = 'No'
AND d.banned = 'No'The date filter is:
t.request_at BETWEEN '2013-10-01' AND '2013-10-03'The conditional sum counts cancellations:
SUM(
CASE
WHEN t.status IN ('cancelled_by_driver', 'cancelled_by_client') THEN 1
ELSE 0
END
)The denominator counts all valid requests for that day:
COUNT(*)The final value is rounded:
ROUND(..., 2)Testing
Sample data:
CREATE TABLE Trips (
id INT PRIMARY KEY,
client_id INT,
driver_id INT,
city_id INT,
status VARCHAR(30),
request_at DATE
);
CREATE TABLE Users (
users_id INT PRIMARY KEY,
banned VARCHAR(3),
role VARCHAR(20)
);
INSERT INTO Trips VALUES
(1, 1, 10, 1, 'completed', '2013-10-01'),
(2, 2, 11, 1, 'cancelled_by_driver', '2013-10-01'),
(3, 3, 12, 6, 'completed', '2013-10-01'),
(4, 4, 13, 6, 'cancelled_by_client', '2013-10-01'),
(5, 1, 10, 1, 'completed', '2013-10-02'),
(6, 2, 11, 6, 'completed', '2013-10-02'),
(7, 3, 12, 6, 'completed', '2013-10-02'),
(8, 2, 12, 12, 'completed', '2013-10-03'),
(9, 3, 10, 12, 'completed', '2013-10-03'),
(10, 4, 13, 12, 'cancelled_by_driver', '2013-10-03');
INSERT INTO Users VALUES
(1, 'No', 'client'),
(2, 'Yes', 'client'),
(3, 'No', 'client'),
(4, 'No', 'client'),
(10, 'No', 'driver'),
(11, 'No', 'driver'),
(12, 'No', 'driver'),
(13, 'No', 'driver');Expected result:
| Day | Cancellation Rate |
|---|---|
| 2013-10-01 | 0.33 |
| 2013-10-02 | 0.00 |
| 2013-10-03 | 0.50 |
Test meaning:
| Test | Why |
|---|---|
| Banned client trip | Must be excluded from numerator and denominator |
| Completed trip | Counts in denominator only |
| Cancelled trip | Counts in numerator and denominator |
| Multiple days | Confirms grouping by date |
| Rounding | Confirms two decimal places |