A clear SQL guide for finding the question with the highest answer rate from survey logs.
Problem Restatement
We are given a table called SurveyLog.
Each row records one user action on one question. The action can be:
'show'
'answer'
'skip'The answer rate of a question is:
number of answer actions / number of show actionsWe need to return the question_id with the highest answer rate. If multiple questions have the same highest answer rate, return the smallest question_id. The returned column should be named survey_log.
Table
SurveyLog
| Column | Type | Meaning |
|---|---|---|
id | int | User ID |
action | varchar | One of show, answer, or skip |
question_id | int | Question ID |
answer_id | int | Answer ID, not null only when action is answer |
q_num | int | Order of the question in the current session |
timestamp | int | Time of the action |
Example
Input:
| id | action | question_id | answer_id | q_num | timestamp |
|---|---|---|---|---|---|
| 5 | show | 285 | null | 1 | 123 |
| 5 | answer | 285 | 124124 | 1 | 124 |
| 5 | show | 369 | null | 2 | 125 |
| 5 | skip | 369 | null | 2 | 126 |
For question 285:
answer count = 1
show count = 1
answer rate = 1 / 1 = 1.0For question 369:
answer count = 0
show count = 1
answer rate = 0 / 1 = 0.0Output:
| survey_log |
|---|
| 285 |
First Thought: Count Answers Only
A first attempt might be to count only answer rows:
SELECT question_id
FROM SurveyLog
WHERE action = 'answer'
GROUP BY question_id
ORDER BY COUNT(*) DESC
LIMIT 1;This is not enough.
The problem asks for answer rate, not answer count.
A question shown 100 times and answered 50 times has rate 0.5.
A question shown 2 times and answered 2 times has rate 1.0.
The second question should rank higher even though it has fewer total answers.
Key Insight
For each question, we need two counts:
answer_count = number of rows where action = 'answer'
show_count = number of rows where action = 'show'Then compute:
answer_rate = answer_count / show_countIn SQL, we can compute conditional counts using SUM(CASE WHEN ... THEN 1 ELSE 0 END).
Algorithm
- Group rows by
question_id. - Count how many times each question was answered.
- Count how many times each question was shown.
- Sort by answer rate from highest to lowest.
- If rates tie, sort by
question_idfrom smallest to largest. - Return the first row.
Correctness
The query groups all rows with the same question_id, so every aggregate is computed per question.
For each group, the answer count includes exactly the rows whose action is answer. The show count includes exactly the rows whose action is show. Therefore, their ratio is exactly the answer rate defined by the problem.
Sorting by this ratio in descending order places the question with the highest answer rate first.
If multiple questions have the same answer rate, sorting by question_id in ascending order places the smallest question_id first.
Therefore, selecting the first row returns exactly the required question.
Complexity
Let:
N = number of rows in SurveyLog
Q = number of distinct questions| Metric | Value | Why |
|---|---|---|
| Time | O(N + Q log Q) | Scan rows, group by question, then sort grouped results |
| Space | O(Q) | Store one aggregate row per question |
The exact execution plan depends on the database engine.
Implementation
SELECT
question_id AS survey_log
FROM SurveyLog
GROUP BY question_id
ORDER BY
SUM(CASE WHEN action = 'answer' THEN 1 ELSE 0 END)
/ SUM(CASE WHEN action = 'show' THEN 1 ELSE 0 END) DESC,
question_id ASC
LIMIT 1;Code Explanation
This groups all actions for the same question:
GROUP BY question_idThis counts answer actions:
SUM(CASE WHEN action = 'answer' THEN 1 ELSE 0 END)This counts show actions:
SUM(CASE WHEN action = 'show' THEN 1 ELSE 0 END)The ratio is used for sorting:
ORDER BY
answer_count / show_count DESCThe query writes the expression directly inside ORDER BY.
The tie-breaker is:
question_id ASCSo if two questions have the same answer rate, the smaller question ID is returned.
The output column is renamed here:
question_id AS survey_logMySQL Integer Division Note
In MySQL, division with / returns a decimal result, so the implementation above works.
A more explicit version is:
SELECT
question_id AS survey_log
FROM SurveyLog
GROUP BY question_id
ORDER BY
SUM(CASE WHEN action = 'answer' THEN 1 ELSE 0 END) * 1.0
/ SUM(CASE WHEN action = 'show' THEN 1 ELSE 0 END) DESC,
question_id ASC
LIMIT 1;Multiplying by 1.0 makes the ratio clearly numeric.
Testing
Sample data:
CREATE TABLE SurveyLog (
id INT,
action VARCHAR(20),
question_id INT,
answer_id INT,
q_num INT,
timestamp INT
);
INSERT INTO SurveyLog (id, action, question_id, answer_id, q_num, timestamp) VALUES
(5, 'show', 285, NULL, 1, 123),
(5, 'answer', 285, 124124, 1, 124),
(5, 'show', 369, NULL, 2, 125),
(5, 'skip', 369, NULL, 2, 126);Query:
SELECT
question_id AS survey_log
FROM SurveyLog
GROUP BY question_id
ORDER BY
SUM(CASE WHEN action = 'answer' THEN 1 ELSE 0 END) * 1.0
/ SUM(CASE WHEN action = 'show' THEN 1 ELSE 0 END) DESC,
question_id ASC
LIMIT 1;Expected result:
| survey_log |
|---|
| 285 |
Additional test cases:
| Case | Expected behavior |
|---|---|
One question has rate 1.0 | That question should rank first |
| One question has many answers but lower rate | It should not win by count alone |
| Two questions have the same rate | Smaller question_id should win |
| Question has shows but no answers | Its answer rate is 0 |
| Several users see the same question | All rows for that question are grouped together |