Skip to content

LeetCode 578: Get Highest Answer Rate Question

A clear SQL guide for finding the question with the highest answer rate from survey logs.

Problem Restatement

We are given a table called SurveyLog.

Each row records one user action on one question. The action can be:

'show'
'answer'
'skip'

The answer rate of a question is:

number of answer actions / number of show actions

We need to return the question_id with the highest answer rate. If multiple questions have the same highest answer rate, return the smallest question_id. The returned column should be named survey_log.

Table

SurveyLog

ColumnTypeMeaning
idintUser ID
actionvarcharOne of show, answer, or skip
question_idintQuestion ID
answer_idintAnswer ID, not null only when action is answer
q_numintOrder of the question in the current session
timestampintTime of the action

Example

Input:

idactionquestion_idanswer_idq_numtimestamp
5show285null1123
5answer2851241241124
5show369null2125
5skip369null2126

For question 285:

answer count = 1
show count = 1
answer rate = 1 / 1 = 1.0

For question 369:

answer count = 0
show count = 1
answer rate = 0 / 1 = 0.0

Output:

survey_log
285

First Thought: Count Answers Only

A first attempt might be to count only answer rows:

SELECT question_id
FROM SurveyLog
WHERE action = 'answer'
GROUP BY question_id
ORDER BY COUNT(*) DESC
LIMIT 1;

This is not enough.

The problem asks for answer rate, not answer count.

A question shown 100 times and answered 50 times has rate 0.5.

A question shown 2 times and answered 2 times has rate 1.0.

The second question should rank higher even though it has fewer total answers.

Key Insight

For each question, we need two counts:

answer_count = number of rows where action = 'answer'
show_count = number of rows where action = 'show'

Then compute:

answer_rate = answer_count / show_count

In SQL, we can compute conditional counts using SUM(CASE WHEN ... THEN 1 ELSE 0 END).

Algorithm

  1. Group rows by question_id.
  2. Count how many times each question was answered.
  3. Count how many times each question was shown.
  4. Sort by answer rate from highest to lowest.
  5. If rates tie, sort by question_id from smallest to largest.
  6. Return the first row.

Correctness

The query groups all rows with the same question_id, so every aggregate is computed per question.

For each group, the answer count includes exactly the rows whose action is answer. The show count includes exactly the rows whose action is show. Therefore, their ratio is exactly the answer rate defined by the problem.

Sorting by this ratio in descending order places the question with the highest answer rate first.

If multiple questions have the same answer rate, sorting by question_id in ascending order places the smallest question_id first.

Therefore, selecting the first row returns exactly the required question.

Complexity

Let:

N = number of rows in SurveyLog
Q = number of distinct questions
MetricValueWhy
TimeO(N + Q log Q)Scan rows, group by question, then sort grouped results
SpaceO(Q)Store one aggregate row per question

The exact execution plan depends on the database engine.

Implementation

SELECT
    question_id AS survey_log
FROM SurveyLog
GROUP BY question_id
ORDER BY
    SUM(CASE WHEN action = 'answer' THEN 1 ELSE 0 END)
    / SUM(CASE WHEN action = 'show' THEN 1 ELSE 0 END) DESC,
    question_id ASC
LIMIT 1;

Code Explanation

This groups all actions for the same question:

GROUP BY question_id

This counts answer actions:

SUM(CASE WHEN action = 'answer' THEN 1 ELSE 0 END)

This counts show actions:

SUM(CASE WHEN action = 'show' THEN 1 ELSE 0 END)

The ratio is used for sorting:

ORDER BY
    answer_count / show_count DESC

The query writes the expression directly inside ORDER BY.

The tie-breaker is:

question_id ASC

So if two questions have the same answer rate, the smaller question ID is returned.

The output column is renamed here:

question_id AS survey_log

MySQL Integer Division Note

In MySQL, division with / returns a decimal result, so the implementation above works.

A more explicit version is:

SELECT
    question_id AS survey_log
FROM SurveyLog
GROUP BY question_id
ORDER BY
    SUM(CASE WHEN action = 'answer' THEN 1 ELSE 0 END) * 1.0
    / SUM(CASE WHEN action = 'show' THEN 1 ELSE 0 END) DESC,
    question_id ASC
LIMIT 1;

Multiplying by 1.0 makes the ratio clearly numeric.

Testing

Sample data:

CREATE TABLE SurveyLog (
    id INT,
    action VARCHAR(20),
    question_id INT,
    answer_id INT,
    q_num INT,
    timestamp INT
);

INSERT INTO SurveyLog (id, action, question_id, answer_id, q_num, timestamp) VALUES
(5, 'show', 285, NULL, 1, 123),
(5, 'answer', 285, 124124, 1, 124),
(5, 'show', 369, NULL, 2, 125),
(5, 'skip', 369, NULL, 2, 126);

Query:

SELECT
    question_id AS survey_log
FROM SurveyLog
GROUP BY question_id
ORDER BY
    SUM(CASE WHEN action = 'answer' THEN 1 ELSE 0 END) * 1.0
    / SUM(CASE WHEN action = 'show' THEN 1 ELSE 0 END) DESC,
    question_id ASC
LIMIT 1;

Expected result:

survey_log
285

Additional test cases:

CaseExpected behavior
One question has rate 1.0That question should rank first
One question has many answers but lower rateIt should not win by count alone
Two questions have the same rateSmaller question_id should win
Question has shows but no answersIts answer rate is 0
Several users see the same questionAll rows for that question are grouped together