Skip to content

LeetCode 692: Top K Frequent Words

Find the k most frequent words using frequency counting and custom sorting by count and lexicographical order.

Problem Restatement

We are given a list of words and an integer k.

We need to return the k most frequent words.

The ordering rules are:

RuleMeaning
Higher frequency firstMore common words come earlier
Lexicographical tie-breakIf two words have the same frequency, the smaller alphabetical word comes first

The official statement defines lexicographical order using normal alphabetical string comparison. (leetcode.com)

Input and Output

ItemMeaning
Inputwords, a list of strings, and integer k
OutputList of the k most frequent words
Primary orderingFrequency descending
Secondary orderingLexicographical ascending

Example function shape:

def topKFrequent(words: list[str], k: int) -> list[str]:
    ...

Examples

Example 1:

words = ["i", "love", "leetcode", "i", "love", "coding"]
k = 2

Frequencies:

WordCount
"i"2
"love"2
"leetcode"1
"coding"1

The two most frequent words are "i" and "love".

Both have the same count, so lexicographical order decides:

"i" < "love"

Answer:

["i", "love"]

Example 2:

words = [
    "the", "day", "is", "sunny",
    "the", "the", "the",
    "sunny", "sunny",
    "is", "is"
]
k = 4

Frequencies:

WordCount
"the"4
"is"3
"sunny"2
"day"1

Answer:

["the", "is", "sunny", "day"]

First Thought: Count and Sort

The problem naturally splits into two parts:

  1. Count word frequencies.
  2. Sort words using the required ordering rules.

A hash map is ideal for counting.

Then we sort by:

PriorityDirection
FrequencyDescending
WordAscending

Key Insight

Python sorting supports tuple keys.

Suppose:

count[word]

stores the frequency.

Then the sorting key can be:

(-count[word], word)

Why negative frequency?

Because Python sorts ascending by default.

Using:

-count[word]

makes larger counts come first.

The second tuple element:

word

automatically handles lexicographical tie-breaking.

Algorithm

  1. Count frequencies with a hash map.
  2. Sort unique words using:
    key=lambda word: (-count[word], word)
  3. Return the first k words.

Walkthrough

Consider:

words = ["i", "love", "leetcode", "i", "love", "coding"]
k = 2

Frequency table:

WordCount
"i"2
"love"2
"leetcode"1
"coding"1

Sorting key values:

WordKey
"i"(-2, "i")
"love"(-2, "love")
"leetcode"(-1, "leetcode")
"coding"(-1, "coding")

Sorted order:

["i", "love", "coding", "leetcode"]

Take the first 2 words:

["i", "love"]

Why "coding" Comes Before "leetcode"

Both have frequency 1.

So the second ordering rule applies:

"coding" < "leetcode"

Alphabetically, "coding" comes first.

Correctness

The frequency map correctly stores how many times each word appears in the input.

The sorting rule uses the tuple:

(-frequency, word)

Python tuple comparison compares elements from left to right.

Therefore:

  1. Words with larger frequencies appear earlier because larger frequencies correspond to smaller negative values.
  2. If two words have equal frequencies, the second tuple element compares the words lexicographically in ascending order.

Thus, the sorted list exactly matches the required ordering from the problem statement.

Returning the first k words therefore returns the k most frequent words in the correct order.

Complexity

Let:

SymbolMeaning
nTotal number of input words
uNumber of unique words
MetricValueWhy
TimeO(n + u log u)Count frequencies, then sort unique words
SpaceO(u)Frequency map and sorted unique list

Implementation

from collections import Counter

class Solution:
    def topKFrequent(self, words: list[str], k: int) -> list[str]:
        count = Counter(words)

        ordered = sorted(
            count.keys(),
            key=lambda word: (-count[word], word),
        )

        return ordered[:k]

Code Explanation

We count frequencies using Counter:

count = Counter(words)

For example:

Counter({
    "i": 2,
    "love": 2,
    "leetcode": 1,
    "coding": 1,
})

Then we sort the unique words:

ordered = sorted(
    count.keys(),
    key=lambda word: (-count[word], word),
)

The key:

(-count[word], word)

means:

PartPurpose
-count[word]Higher frequency first
wordAlphabetical tie-break

Finally:

return ordered[:k]

returns the top k words.

Heap Alternative

The sorting solution is simple and efficient.

A heap solution is useful when k is much smaller than the number of unique words.

Example heap approach:

import heapq
from collections import Counter

class Solution:
    def topKFrequent(self, words: list[str], k: int) -> list[str]:
        count = Counter(words)

        heap = [(-freq, word) for word, freq in count.items()]
        heapq.heapify(heap)

        answer = []

        for _ in range(k):
            _, word = heapq.heappop(heap)
            answer.append(word)

        return answer

This uses the same ordering idea:

(-freq, word)

because Python heaps are min-heaps.

Testing

def run_tests():
    s = Solution()

    assert s.topKFrequent(
        ["i", "love", "leetcode", "i", "love", "coding"],
        2,
    ) == ["i", "love"]

    assert s.topKFrequent(
        [
            "the", "day", "is", "sunny",
            "the", "the", "the",
            "sunny", "sunny",
            "is", "is"
        ],
        4,
    ) == ["the", "is", "sunny", "day"]

    assert s.topKFrequent(
        ["a", "b", "c"],
        2,
    ) == ["a", "b"]

    assert s.topKFrequent(
        ["a", "aa", "aaa"],
        3,
    ) == ["a", "aa", "aaa"]

    assert s.topKFrequent(
        ["x", "x", "y", "y", "z"],
        2,
    ) == ["x", "y"]

    print("all tests passed")

run_tests()

Test meaning:

TestExpectedWhy
["i","love","leetcode","i","love","coding"], k=2["i","love"]Official example
Sunny/day example["the","is","sunny","day"]Official example
Equal frequencies["a","b"]Lexicographical ordering
Increasing string lengths["a","aa","aaa"]Alphabetical ordering check
Mixed frequencies["x","y"]Frequency descending