Find the k most frequent words using frequency counting and custom sorting by count and lexicographical order.
Problem Restatement
We are given a list of words and an integer k.
We need to return the k most frequent words.
The ordering rules are:
| Rule | Meaning |
|---|---|
| Higher frequency first | More common words come earlier |
| Lexicographical tie-break | If two words have the same frequency, the smaller alphabetical word comes first |
The official statement defines lexicographical order using normal alphabetical string comparison. (leetcode.com)
Input and Output
| Item | Meaning |
|---|---|
| Input | words, a list of strings, and integer k |
| Output | List of the k most frequent words |
| Primary ordering | Frequency descending |
| Secondary ordering | Lexicographical ascending |
Example function shape:
def topKFrequent(words: list[str], k: int) -> list[str]:
...Examples
Example 1:
words = ["i", "love", "leetcode", "i", "love", "coding"]
k = 2Frequencies:
| Word | Count |
|---|---|
"i" | 2 |
"love" | 2 |
"leetcode" | 1 |
"coding" | 1 |
The two most frequent words are "i" and "love".
Both have the same count, so lexicographical order decides:
"i" < "love"Answer:
["i", "love"]Example 2:
words = [
"the", "day", "is", "sunny",
"the", "the", "the",
"sunny", "sunny",
"is", "is"
]
k = 4Frequencies:
| Word | Count |
|---|---|
"the" | 4 |
"is" | 3 |
"sunny" | 2 |
"day" | 1 |
Answer:
["the", "is", "sunny", "day"]First Thought: Count and Sort
The problem naturally splits into two parts:
- Count word frequencies.
- Sort words using the required ordering rules.
A hash map is ideal for counting.
Then we sort by:
| Priority | Direction |
|---|---|
| Frequency | Descending |
| Word | Ascending |
Key Insight
Python sorting supports tuple keys.
Suppose:
count[word]stores the frequency.
Then the sorting key can be:
(-count[word], word)Why negative frequency?
Because Python sorts ascending by default.
Using:
-count[word]makes larger counts come first.
The second tuple element:
wordautomatically handles lexicographical tie-breaking.
Algorithm
- Count frequencies with a hash map.
- Sort unique words using:
key=lambda word: (-count[word], word) - Return the first
kwords.
Walkthrough
Consider:
words = ["i", "love", "leetcode", "i", "love", "coding"]
k = 2Frequency table:
| Word | Count |
|---|---|
"i" | 2 |
"love" | 2 |
"leetcode" | 1 |
"coding" | 1 |
Sorting key values:
| Word | Key |
|---|---|
"i" | (-2, "i") |
"love" | (-2, "love") |
"leetcode" | (-1, "leetcode") |
"coding" | (-1, "coding") |
Sorted order:
["i", "love", "coding", "leetcode"]Take the first 2 words:
["i", "love"]Why "coding" Comes Before "leetcode"
Both have frequency 1.
So the second ordering rule applies:
"coding" < "leetcode"Alphabetically, "coding" comes first.
Correctness
The frequency map correctly stores how many times each word appears in the input.
The sorting rule uses the tuple:
(-frequency, word)Python tuple comparison compares elements from left to right.
Therefore:
- Words with larger frequencies appear earlier because larger frequencies correspond to smaller negative values.
- If two words have equal frequencies, the second tuple element compares the words lexicographically in ascending order.
Thus, the sorted list exactly matches the required ordering from the problem statement.
Returning the first k words therefore returns the k most frequent words in the correct order.
Complexity
Let:
| Symbol | Meaning |
|---|---|
n | Total number of input words |
u | Number of unique words |
| Metric | Value | Why |
|---|---|---|
| Time | O(n + u log u) | Count frequencies, then sort unique words |
| Space | O(u) | Frequency map and sorted unique list |
Implementation
from collections import Counter
class Solution:
def topKFrequent(self, words: list[str], k: int) -> list[str]:
count = Counter(words)
ordered = sorted(
count.keys(),
key=lambda word: (-count[word], word),
)
return ordered[:k]Code Explanation
We count frequencies using Counter:
count = Counter(words)For example:
Counter({
"i": 2,
"love": 2,
"leetcode": 1,
"coding": 1,
})Then we sort the unique words:
ordered = sorted(
count.keys(),
key=lambda word: (-count[word], word),
)The key:
(-count[word], word)means:
| Part | Purpose |
|---|---|
-count[word] | Higher frequency first |
word | Alphabetical tie-break |
Finally:
return ordered[:k]returns the top k words.
Heap Alternative
The sorting solution is simple and efficient.
A heap solution is useful when k is much smaller than the number of unique words.
Example heap approach:
import heapq
from collections import Counter
class Solution:
def topKFrequent(self, words: list[str], k: int) -> list[str]:
count = Counter(words)
heap = [(-freq, word) for word, freq in count.items()]
heapq.heapify(heap)
answer = []
for _ in range(k):
_, word = heapq.heappop(heap)
answer.append(word)
return answerThis uses the same ordering idea:
(-freq, word)because Python heaps are min-heaps.
Testing
def run_tests():
s = Solution()
assert s.topKFrequent(
["i", "love", "leetcode", "i", "love", "coding"],
2,
) == ["i", "love"]
assert s.topKFrequent(
[
"the", "day", "is", "sunny",
"the", "the", "the",
"sunny", "sunny",
"is", "is"
],
4,
) == ["the", "is", "sunny", "day"]
assert s.topKFrequent(
["a", "b", "c"],
2,
) == ["a", "b"]
assert s.topKFrequent(
["a", "aa", "aaa"],
3,
) == ["a", "aa", "aaa"]
assert s.topKFrequent(
["x", "x", "y", "y", "z"],
2,
) == ["x", "y"]
print("all tests passed")
run_tests()Test meaning:
| Test | Expected | Why |
|---|---|---|
["i","love","leetcode","i","love","coding"], k=2 | ["i","love"] | Official example |
| Sunny/day example | ["the","is","sunny","day"] | Official example |
| Equal frequencies | ["a","b"] | Lexicographical ordering |
| Increasing string lengths | ["a","aa","aaa"] | Alphabetical ordering check |
| Mixed frequencies | ["x","y"] | Frequency descending |