A clear explanation of Top K Frequent Elements using frequency counting and bucket sort.
Problem Restatement
We are given an integer array nums and an integer k.
Return the k most frequent elements in the array.
The answer may be returned in any order. The problem guarantees that the answer is unique. The constraints allow nums.length up to 10^5, so the solution must be better than O(n^2).
Input and Output
| Item | Meaning |
|---|---|
| Input | Integer array nums and integer k |
| Output | The k most frequent elements |
| Order | Any order is accepted |
| Guarantee | The answer is unique |
| Constraint | 1 <= k <= number of unique elements |
Function shape:
def topKFrequent(nums: list[int], k: int) -> list[int]:
...Examples
Example 1:
Input: nums = [1,1,1,2,2,3], k = 2
Output: [1,2]Frequencies:
| Number | Count |
|---|---|
1 | 3 |
2 | 2 |
3 | 1 |
The two most frequent elements are 1 and 2.
Example 2:
Input: nums = [1], k = 1
Output: [1]There is only one unique element, so it must be returned.
Example 3:
Input: nums = [1,2,1,2,1,2,3,1,3,2], k = 2
Output: [1,2]Frequencies:
| Number | Count |
|---|---|
1 | 4 |
2 | 4 |
3 | 2 |
The top two elements are 1 and 2.
First Thought: Count and Sort
The most direct approach is:
- Count how many times each number appears.
- Sort the numbers by frequency.
- Take the first
k.
from collections import Counter
class Solution:
def topKFrequent(self, nums: list[int], k: int) -> list[int]:
count = Counter(nums)
items = sorted(
count.items(),
key=lambda item: item[1],
reverse=True,
)
return [num for num, freq in items[:k]]This is correct and simple.
But sorting all unique numbers costs:
O(m log m)where m is the number of unique elements.
We can do better with bucket sort.
Key Insight
A frequency cannot be larger than len(nums).
So instead of sorting by frequency, create buckets.
Each bucket index represents a frequency.
bucket[f] = all numbers that appear exactly f timesFor example:
nums = [1,1,1,2,2,3]Frequencies:
1 -> 3
2 -> 2
3 -> 1Buckets:
bucket[1] = [3]
bucket[2] = [2]
bucket[3] = [1]Then scan buckets from high frequency to low frequency.
Collect numbers until we have k elements.
Algorithm
- Count frequencies using a hash map.
- Create
len(nums) + 1empty buckets. - For each
(num, freq):- Put
numintobuckets[freq].
- Put
- Scan
freqfromlen(nums)down to1. - Add numbers from each bucket into the answer.
- Stop once the answer has
kelements.
Walkthrough
Use:
nums = [1,1,1,2,2,3]
k = 2Count frequencies:
1 -> 3
2 -> 2
3 -> 1Create buckets:
buckets[0] = []
buckets[1] = [3]
buckets[2] = [2]
buckets[3] = [1]
buckets[4] = []
buckets[5] = []
buckets[6] = []Scan from high to low.
At frequency 3, find:
[1]Answer becomes:
[1]At frequency 2, find:
[2]Answer becomes:
[1, 2]Now we have k = 2 elements, so return.
Correctness
The frequency map stores the exact number of times each unique value appears in nums.
Each number is placed into the bucket matching its frequency. Therefore, buckets[f] contains exactly the numbers whose frequency is f.
The algorithm scans buckets from the largest possible frequency down to the smallest. This means every number collected earlier has frequency greater than or equal to every number collected later.
Since the answer is unique and we stop after collecting k numbers, the returned list contains exactly the k most frequent elements.
The output order does not matter, so the order in which elements are collected from the buckets is acceptable.
Complexity
Let n = len(nums).
| Metric | Value | Why |
|---|---|---|
| Time | O(n) | Count once, bucket once, scan buckets once |
| Space | O(n) | Frequency map and buckets |
The bucket array has length n + 1.
Implementation
from collections import Counter
class Solution:
def topKFrequent(self, nums: list[int], k: int) -> list[int]:
count = Counter(nums)
buckets = [[] for _ in range(len(nums) + 1)]
for num, freq in count.items():
buckets[freq].append(num)
ans = []
for freq in range(len(buckets) - 1, 0, -1):
for num in buckets[freq]:
ans.append(num)
if len(ans) == k:
return ans
return ansCode Explanation
Count each number:
count = Counter(nums)Create buckets by possible frequency:
buckets = [[] for _ in range(len(nums) + 1)]Place each number into its frequency bucket:
for num, freq in count.items():
buckets[freq].append(num)Scan from highest frequency to lowest:
for freq in range(len(buckets) - 1, 0, -1):Append numbers until we have k elements:
ans.append(num)
if len(ans) == k:
return ansThe final return ans is only a safety return. The problem guarantees that k is valid.
Heap Alternative
A heap solution is also common.
It keeps only k elements in a min-heap. This is useful when the number of unique elements is large and k is small.
Its complexity is:
| Metric | Value |
|---|---|
| Time | O(n log k) |
| Space | O(n + k) |
Bucket sort is simpler here because frequencies are bounded by n.
Testing
Because the answer can be returned in any order, tests should compare sets.
def run_tests():
s = Solution()
assert set(s.topKFrequent([1, 1, 1, 2, 2, 3], 2)) == {1, 2}
assert set(s.topKFrequent([1], 1)) == {1}
assert set(s.topKFrequent([1, 2, 1, 2, 1, 2, 3, 1, 3, 2], 2)) == {1, 2}
assert set(s.topKFrequent([4, 4, 4, 6, 6, 7], 1)) == {4}
assert set(s.topKFrequent([-1, -1, -2, -2, -2, 3], 2)) == {-2, -1}
assert set(s.topKFrequent([5, 3, 1, 1, 1, 3, 73, 1], 2)) == {1, 3}
print("all tests passed")
run_tests()Test meaning:
| Test | Why |
|---|---|
[1,1,1,2,2,3], 2 | Standard sample |
[1], 1 | Single element |
| Equal top frequencies | Unique top-k set still exists |
k = 1 | Only most frequent element |
| Negative numbers | Hash map handles signed values |
| Mixed values | Checks general counting behavior |