# LeetCode 347: Top K Frequent Elements

## Problem Restatement

We are given an integer array `nums` and an integer `k`.

Return the `k` most frequent elements in the array.

The answer may be returned in any order. The problem guarantees that the answer is unique. The constraints allow `nums.length` up to `10^5`, so the solution must be better than `O(n^2)`.

## Input and Output

| Item | Meaning |
|---|---|
| Input | Integer array `nums` and integer `k` |
| Output | The `k` most frequent elements |
| Order | Any order is accepted |
| Guarantee | The answer is unique |
| Constraint | `1 <= k <= number of unique elements` |

Function shape:

```python
def topKFrequent(nums: list[int], k: int) -> list[int]:
    ...
```

## Examples

Example 1:

```text
Input: nums = [1,1,1,2,2,3], k = 2
Output: [1,2]
```

Frequencies:

| Number | Count |
|---:|---:|
| `1` | `3` |
| `2` | `2` |
| `3` | `1` |

The two most frequent elements are `1` and `2`.

Example 2:

```text
Input: nums = [1], k = 1
Output: [1]
```

There is only one unique element, so it must be returned.

Example 3:

```text
Input: nums = [1,2,1,2,1,2,3,1,3,2], k = 2
Output: [1,2]
```

Frequencies:

| Number | Count |
|---:|---:|
| `1` | `4` |
| `2` | `4` |
| `3` | `2` |

The top two elements are `1` and `2`.

## First Thought: Count and Sort

The most direct approach is:

1. Count how many times each number appears.
2. Sort the numbers by frequency.
3. Take the first `k`.

```python
from collections import Counter

class Solution:
    def topKFrequent(self, nums: list[int], k: int) -> list[int]:
        count = Counter(nums)

        items = sorted(
            count.items(),
            key=lambda item: item[1],
            reverse=True,
        )

        return [num for num, freq in items[:k]]
```

This is correct and simple.

But sorting all unique numbers costs:

```text
O(m log m)
```

where `m` is the number of unique elements.

We can do better with bucket sort.

## Key Insight

A frequency cannot be larger than `len(nums)`.

So instead of sorting by frequency, create buckets.

Each bucket index represents a frequency.

```text
bucket[f] = all numbers that appear exactly f times
```

For example:

```text
nums = [1,1,1,2,2,3]
```

Frequencies:

```text
1 -> 3
2 -> 2
3 -> 1
```

Buckets:

```text
bucket[1] = [3]
bucket[2] = [2]
bucket[3] = [1]
```

Then scan buckets from high frequency to low frequency.

Collect numbers until we have `k` elements.

## Algorithm

1. Count frequencies using a hash map.
2. Create `len(nums) + 1` empty buckets.
3. For each `(num, freq)`:
   1. Put `num` into `buckets[freq]`.
4. Scan `freq` from `len(nums)` down to `1`.
5. Add numbers from each bucket into the answer.
6. Stop once the answer has `k` elements.

## Walkthrough

Use:

```text
nums = [1,1,1,2,2,3]
k = 2
```

Count frequencies:

```text
1 -> 3
2 -> 2
3 -> 1
```

Create buckets:

```text
buckets[0] = []
buckets[1] = [3]
buckets[2] = [2]
buckets[3] = [1]
buckets[4] = []
buckets[5] = []
buckets[6] = []
```

Scan from high to low.

At frequency `3`, find:

```text
[1]
```

Answer becomes:

```text
[1]
```

At frequency `2`, find:

```text
[2]
```

Answer becomes:

```text
[1, 2]
```

Now we have `k = 2` elements, so return.

## Correctness

The frequency map stores the exact number of times each unique value appears in `nums`.

Each number is placed into the bucket matching its frequency. Therefore, `buckets[f]` contains exactly the numbers whose frequency is `f`.

The algorithm scans buckets from the largest possible frequency down to the smallest. This means every number collected earlier has frequency greater than or equal to every number collected later.

Since the answer is unique and we stop after collecting `k` numbers, the returned list contains exactly the `k` most frequent elements.

The output order does not matter, so the order in which elements are collected from the buckets is acceptable.

## Complexity

Let `n = len(nums)`.

| Metric | Value | Why |
|---|---|---|
| Time | `O(n)` | Count once, bucket once, scan buckets once |
| Space | `O(n)` | Frequency map and buckets |

The bucket array has length `n + 1`.

## Implementation

```python
from collections import Counter

class Solution:
    def topKFrequent(self, nums: list[int], k: int) -> list[int]:
        count = Counter(nums)

        buckets = [[] for _ in range(len(nums) + 1)]

        for num, freq in count.items():
            buckets[freq].append(num)

        ans = []

        for freq in range(len(buckets) - 1, 0, -1):
            for num in buckets[freq]:
                ans.append(num)

                if len(ans) == k:
                    return ans

        return ans
```

## Code Explanation

Count each number:

```python
count = Counter(nums)
```

Create buckets by possible frequency:

```python
buckets = [[] for _ in range(len(nums) + 1)]
```

Place each number into its frequency bucket:

```python
for num, freq in count.items():
    buckets[freq].append(num)
```

Scan from highest frequency to lowest:

```python
for freq in range(len(buckets) - 1, 0, -1):
```

Append numbers until we have `k` elements:

```python
ans.append(num)

if len(ans) == k:
    return ans
```

The final `return ans` is only a safety return. The problem guarantees that `k` is valid.

## Heap Alternative

A heap solution is also common.

It keeps only `k` elements in a min-heap. This is useful when the number of unique elements is large and `k` is small.

Its complexity is:

| Metric | Value |
|---|---|
| Time | `O(n log k)` |
| Space | `O(n + k)` |

Bucket sort is simpler here because frequencies are bounded by `n`.

## Testing

Because the answer can be returned in any order, tests should compare sets.

```python
def run_tests():
    s = Solution()

    assert set(s.topKFrequent([1, 1, 1, 2, 2, 3], 2)) == {1, 2}
    assert set(s.topKFrequent([1], 1)) == {1}
    assert set(s.topKFrequent([1, 2, 1, 2, 1, 2, 3, 1, 3, 2], 2)) == {1, 2}

    assert set(s.topKFrequent([4, 4, 4, 6, 6, 7], 1)) == {4}
    assert set(s.topKFrequent([-1, -1, -2, -2, -2, 3], 2)) == {-2, -1}
    assert set(s.topKFrequent([5, 3, 1, 1, 1, 3, 73, 1], 2)) == {1, 3}

    print("all tests passed")

run_tests()
```

Test meaning:

| Test | Why |
|---|---|
| `[1,1,1,2,2,3]`, `2` | Standard sample |
| `[1]`, `1` | Single element |
| Equal top frequencies | Unique top-k set still exists |
| `k = 1` | Only most frequent element |
| Negative numbers | Hash map handles signed values |
| Mixed values | Checks general counting behavior |