# LeetCode 692: Top K Frequent Words

## Problem Restatement

We are given a list of words and an integer `k`.

We need to return the `k` most frequent words.

The ordering rules are:

| Rule | Meaning |
|---|---|
| Higher frequency first | More common words come earlier |
| Lexicographical tie-break | If two words have the same frequency, the smaller alphabetical word comes first |

The official statement defines lexicographical order using normal alphabetical string comparison. ([leetcode.com](https://leetcode.com/problems/top-k-frequent-words/?utm_source=chatgpt.com))

## Input and Output

| Item | Meaning |
|---|---|
| Input | `words`, a list of strings, and integer `k` |
| Output | List of the `k` most frequent words |
| Primary ordering | Frequency descending |
| Secondary ordering | Lexicographical ascending |

Example function shape:

```python
def topKFrequent(words: list[str], k: int) -> list[str]:
    ...
```

## Examples

Example 1:

```python
words = ["i", "love", "leetcode", "i", "love", "coding"]
k = 2
```

Frequencies:

| Word | Count |
|---|---:|
| `"i"` | `2` |
| `"love"` | `2` |
| `"leetcode"` | `1` |
| `"coding"` | `1` |

The two most frequent words are `"i"` and `"love"`.

Both have the same count, so lexicographical order decides:

```text
"i" < "love"
```

Answer:

```python
["i", "love"]
```

Example 2:

```python
words = [
    "the", "day", "is", "sunny",
    "the", "the", "the",
    "sunny", "sunny",
    "is", "is"
]
k = 4
```

Frequencies:

| Word | Count |
|---|---:|
| `"the"` | `4` |
| `"is"` | `3` |
| `"sunny"` | `2` |
| `"day"` | `1` |

Answer:

```python
["the", "is", "sunny", "day"]
```

## First Thought: Count and Sort

The problem naturally splits into two parts:

1. Count word frequencies.
2. Sort words using the required ordering rules.

A hash map is ideal for counting.

Then we sort by:

| Priority | Direction |
|---|---|
| Frequency | Descending |
| Word | Ascending |

## Key Insight

Python sorting supports tuple keys.

Suppose:

```python
count[word]
```

stores the frequency.

Then the sorting key can be:

```python
(-count[word], word)
```

Why negative frequency?

Because Python sorts ascending by default.

Using:

```python
-count[word]
```

makes larger counts come first.

The second tuple element:

```python
word
```

automatically handles lexicographical tie-breaking.

## Algorithm

1. Count frequencies with a hash map.
2. Sort unique words using:
   ```python id="ljlwmv"
   key=lambda word: (-count[word], word)
   ```
3. Return the first `k` words.

## Walkthrough

Consider:

```python
words = ["i", "love", "leetcode", "i", "love", "coding"]
k = 2
```

Frequency table:

| Word | Count |
|---|---:|
| `"i"` | `2` |
| `"love"` | `2` |
| `"leetcode"` | `1` |
| `"coding"` | `1` |

Sorting key values:

| Word | Key |
|---|---|
| `"i"` | `(-2, "i")` |
| `"love"` | `(-2, "love")` |
| `"leetcode"` | `(-1, "leetcode")` |
| `"coding"` | `(-1, "coding")` |

Sorted order:

```text
["i", "love", "coding", "leetcode"]
```

Take the first `2` words:

```python
["i", "love"]
```

## Why `"coding"` Comes Before `"leetcode"`

Both have frequency `1`.

So the second ordering rule applies:

```text
"coding" < "leetcode"
```

Alphabetically, `"coding"` comes first.

## Correctness

The frequency map correctly stores how many times each word appears in the input.

The sorting rule uses the tuple:

```text
(-frequency, word)
```

Python tuple comparison compares elements from left to right.

Therefore:

1. Words with larger frequencies appear earlier because larger frequencies correspond to smaller negative values.
2. If two words have equal frequencies, the second tuple element compares the words lexicographically in ascending order.

Thus, the sorted list exactly matches the required ordering from the problem statement.

Returning the first `k` words therefore returns the `k` most frequent words in the correct order.

## Complexity

Let:

| Symbol | Meaning |
|---|---|
| `n` | Total number of input words |
| `u` | Number of unique words |

| Metric | Value | Why |
|---|---:|---|
| Time | `O(n + u log u)` | Count frequencies, then sort unique words |
| Space | `O(u)` | Frequency map and sorted unique list |

## Implementation

```python
from collections import Counter

class Solution:
    def topKFrequent(self, words: list[str], k: int) -> list[str]:
        count = Counter(words)

        ordered = sorted(
            count.keys(),
            key=lambda word: (-count[word], word),
        )

        return ordered[:k]
```

## Code Explanation

We count frequencies using `Counter`:

```python
count = Counter(words)
```

For example:

```python
Counter({
    "i": 2,
    "love": 2,
    "leetcode": 1,
    "coding": 1,
})
```

Then we sort the unique words:

```python
ordered = sorted(
    count.keys(),
    key=lambda word: (-count[word], word),
)
```

The key:

```python
(-count[word], word)
```

means:

| Part | Purpose |
|---|---|
| `-count[word]` | Higher frequency first |
| `word` | Alphabetical tie-break |

Finally:

```python
return ordered[:k]
```

returns the top `k` words.

## Heap Alternative

The sorting solution is simple and efficient.

A heap solution is useful when `k` is much smaller than the number of unique words.

Example heap approach:

```python
import heapq
from collections import Counter

class Solution:
    def topKFrequent(self, words: list[str], k: int) -> list[str]:
        count = Counter(words)

        heap = [(-freq, word) for word, freq in count.items()]
        heapq.heapify(heap)

        answer = []

        for _ in range(k):
            _, word = heapq.heappop(heap)
            answer.append(word)

        return answer
```

This uses the same ordering idea:

```python
(-freq, word)
```

because Python heaps are min-heaps.

## Testing

```python
def run_tests():
    s = Solution()

    assert s.topKFrequent(
        ["i", "love", "leetcode", "i", "love", "coding"],
        2,
    ) == ["i", "love"]

    assert s.topKFrequent(
        [
            "the", "day", "is", "sunny",
            "the", "the", "the",
            "sunny", "sunny",
            "is", "is"
        ],
        4,
    ) == ["the", "is", "sunny", "day"]

    assert s.topKFrequent(
        ["a", "b", "c"],
        2,
    ) == ["a", "b"]

    assert s.topKFrequent(
        ["a", "aa", "aaa"],
        3,
    ) == ["a", "aa", "aaa"]

    assert s.topKFrequent(
        ["x", "x", "y", "y", "z"],
        2,
    ) == ["x", "y"]

    print("all tests passed")

run_tests()
```

Test meaning:

| Test | Expected | Why |
|---|---:|---|
| `["i","love","leetcode","i","love","coding"]`, `k=2` | `["i","love"]` | Official example |
| Sunny/day example | `["the","is","sunny","day"]` | Official example |
| Equal frequencies | `["a","b"]` | Lexicographical ordering |
| Increasing string lengths | `["a","aa","aaa"]` | Alphabetical ordering check |
| Mixed frequencies | `["x","y"]` | Frequency descending |