# LeetCode 819: Most Common Word

## Problem Restatement

We are given a string `paragraph` and a list of banned words.

We need to return the most frequent word in `paragraph` that is not banned.

Words are case-insensitive, so:

```python
"Bob"
"bob"
"BOB"
```

all count as the same word.

The answer must be returned in lowercase.

Punctuation should be ignored. The paragraph may contain spaces and symbols such as:

```python
!?',;.
```

The problem guarantees that at least one non-banned word exists, and that the answer is unique. ([leetcode.com](https://leetcode.com/problems/most-common-word/), [doocs](https://github.com/doocs/leetcode/blob/main/solution/0800-0899/0819.Most%20Common%20Word/README_EN.md))

## Input and Output

| Item | Meaning |
|---|---|
| Input | A paragraph string and an array `banned` |
| Output | The most frequent word that is not banned |
| Case rule | Words are compared case-insensitively |
| Punctuation rule | Punctuation separates words |
| Guarantee | The answer is unique |

## Examples

Example 1:

```python
paragraph = "Bob hit a ball, the hit BALL flew far after it was hit."
banned = ["hit"]
```

After converting to lowercase and removing punctuation, the words are:

```python
["bob", "hit", "a", "ball", "the", "hit", "ball", "flew", "far", "after", "it", "was", "hit"]
```

The word `"hit"` appears three times, but it is banned.

The word `"ball"` appears two times and is not banned.

So the answer is:

```python
"ball"
```

Example 2:

```python
paragraph = "a."
banned = []
```

The only word is:

```python
"a"
```

So the answer is:

```python
"a"
```

## First Thought: Split by Spaces

A first idea is to split the paragraph by spaces:

```python
paragraph.split()
```

But this does not handle punctuation correctly.

For example:

```python
"ball,"
```

would stay as `"ball,"`, not `"ball"`.

So we need to normalize the paragraph first.

## Key Insight

Punctuation acts like a word separator.

We can scan the paragraph character by character.

If the character is a letter, keep its lowercase form.

Otherwise, replace it with a space.

Then the paragraph becomes easy to split into clean lowercase words.

For example:

```python
"Bob hit a ball,"
```

becomes:

```python
"bob hit a ball "
```

Then we count every word that is not banned.

## Algorithm

Convert `banned` to a set:

```python
banned_set = set(banned)
```

Normalize the paragraph:

1. Convert letters to lowercase.
2. Replace non-letters with spaces.

Then split into words:

```python
words = normalized.split()
```

Use a hash map to count non-banned words.

For each word:

1. If the word is banned, skip it.
2. Otherwise, increment its count.
3. Track the word with the largest count seen so far.

Return the best word.

## Correctness

The normalization step preserves every letter from the paragraph and converts it to lowercase. It also replaces every punctuation symbol or space with a space. Therefore, splitting the normalized string produces exactly the lowercase words from the paragraph, with punctuation ignored.

The banned set contains exactly the words that must not be considered. The algorithm skips every word in this set, so no banned word can become the answer.

For every non-banned word, the algorithm increments its frequency count once for each occurrence. Therefore, the hash map stores the correct frequency of every allowed word.

Whenever a word’s count becomes larger than the current maximum, the algorithm updates the answer. Since the problem guarantees a unique most frequent non-banned word, the final stored answer is exactly the required word.

## Complexity

Let `n = len(paragraph)` and `m = len(banned)`.

| Metric | Value | Why |
|---|---|---|
| Time | `O(n + m)` | Build the banned set, scan the paragraph, and count words |
| Space | `O(n + m)` | Store normalized characters, banned words, and word counts |

## Implementation

```python
from collections import defaultdict

class Solution:
    def mostCommonWord(self, paragraph: str, banned: list[str]) -> str:
        banned_set = set(banned)

        chars = []

        for ch in paragraph:
            if ch.isalpha():
                chars.append(ch.lower())
            else:
                chars.append(" ")

        normalized = "".join(chars)

        counts = defaultdict(int)
        best_word = ""
        best_count = 0

        for word in normalized.split():
            if word in banned_set:
                continue

            counts[word] += 1

            if counts[word] > best_count:
                best_count = counts[word]
                best_word = word

        return best_word
```

## Code Explanation

We first make banned-word lookup fast:

```python
banned_set = set(banned)
```

Then we build a normalized version of the paragraph:

```python
chars = []
```

Letters are kept and converted to lowercase:

```python
if ch.isalpha():
    chars.append(ch.lower())
```

Everything else becomes a space:

```python
else:
    chars.append(" ")
```

This handles commas, periods, exclamation marks, and other allowed punctuation.

After joining the characters:

```python
normalized = "".join(chars)
```

we can safely split by whitespace:

```python
for word in normalized.split():
```

Banned words are ignored:

```python
if word in banned_set:
    continue
```

Every allowed word is counted:

```python
counts[word] += 1
```

When a word reaches a new largest count, we update the answer:

```python
if counts[word] > best_count:
    best_count = counts[word]
    best_word = word
```

Finally, return the most frequent allowed word.

## Testing

```python
def run_tests():
    s = Solution()

    assert s.mostCommonWord(
        "Bob hit a ball, the hit BALL flew far after it was hit.",
        ["hit"],
    ) == "ball"

    assert s.mostCommonWord(
        "a.",
        [],
    ) == "a"

    assert s.mostCommonWord(
        "Bob. hIt, baLl",
        ["bob", "hit"],
    ) == "ball"

    assert s.mostCommonWord(
        "a, a, a, b,b,b,c, c",
        ["a"],
    ) == "b"

    assert s.mostCommonWord(
        "Hello! hello? world.",
        ["world"],
    ) == "hello"

    print("all tests passed")

run_tests()
```

| Test | Why |
|---|---|
| Official paragraph example | Checks punctuation, casing, and banned word |
| `"a."` | Minimum simple case |
| Mixed casing | Confirms case-insensitive counting |
| Repeated punctuation-separated words | Checks punctuation normalization |
| Banned lower-frequency alternative | Ensures banned words are skipped |

