# LeetCode 192: Word Frequency

## Problem Restatement

We are given a text file named `words.txt`.

We need to write a Bash script that counts how many times each word appears in the file.

The output should show each word followed by its frequency, sorted from highest frequency to lowest frequency.

The problem assumes that:

| Rule | Meaning |
|---|---|
| Characters | The file contains only lowercase letters and spaces |
| Words | Each word contains lowercase letters only |
| Separators | Words are separated by one or more whitespace characters |
| Ties | Frequency counts are unique, so tie handling is not needed |

Example input:

```text
the day is sunny the the
the sunny is is
```

Expected output:

```text
the 4
is 3
sunny 2
day 1
```

## Input and Output

| Item | Meaning |
|---|---|
| Input | A file named `words.txt` |
| Output | Each word and its frequency |
| Order | Descending by frequency |
| Language | Bash shell script |

## Key Insight

The important detail is that `uniq -c` only counts adjacent duplicate lines.

So before counting, we must make equal words adjacent.

That gives us the pipeline:

1. Put each word on its own line.
2. Sort the words alphabetically.
3. Count adjacent equal words.
4. Sort the counts in descending order.
5. Print the word first, then the count.
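The five steps above map one-to-one onto a pipeline. Here is a sketch that reads the file via input redirection instead of `cat` (the two forms behave identically):

```shell
# 1. one word per line  2. group equal words  3. count groups
# 4. rank by count      5. print word before count
tr -s ' ' '\n' < words.txt | sort | uniq -c | sort -nr | awk '{ print $2, $1 }'
```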

## Algorithm

Start with the file:

```bash
cat words.txt
```

Convert spaces into newlines:

```bash
tr -s ' ' '\n'
```

The `-s` option squeezes repeated occurrences of the translated character, so a run of spaces becomes a single newline rather than a run of empty lines.
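A quick check of the squeezing behavior on a line with runs of spaces:

```shell
# runs of spaces collapse into one newline each, giving one word per line
printf 'a  b   c\n' | tr -s ' ' '\n'
# prints:
# a
# b
# c
```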

Then sort the words:

```bash
sort
```

Now equal words are next to each other.
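For example, sorting the word list from the sample input groups the three copies of `the` together:

```shell
# after sort, duplicate words are adjacent, which is what uniq -c needs
printf 'the\nday\nis\nsunny\nthe\nthe\n' | sort
# prints:
# day
# is
# sunny
# the
# the
# the
```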

Count repeated adjacent words:

```bash
uniq -c
```

This produces output like the following (`uniq -c` right-aligns each count with leading spaces; the exact padding varies by implementation):

```text
      1 day
      3 is
      2 sunny
      4 the
```

Sort by count in descending numeric order:

```bash
sort -nr
```

Finally, swap the columns so the word appears before the count:

```bash
awk '{ print $2, $1 }'
```
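Because `awk` splits each line on runs of whitespace and ignores leading blanks, the padding that `uniq -c` adds in front of the count does not matter:

```shell
# $1 is the count, $2 is the word, regardless of leading spaces
printf '      4 the\n' | awk '{ print $2, $1 }'
# prints: the 4
```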

## Correctness

After `tr -s ' ' '\n'`, every word from the file appears on its own line. The squeeze step also merges the newline produced from a trailing space with the line's own newline, so no empty lines are introduced (as long as the file does not begin with a space).

After `sort`, all equal words are grouped together. Therefore, each group contains exactly all occurrences of one word.

The command `uniq -c` counts the size of each group, so it computes the correct frequency for every distinct word.

Then `sort -nr` orders the counted rows by descending numeric frequency.

Finally, `awk '{ print $2, $1 }'` changes the display format from `count word` to `word count`.

Therefore, the pipeline prints every word with its correct frequency in descending frequency order.
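For comparison, the counting can also be done in a single `awk` pass using an associative array, sorting only the distinct words at the end. This is an alternative to the pipeline above, not part of it:

```shell
# count every field into an associative array, then emit word/count pairs;
# sort -k2,2nr orders the rows by the numeric second column, descending
awk '{ for (i = 1; i <= NF; i++) count[$i]++ }
     END { for (w in count) print w, count[w] }' words.txt |
    sort -k2,2nr
```

Since the problem guarantees unique frequencies, the final sort fully determines the output order.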

## Complexity

Let `n` be the number of words in the file.

| Metric | Value | Why |
|---|---|---|
| Time | `O(n log n)` | Sorting dominates the pipeline |
| Space | `O(n)` | Sorting may need storage proportional to the input size |

## Implementation

```bash
# Read from the file words.txt and output the word frequency list to stdout.
cat words.txt | tr -s ' ' '\n' | sort | uniq -c | sort -nr | awk '{ print $2, $1 }'
```

## Code Explanation

The command starts by reading the file:

```bash
cat words.txt
```

Then it normalizes the input so each word appears on a separate line:

```bash
tr -s ' ' '\n'
```

Next, it sorts the words:

```bash
sort
```

This step is required because `uniq -c` only counts duplicate lines that are next to each other.

Then it counts each group:

```bash
uniq -c
```

The result has the count first and the word second.

Then it sorts by frequency:

```bash
sort -nr
```

The `-n` option means numeric sort.

The `-r` option means reverse order, so larger counts come first.
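Without `-n`, `sort` compares the counts as strings, which misorders multi-digit numbers:

```shell
printf '10\n2\n1\n' | sort     # lexicographic: 1, 10, 2
printf '10\n2\n1\n' | sort -n  # numeric:       1, 2, 10
```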

Finally:

```bash
awk '{ print $2, $1 }'
```

prints the second field first, then the first field.

So this:

```text
      4 the
```

becomes:

```text
the 4
```

## Testing

Create a sample file:

```bash
cat > words.txt << 'EOF'
the day is sunny the the
the sunny is is
EOF
```

Run the solution:

```bash
cat words.txt | tr -s ' ' '\n' | sort | uniq -c | sort -nr | awk '{ print $2, $1 }'
```

Expected output:

```text
the 4
is 3
sunny 2
day 1
```

Another test with repeated spaces:

```bash
cat > words.txt << 'EOF'
a   b a
c b   a
EOF
```

Expected output:

```text
a 3
b 2
c 1
```
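The second test can also be run end to end without writing a file, using a here-string (`<<<` is a bash feature, not POSIX sh):

```shell
# feed the two input lines directly into the pipeline
tr -s ' ' '\n' <<< 'a   b a
c b   a' | sort | uniq -c | sort -nr | awk '{ print $2, $1 }'
# prints:
# a 3
# b 2
# c 1
```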

