# 2.14 Anagrams

Anagrams are strings or sequences that contain the same elements with the same multiplicities, possibly in different order. For example, `"listen"` and `"silent"` are anagrams because each letter appears the same number of times in both strings.

The usual algorithmic task is to compare frequency tables. Sorting also works, but counting is often faster when the alphabet is small or bounded.

## Problem

Given two strings `a` and `b`, decide whether they are anagrams.

```text id="zppiz6"
a = "listen"
b = "silent"
```

Expected result:

```text id="pik0np"
true
```

For strings of different lengths, the answer is immediately false.

## Solution: Frequency Table

For ASCII lowercase letters, use a fixed-size array of length `26`.

```go id="fh2c1u"
func IsAnagramLowerASCII(a, b string) bool {
	if len(a) != len(b) {
		return false
	}

	var count [26]int

	for i := 0; i < len(a); i++ {
		count[a[i]-'a']++
		count[b[i]-'a']--
	}

	for i := 0; i < 26; i++ {
		if count[i] != 0 {
			return false
		}
	}

	return true
}
```

This assumes every byte is between `'a'` and `'z'`.

## Invariant

After processing the first `i` bytes:

```text id="xlq76h"
count[c] = occurrences of c in a[0:i] minus occurrences of c in b[0:i]
```

If both strings are anagrams, every final difference is zero. If any count is nonzero, at least one character appears a different number of times.

## General Byte Strings

For arbitrary byte strings, use an array of length `256`.

```go id="s62t5n"
func IsAnagramBytes(a, b string) bool {
	if len(a) != len(b) {
		return false
	}

	var count [256]int

	for i := 0; i < len(a); i++ {
		count[a[i]]++
		count[b[i]]--
	}

	for i := 0; i < 256; i++ {
		if count[i] != 0 {
			return false
		}
	}

	return true
}
```

This treats strings as byte sequences. It is correct for binary data and byte-oriented formats.

## Unicode Code Points

For Unicode code points, range over runes and use a map.

```go id="vtbzto"
func IsAnagramRunes(a, b string) bool {
	count := make(map[rune]int)

	lenA := 0
	for _, r := range a {
		count[r]++
		lenA++
	}

	lenB := 0
	for _, r := range b {
		count[r]--
		lenB++
	}

	if lenA != lenB {
		return false
	}

	for _, v := range count {
		if v != 0 {
			return false
		}
	}

	return true
}
```

This compares Unicode code points, not grapheme clusters or normalized text. For human-facing text, normalize first when needed.

## Sorting Method

Another method is to sort both strings and compare the sorted results.

```go id="mcr4oy"
func IsAnagramBySorting(a, b string) bool {
	if len(a) != len(b) {
		return false
	}

	x := []byte(a)
	y := []byte(b)

	sort.Slice(x, func(i, j int) bool { return x[i] < x[j] })
	sort.Slice(y, func(i, j int) bool { return y[i] < y[j] })

	return string(x) == string(y)
}
```

This requires:

```go id="i1gfd5"
import "sort"
```

Sorting is simple and general for byte strings, but slower than counting for bounded alphabets.

Complexity:

```text id="czapcl"
Time:  O(n log n)
Space: O(n)
```

Counting complexity:

```text id="erw4bl"
Time:  O(n)
Space: O(1) for fixed alphabets, O(k) for maps
```

where `k` is the number of distinct symbols.

## Grouping Anagrams

To group words that are anagrams of each other, compute a canonical key for each word.

For lowercase ASCII, the frequency vector is a good key.

```go id="hrwbto"
func GroupAnagrams(words []string) [][]string {
	groups := make(map[[26]int][]string)

	for _, w := range words {
		var key [26]int
		for i := 0; i < len(w); i++ {
			key[w[i]-'a']++
		}
		groups[key] = append(groups[key], w)
	}

	out := make([][]string, 0, len(groups))
	for _, g := range groups {
		out = append(out, g)
	}

	return out
}
```

The array `[26]int` can be used as a map key because arrays are comparable in Go.

## Find Anagrams in a String

A common sliding window problem asks for all positions where a substring of `s` is an anagram of `p`.

```go id="le2aqd"
func FindAnagrams(s, p string) []int {
	if len(p) > len(s) {
		return nil
	}

	var need [26]int
	var have [26]int

	for i := 0; i < len(p); i++ {
		need[p[i]-'a']++
		have[s[i]-'a']++
	}

	var out []int

	if have == need {
		out = append(out, 0)
	}

	for r := len(p); r < len(s); r++ {
		have[s[r]-'a']++
		have[s[r-len(p)]-'a']--

		if have == need {
			out = append(out, r-len(p)+1)
		}
	}

	return out
}
```

Invariant:

```text id="tyo3j9"
Before comparison, have contains frequencies for the current window of length len(p).
```

Each step adds the new rightmost character and removes the old leftmost character.

## Design Choices

| Case                       | Method                  |              Time |       Space |
| -------------------------- | ----------------------- | ----------------: | ----------: |
| Lowercase ASCII            | `[26]int` counts        |            `O(n)` |      `O(1)` |
| Byte strings               | `[256]int` counts       |            `O(n)` |      `O(1)` |
| Unicode code points        | `map[rune]int`          |            `O(n)` |      `O(k)` |
| Simple general byte method | sorting                 |      `O(n log n)` |      `O(n)` |
| Repeated grouping          | canonical frequency key | `O(total length)` | `O(groups)` |

## Common Pitfalls

Using `len(s)` for Unicode character count gives byte count, not rune count.

For lowercase-only code, failing to validate input can panic or corrupt counts when characters fall outside `'a'` to `'z'`.

Sorting strings directly is impossible in Go because strings are immutable. Convert to `[]byte` or `[]rune`.

For human language, visually identical strings may have different Unicode representations. Normalize before comparison when that matters.

Using a map key built by concatenating counts without separators can collide. For example, counts `[1, 11]` and `[11, 1]` both become ambiguous if encoded carelessly.

## Takeaway

Anagram detection is equality of frequency tables. Use fixed arrays for small alphabets, maps for larger symbol sets, and sorting when simplicity matters more than linear time. For substring anagram search, combine a frequency table with a fixed-size sliding window.

