2.14 Anagrams

Anagrams are strings or sequences that contain the same elements with the same multiplicities, possibly in different order. For example, "listen" and "silent" are anagrams because each letter appears the same number of times in both strings.

The usual algorithmic task is to compare frequency tables. Sorting also works, but counting is often faster when the alphabet is small or bounded.

Problem

Given two strings a and b, decide whether they are anagrams.

a = "listen"
b = "silent"

Expected result:

true

For strings of different lengths, the answer is immediately false.

Solution: Frequency Table

For ASCII lowercase letters, use a fixed-size array of length 26.

func IsAnagramLowerASCII(a, b string) bool {
	if len(a) != len(b) {
		return false
	}

	var count [26]int

	for i := 0; i < len(a); i++ {
		count[a[i]-'a']++
		count[b[i]-'a']--
	}

	for i := 0; i < 26; i++ {
		if count[i] != 0 {
			return false
		}
	}

	return true
}

This assumes every byte is between 'a' and 'z'.

Invariant

After processing the first i bytes:

count[c] = occurrences of c in a[0:i] minus occurrences of c in b[0:i]

If both strings are anagrams, every final difference is zero. If any count is nonzero, at least one character appears a different number of times.

General Byte Strings

For arbitrary byte strings, use an array of length 256.

func IsAnagramBytes(a, b string) bool {
	if len(a) != len(b) {
		return false
	}

	var count [256]int

	for i := 0; i < len(a); i++ {
		count[a[i]]++
		count[b[i]]--
	}

	for i := 0; i < 256; i++ {
		if count[i] != 0 {
			return false
		}
	}

	return true
}

This treats strings as byte sequences. It is correct for binary data and byte-oriented formats.

Unicode Code Points

For Unicode code points, range over runes and use a map.

func IsAnagramRunes(a, b string) bool {
	count := make(map[rune]int)

	lenA := 0
	for _, r := range a {
		count[r]++
		lenA++
	}

	lenB := 0
	for _, r := range b {
		count[r]--
		lenB++
	}

	if lenA != lenB {
		return false
	}

	for _, v := range count {
		if v != 0 {
			return false
		}
	}

	return true
}

This compares Unicode code points, not grapheme clusters or normalized text. For human-facing text, normalize first when needed.

Sorting Method

Another method is to sort both strings and compare the sorted results.

func IsAnagramBySorting(a, b string) bool {
	if len(a) != len(b) {
		return false
	}

	x := []byte(a)
	y := []byte(b)

	sort.Slice(x, func(i, j int) bool { return x[i] < x[j] })
	sort.Slice(y, func(i, j int) bool { return y[i] < y[j] })

	return string(x) == string(y)
}

This requires:

import "sort"

Sorting is simple and general for byte strings, but slower than counting for bounded alphabets.

Complexity:

Time:  O(n log n)
Space: O(n)

Counting complexity:

Time:  O(n)
Space: O(1) for fixed alphabets, O(k) for maps

where k is the number of distinct symbols.

Grouping Anagrams

To group words that are anagrams of each other, compute a canonical key for each word.

For lowercase ASCII, the frequency vector is a good key.

func GroupAnagrams(words []string) [][]string {
	groups := make(map[[26]int][]string)

	for _, w := range words {
		var key [26]int
		for i := 0; i < len(w); i++ {
			key[w[i]-'a']++
		}
		groups[key] = append(groups[key], w)
	}

	out := make([][]string, 0, len(groups))
	for _, g := range groups {
		out = append(out, g)
	}

	return out
}

The array [26]int can be used as a map key because arrays are comparable in Go.

Find Anagrams in a String

A common sliding window problem asks for all positions where a substring of s is an anagram of p.

func FindAnagrams(s, p string) []int {
	if len(p) > len(s) {
		return nil
	}

	var need [26]int
	var have [26]int

	for i := 0; i < len(p); i++ {
		need[p[i]-'a']++
		have[s[i]-'a']++
	}

	var out []int

	if have == need {
		out = append(out, 0)
	}

	for r := len(p); r < len(s); r++ {
		have[s[r]-'a']++
		have[s[r-len(p)]-'a']--

		if have == need {
			out = append(out, r-len(p)+1)
		}
	}

	return out
}

Invariant:

Before comparison, have contains frequencies for the current window of length len(p).

Each step adds the new rightmost character and removes the old leftmost character.

Design Choices

Case	Method	Time	Space
Lowercase ASCII	`[26]int` counts	`O(n)`	`O(1)`
Byte strings	`[256]int` counts	`O(n)`	`O(1)`
Unicode code points	`map[rune]int`	`O(n)`	`O(k)`
Simple general byte method	sorting	`O(n log n)`	`O(n)`
Repeated grouping	canonical frequency key	`O(total length)`	`O(groups)`

Common Pitfalls

Using len(s) for Unicode character count gives byte count, not rune count.

For lowercase-only code, failing to validate input can panic or corrupt counts when characters fall outside 'a' to 'z'.

Sorting strings directly is impossible in Go because strings are immutable. Convert to []byte or []rune.

For human language, visually identical strings may have different Unicode representations. Normalize before comparison when that matters.

Using a map key built by concatenating counts without separators can collide. For example, counts [1, 11] and [11, 1] both become ambiguous if encoded carelessly.

Takeaway

Anagram detection is equality of frequency tables. Use fixed arrays for small alphabets, maps for larger symbol sets, and sorting when simplicity matters more than linear time. For substring anagram search, combine a frequency table with a fixed-size sliding window.