Skip to content

2.15 Frequency Tables

A frequency table records how many times each value appears in an array, string, or stream.

A frequency table records how many times each value appears in an array, string, or stream. It is one of the simplest and most useful summaries of linear data. Many problems about duplicates, anagrams, modes, majority elements, histograms, and sliding windows reduce to maintaining counts.

Problem

Given a sequence, count how often each element occurs.

Example:

[4, 1, 4, 2, 1, 4]

Frequency table:

1 -> 2
2 -> 1
4 -> 3

The table gives a compact representation of multiplicity.

Fixed Alphabet Table

When the value range is small and known, use an array.

For lowercase ASCII letters:

func LetterFrequency(s string) [26]int {
	var count [26]int

	for i := 0; i < len(s); i++ {
		count[s[i]-'a']++
	}

	return count
}

Precondition:

Every byte in s is between 'a' and 'z'.

This version is fast because array indexing is constant time and no hashing is needed.

Byte Frequency Table

For arbitrary byte strings, use 256 counters.

func ByteFrequency(s string) [256]int {
	var count [256]int

	for i := 0; i < len(s); i++ {
		count[s[i]]++
	}

	return count
}

This treats the string as bytes, not Unicode characters.

Integer Frequency Table with Map

When the value range is large or unknown, use a map.

func IntFrequency(a []int) map[int]int {
	count := make(map[int]int)

	for _, x := range a {
		count[x]++
	}

	return count
}

Invariant:

After processing a[0:i], count[x] equals the number of occurrences of x in that prefix.

The invariant is direct: each element increments exactly one counter.

Rune Frequency Table

For Unicode code points:

func RuneFrequency(s string) map[rune]int {
	count := make(map[rune]int)

	for _, r := range s {
		count[r]++
	}

	return count
}

This counts code points. It does not count grapheme clusters, and it does not normalize equivalent Unicode representations.

Finding the Most Frequent Element

A frequency table can be used to compute the mode.

func MostFrequent(a []int) (value int, freq int, ok bool) {
	if len(a) == 0 {
		return 0, 0, false
	}

	count := make(map[int]int)

	for _, x := range a {
		count[x]++

		if count[x] > freq {
			value = x
			freq = count[x]
			ok = true
		}
	}

	return value, freq, ok
}

This updates the best value while building the table, so no second pass is needed.

Checking Equality of Multisets

Two arrays contain the same values with the same multiplicities if their frequency tables match.

func SameMultiset(a, b []int) bool {
	if len(a) != len(b) {
		return false
	}

	count := make(map[int]int)

	for _, x := range a {
		count[x]++
	}

	for _, x := range b {
		count[x]--
		if count[x] < 0 {
			return false
		}
	}

	return true
}

The early negative check proves that b contains some value more often than a.

Because lengths are equal, no final scan is needed. If no counter becomes negative, all counters must return to zero.

Frequency Table in a Sliding Window

Frequency tables are commonly maintained over a moving window.

Example: longest subarray with at most k distinct values.

func LongestAtMostKDistinct(a []int, k int) int {
	count := make(map[int]int)
	l := 0
	best := 0

	for r := 0; r < len(a); r++ {
		count[a[r]]++

		for len(count) > k {
			count[a[l]]--
			if count[a[l]] == 0 {
				delete(count, a[l])
			}
			l++
		}

		if r-l+1 > best {
			best = r - l + 1
		}
	}

	return best
}

Invariant:

count contains exactly the frequencies of values in the current window a[l:r+1].

Deleting zero-count entries is important because len(count) is used as the number of distinct values.

Prefix Frequency Tables

For repeated range frequency queries over a small alphabet, build prefix counts.

For lowercase ASCII:

func PrefixLetterFrequency(s string) [][26]int {
	prefix := make([][26]int, len(s)+1)

	for i := 0; i < len(s); i++ {
		prefix[i+1] = prefix[i]
		prefix[i+1][s[i]-'a']++
	}

	return prefix
}

Frequency of letters in the half-open range [l, r):

func RangeLetterFrequency(prefix [][26]int, l, r int) [26]int {
	var out [26]int

	for c := 0; c < 26; c++ {
		out[c] = prefix[r][c] - prefix[l][c]
	}

	return out
}

This gives O(26) query time after O(26n) logical preprocessing. Since 26 is constant, this is effectively linear preprocessing and constant query time for lowercase English letters.

Sparse vs Dense Tables

Use dense arrays when the domain is compact:

byte values:       [256]int
lowercase letters: [26]int
small IDs:         []int

Use maps when the domain is sparse or unknown:

arbitrary integers
strings
runes
large IDs
composite keys

Dense tables are faster and more memory predictable. Maps are more flexible.

Coordinate Compression

If values are large but the number of distinct values is small, compress them.

func CompressValues(a []int) (ids []int, values []int) {
	values = append(values, a...)
	sort.Ints(values)

	values = uniqueInts(values)

	index := make(map[int]int, len(values))
	for i, v := range values {
		index[v] = i
	}

	ids = make([]int, len(a))
	for i, v := range a {
		ids[i] = index[v]
	}

	return ids, values
}

func uniqueInts(a []int) []int {
	if len(a) == 0 {
		return a
	}

	write := 1
	for read := 1; read < len(a); read++ {
		if a[read] != a[write-1] {
			a[write] = a[read]
			write++
		}
	}

	return a[:write]
}

This allows later algorithms to use arrays indexed by compressed IDs.

Complexity

For n elements and k distinct values:

Dense array table: O(n) time, O(domain size) space
Map table:         O(n) expected time, O(k) space
Sorted method:     O(n log n) time, O(1) or O(n) extra space

The map bound is expected time because hash table operations are expected constant time.

Common Pitfalls

Leaving zero-count entries in a map can break distinct-count logic.

Using a fixed-size table without validating input can panic when an index is outside the expected range.

Using len(s) as a character count gives bytes, not runes.

Frequency tables count multiplicity, not order. If order matters, store positions or use a different structure.

Integer counters can overflow on very large streams. Use a wider type when needed.

Map iteration order is unspecified. Do not use it when deterministic output order is required.

Takeaway

A frequency table is the standard representation for multiplicity. Use arrays for small fixed domains, maps for sparse domains, and prefix frequency tables for repeated range queries. The key invariant is simple: after processing a prefix, each counter equals the number of occurrences of its value in that prefix.