# 2.18 String Comparison

String comparison determines the ordering or equality of two strings. The simplest form compares byte sequences. More advanced forms compare Unicode code points, normalized text, or locale-specific collation units. Algorithmically, the important point is that comparison is usually linear in the length of the common prefix.

## Problem

Given two strings `a` and `b`, decide whether:

```text id="hlmkql"
a == b
a < b
a > b
```

For example:

```text id="h23frz"
"apple" < "apply"
"car" > "cap"
"same" == "same"
```

The comparison stops at the first position where the strings differ. If no differing position exists, the shorter string is smaller unless both lengths are equal.

## Bytewise Equality

Equality checks whether two strings have the same length and the same byte at every index.

```go id="trwzfa"
func EqualBytes(a, b string) bool {
	if len(a) != len(b) {
		return false
	}

	for i := 0; i < len(a); i++ {
		if a[i] != b[i] {
			return false
		}
	}

	return true
}
```

Invariant:

```text id="wfczaq"
If the loop reaches index i, then a[0:i] and b[0:i] are equal.
```

Complexity:

```text id="f3ewl7"
Time:  O(min(n, m)) for early mismatch, O(n) when lengths and contents match
Space: O(1)
```

In Go, `a == b` already performs this operation efficiently, so write the explicit loop only when teaching, instrumenting, or customizing comparison.

## Lexicographic Comparison

Lexicographic comparison is dictionary order over a chosen alphabet or encoding.

```go id="mx9ily"
func CompareBytes(a, b string) int {
	n := len(a)
	if len(b) < n {
		n = len(b)
	}

	for i := 0; i < n; i++ {
		if a[i] < b[i] {
			return -1
		}
		if a[i] > b[i] {
			return 1
		}
	}

	if len(a) < len(b) {
		return -1
	}
	if len(a) > len(b) {
		return 1
	}
	return 0
}
```

Return convention:

```text id="l7dpmm"
-1 means a < b
 0 means a == b
 1 means a > b
```

Invariant:

```text id="o789hi"
Before iteration i, a[0:i] and b[0:i] are equal.
```

The first differing byte determines the order.

## Prefix Comparison

Prefix checks occur in parsers, routing, search indexes, and string matching.

```go id="h9y96g"
func HasPrefixBytes(s, prefix string) bool {
	if len(prefix) > len(s) {
		return false
	}

	for i := 0; i < len(prefix); i++ {
		if s[i] != prefix[i] {
			return false
		}
	}

	return true
}
```

Invariant:

```text id="1gmq6v"
If the loop reaches index i, then s[0:i] equals prefix[0:i].
```

Suffix comparison is similar, but uses an offset:

```go id="ycb395"
func HasSuffixBytes(s, suffix string) bool {
	if len(suffix) > len(s) {
		return false
	}

	offset := len(s) - len(suffix)

	for i := 0; i < len(suffix); i++ {
		if s[offset+i] != suffix[i] {
			return false
		}
	}

	return true
}
```

## Case-Insensitive ASCII Comparison

For ASCII-only input, normalize each byte before comparison.

```go id="hda942"
func EqualFoldASCII(a, b string) bool {
	if len(a) != len(b) {
		return false
	}

	for i := 0; i < len(a); i++ {
		if lowerASCII(a[i]) != lowerASCII(b[i]) {
			return false
		}
	}

	return true
}

func lowerASCII(c byte) byte {
	if 'A' <= c && c <= 'Z' {
		return c + ('a' - 'A')
	}
	return c
}
```

This does not implement Unicode case folding. It only handles ASCII letters.

## Unicode Code Point Comparison

To compare by Unicode code points, decode runes.

```go id="hfk1gj"
func EqualRunes(a, b string) bool {
	ra := []rune(a)
	rb := []rune(b)

	if len(ra) != len(rb) {
		return false
	}

	for i := 0; i < len(ra); i++ {
		if ra[i] != rb[i] {
			return false
		}
	}

	return true
}
```

This compares code points, not user-visible grapheme clusters. Some visually identical strings may still compare unequal because Unicode allows multiple representations.

## Normalization Concern

Consider two ways to represent an accented character:

```text id="j0j6kr"
"é"        one code point
"e" + "́"  two code points
```

They may look the same but have different byte sequences and different rune sequences. Human-facing string comparison often requires Unicode normalization before comparison.

For algorithm problems and byte-oriented systems, bytewise comparison is usually the intended model. For user-facing text, specify normalization explicitly.

## Comparing Many Strings

When sorting strings, each comparison may scan a long common prefix. If many strings share prefixes, total comparison cost can dominate.

Example:

```go id="c8d6y9"
sort.Strings(words)
```

For strings of maximum length `L`, comparison sort costs:

```text id="8zt9ml"
O(n log n * L)
```

in the worst case.

For large sets of strings with many common prefixes, tries, suffix arrays, or radix sort may be more suitable.

## Common Prefix Length

Some algorithms need the length of the longest common prefix.

```go id="ww3f4g"
func LCP(a, b string) int {
	n := len(a)
	if len(b) < n {
		n = len(b)
	}

	i := 0
	for i < n && a[i] == b[i] {
		i++
	}

	return i
}
```

This function returns a byte length. For rune length, convert or decode runes.

## Constant-Time Comparison

Security-sensitive code sometimes needs comparison time independent of the first mismatch position. Ordinary string comparison exits early and can leak information through timing.

For secrets such as authentication tags, use a constant-time comparison from a cryptographic library rather than writing a custom loop.

Conceptually, compare all bytes and accumulate differences:

```go id="dndzml"
func EqualConstantTimeBytes(a, b []byte) bool {
	if len(a) != len(b) {
		return false
	}

	var diff byte
	for i := 0; i < len(a); i++ {
		diff |= a[i] ^ b[i]
	}

	return diff == 0
}
```

For production cryptographic code in Go, prefer `crypto/subtle`.

## Complexity

For two strings of lengths `n` and `m`:

```text id="g0ef11"
Equality:      O(min(n, m)) with early length or mismatch checks
Lexicographic: O(min(n, m))
Prefix check:  O(len(prefix))
Suffix check:  O(len(suffix))
Space:         O(1) for bytewise comparison
```

Converting to `[]rune` costs:

```text id="iwk9m0"
Time:  O(n + m)
Space: O(n + m)
```

## Common Pitfalls

Using byte comparison when the problem requires Unicode-aware comparison.

Using Unicode code point comparison when the problem requires normalized human text comparison.

Assuming string comparison is constant time. It is usually linear in the common prefix length.

Sorting many long strings without accounting for comparison cost.

Using ordinary comparison for secrets.

Mixing byte offsets with rune positions in diagnostics.

## Takeaway

String comparison is prefix scanning with an ordering rule. For algorithms, bytewise comparison is usually precise, fast, and easy to reason about. For user-facing text, define the comparison model explicitly: bytes, runes, normalized Unicode, or locale-aware collation.

