# 2.12 Substring Search

Substring search locates occurrences of a pattern `p` inside a text `s`. The objective is to find all indices `i` such that:

```text
s[i : i + m] == p
```

where `m = len(p)`.

The choice of algorithm depends on input size, number of queries, and tolerance for probabilistic error.
## Problem

Given:

```text
s = "abracadabra"
p = "abra"
```

Return all starting indices:

```text
[0, 7]
```
## Baseline: Naive Search

Check every possible starting position.

```go
func NaiveSearch(s, p string) []int {
	n := len(s)
	m := len(p)

	var out []int

	for i := 0; i+m <= n; i++ {
		match := true
		for j := 0; j < m; j++ {
			if s[i+j] != p[j] {
				match = false
				break
			}
		}
		if match {
			out = append(out, i)
		}
	}

	return out
}
```

### Complexity

```text
Worst case: O(n * m)
```

Works well for small inputs or random data. Performs poorly on repetitive inputs like `"aaaaaa..."`.
## Knuth-Morris-Pratt (KMP)

KMP avoids rechecking characters by preprocessing the pattern.

### Idea

Build an array `lps` (longest prefix which is also suffix).
When a mismatch occurs, use `lps` to skip ahead.
### LPS Construction

```go
func BuildLPS(p string) []int {
	m := len(p)
	lps := make([]int, m)

	length := 0
	i := 1

	for i < m {
		if p[i] == p[length] {
			length++
			lps[i] = length
			i++
		} else {
			if length != 0 {
				length = lps[length-1]
			} else {
				lps[i] = 0
				i++
			}
		}
	}

	return lps
}
```
### Search

```go
func KMPSearch(s, p string) []int {
	n := len(s)
	m := len(p)

	lps := BuildLPS(p)

	var out []int

	i, j := 0, 0

	for i < n {
		if s[i] == p[j] {
			i++
			j++
		}

		if j == m {
			out = append(out, i-m)
			j = lps[j-1]
		} else if i < n && s[i] != p[j] {
			if j != 0 {
				j = lps[j-1]
			} else {
				i++
			}
		}
	}

	return out
}
```
### Invariant

```text
At each step, p[0:j] matches s[i-j : i]
```

When mismatch occurs, `lps[j-1]` gives the next valid prefix length.
### Complexity

```text
Time:  O(n + m)
Space: O(m)
```
## Rabin-Karp (Rolling Hash)

Use hashing to compare substrings in constant time.

### Idea

Compute hash of pattern and rolling hash of substrings of `s`.

If hashes match, verify to avoid collisions.
### Implementation

```go
func RabinKarp(s, p string) []int {
	n := len(s)
	m := len(p)

	if m == 0 || m > n {
		return nil
	}

	const base = 256
	const mod = 1_000_000_007

	var h, ph, power int

	power = 1
	for i := 0; i < m-1; i++ {
		power = (power * base) % mod
	}

	for i := 0; i < m; i++ {
		h = (h*base + int(s[i])) % mod
		ph = (ph*base + int(p[i])) % mod
	}

	var out []int

	for i := 0; i <= n-m; i++ {
		if h == ph {
			if s[i:i+m] == p {
				out = append(out, i)
			}
		}

		if i < n-m {
			h = (h - int(s[i])*power%mod + mod) % mod
			h = (h*base + int(s[i+m])) % mod
		}
	}

	return out
}
```
### Complexity

```text
Average: O(n + m)
Worst:   O(n * m) (due to collisions)
```
## Z Algorithm

Build an array `Z` where `Z[i]` is the length of the longest substring starting at `i` that matches the prefix.

Apply it to:

```text
p + "#" + s
```

Matches occur where `Z[i] == len(p)`.
### Key Property

```text
Z[i] = length of match between s[i:] and s[0:]
```
### Complexity

```text
Time: O(n + m)
Space: O(n + m)
```
## When to Use What

| Method      | Time       | Space | Notes                          |
| ----------- | ---------- | ----- | ------------------------------ |
| Naive       | O(n*m)     | O(1)  | Simple, small inputs           |
| KMP         | O(n+m)     | O(m)  | Deterministic, no backtracking |
| Rabin-Karp  | O(n+m) avg | O(1)  | Good for multiple patterns     |
| Z algorithm | O(n+m)     | O(n)  | Clean prefix-based approach    |
## Multiple Pattern Search

For many patterns, use a trie-based automaton such as Aho-Corasick.

This builds a finite automaton that processes the text in one pass and reports all matches.
## Common Pitfalls

Comparing substrings directly in loops leads to quadratic behavior.

Using Rabin-Karp without verifying matches risks false positives.

Incorrect LPS construction breaks KMP silently.

Mixing byte indexing and Unicode characters produces incorrect matches in non-ASCII text.

For large alphabets, hash base selection affects collision rate.
## Design Pattern

Substring search reduces to:

1. A representation of partial matches
2. A rule for advancing on mismatch
3. A way to avoid rechecking previous work

KMP uses prefix structure.
Rabin-Karp uses hashing.
Z uses prefix comparisons.
## Takeaway

Efficient substring search avoids redundant comparisons. KMP provides deterministic linear time. Rolling hash offers flexibility for multiple queries. The correct choice depends on constraints: input size, number of patterns, and tolerance for probabilistic behavior.

