Substring search locates occurrences of a pattern p inside a text s. The objective is to find all indices i such that:
s[i : i + m] == pwhere m = len(p).
The choice of algorithm depends on input size, number of queries, and tolerance for probabilistic error.
Problem
Given:
s = "abracadabra"
p = "abra"Return all starting indices:
[0, 7]Baseline: Naive Search
Check every possible starting position.
func NaiveSearch(s, p string) []int {
n := len(s)
m := len(p)
var out []int
for i := 0; i+m <= n; i++ {
match := true
for j := 0; j < m; j++ {
if s[i+j] != p[j] {
match = false
break
}
}
if match {
out = append(out, i)
}
}
return out
}Complexity
Worst case: O(n * m)Works well for small inputs or random data. Performs poorly on repetitive inputs like "aaaaaa...".
Knuth-Morris-Pratt (KMP)
KMP avoids rechecking characters by preprocessing the pattern.
Idea
Build an array lps (longest prefix which is also suffix).
When a mismatch occurs, use lps to skip ahead.
LPS Construction
func BuildLPS(p string) []int {
m := len(p)
lps := make([]int, m)
length := 0
i := 1
for i < m {
if p[i] == p[length] {
length++
lps[i] = length
i++
} else {
if length != 0 {
length = lps[length-1]
} else {
lps[i] = 0
i++
}
}
}
return lps
}Search
func KMPSearch(s, p string) []int {
n := len(s)
m := len(p)
lps := BuildLPS(p)
var out []int
i, j := 0, 0
for i < n {
if s[i] == p[j] {
i++
j++
}
if j == m {
out = append(out, i-m)
j = lps[j-1]
} else if i < n && s[i] != p[j] {
if j != 0 {
j = lps[j-1]
} else {
i++
}
}
}
return out
}Invariant
At each step, p[0:j] matches s[i-j : i]When mismatch occurs, lps[j-1] gives the next valid prefix length.
Complexity
Time: O(n + m)
Space: O(m)Rabin-Karp (Rolling Hash)
Use hashing to compare substrings in constant time.
Idea
Compute hash of pattern and rolling hash of substrings of s.
If hashes match, verify to avoid collisions.
Implementation
func RabinKarp(s, p string) []int {
n := len(s)
m := len(p)
if m == 0 || m > n {
return nil
}
const base = 256
const mod = 1_000_000_007
var h, ph, power int
power = 1
for i := 0; i < m-1; i++ {
power = (power * base) % mod
}
for i := 0; i < m; i++ {
h = (h*base + int(s[i])) % mod
ph = (ph*base + int(p[i])) % mod
}
var out []int
for i := 0; i <= n-m; i++ {
if h == ph {
if s[i:i+m] == p {
out = append(out, i)
}
}
if i < n-m {
h = (h - int(s[i])*power%mod + mod) % mod
h = (h*base + int(s[i+m])) % mod
}
}
return out
}Complexity
Average: O(n + m)
Worst: O(n * m) (due to collisions)Z Algorithm
Build an array Z where Z[i] is the length of the longest substring starting at i that matches the prefix.
Apply it to:
p + "#" + sMatches occur where Z[i] == len(p).
Key Property
Z[i] = length of match between s[i:] and s[0:]Complexity
Time: O(n + m)
Space: O(n + m)When to Use What
| Method | Time | Space | Notes |
|---|---|---|---|
| Naive | O(n*m) | O(1) | Simple, small inputs |
| KMP | O(n+m) | O(m) | Deterministic, no backtracking |
| Rabin-Karp | O(n+m) avg | O(1) | Good for multiple patterns |
| Z algorithm | O(n+m) | O(n) | Clean prefix-based approach |
Multiple Pattern Search
For many patterns, use a trie-based automaton such as Aho-Corasick.
This builds a finite automaton that processes the text in one pass and reports all matches.
Common Pitfalls
Comparing substrings directly in loops leads to quadratic behavior.
Using Rabin-Karp without verifying matches risks false positives.
Incorrect LPS construction breaks KMP silently.
Mixing byte indexing and Unicode characters produces incorrect matches in non-ASCII text.
For large alphabets, hash base selection affects collision rate.
Design Pattern
Substring search reduces to:
- A representation of partial matches
- A rule for advancing on mismatch
- A way to avoid rechecking previous work
KMP uses prefix structure. Rabin-Karp uses hashing. Z uses prefix comparisons.
Takeaway
Efficient substring search avoids redundant comparisons. KMP provides deterministic linear time. Rolling hash offers flexibility for multiple queries. The correct choice depends on constraints: input size, number of patterns, and tolerance for probabilistic behavior.