Search for a pattern in text by comparing rolling hash values before verifying matching windows.
Rabin Karp search is a string matching algorithm that uses a rolling hash to find candidate matches of a pattern in a text. It compares hash values first, then verifies equal-hash windows by direct character comparison.
This separates fast candidate detection from exact verification. It is especially useful when searching for many patterns or when rolling hashes are reused in a larger text processing pipeline.
Problem
Given a text of length and a pattern of length , find all indices such that:
Model
Use a polynomial rolling hash:
where:
- is the base
- is the modulus
- characters are treated as integer codes
The next window hash is updated from the previous one in constant time:
Algorithm
rabin_karp_search(T, P):
n = length(T)
m = length(P)
if m > n:
return empty list
target = hash(P)
current = hash(T[0..m-1])
for i from 0 to n - m:
if current == target:
if T[i..i+m-1] == P:
report i
if i < n - m:
current = roll(current, T[i], T[i+m])
return reported indicesExample
Let:
and:
The algorithm checks all length 3 windows:
| index | window | hash comparison | exact match |
|---|---|---|---|
| 0 | ban | no | no |
| 1 | ana | yes | yes |
| 2 | nan | no | no |
| 3 | ana | yes | yes |
| 4 | nab | no | no |
| 5 | aba | no | no |
| 6 | ban | no | no |
The matches are:
Correctness
Rabin Karp examines every substring of length in the text. If a substring equals the pattern, then its hash equals the pattern hash, so it becomes a candidate and is verified by direct comparison.
If the algorithm reports an index, it has checked that the corresponding substring equals the pattern. Therefore every reported index is correct.
Hash collisions may create extra candidates, but verification prevents false matches.
Complexity
| case | time |
|---|---|
| expected | |
| worst |
The worst case occurs when many windows have the same hash as the pattern and require full verification.
Space
The algorithm stores only hash values, constants, and the output list.
When to Use
Rabin Karp search is appropriate when:
- rolling hash infrastructure already exists
- many patterns are searched against the same text
- substring candidates must be generated quickly
- occasional hash collisions are acceptable because verification is used
For one exact pattern, Knuth Morris Pratt or Boyer Moore variants usually give stronger deterministic guarantees.
Implementation
def rabin_karp_search(text, pattern):
n, m = len(text), len(pattern)
if m > n:
return []
base = 256
mod = 1_000_000_007
def build_hash(s):
h = 0
for c in s:
h = (h * base + ord(c)) % mod
return h
target = build_hash(pattern)
current = build_hash(text[:m])
power = pow(base, m - 1, mod)
result = []
for i in range(n - m + 1):
if current == target and text[i:i + m] == pattern:
result.append(i)
if i < n - m:
current = (current - ord(text[i]) * power) % mod
current = (current * base + ord(text[i + m])) % mod
return resultfunc RabinKarpSearch(text, pattern string) []int {
n, m := len(text), len(pattern)
if m > n {
return nil
}
const base = 256
const mod = 1000000007
hashString := func(s string) int {
h := 0
for i := 0; i < len(s); i++ {
h = (h*base + int(s[i])) % mod
}
return h
}
target := hashString(pattern)
current := hashString(text[:m])
power := 1
for i := 0; i < m-1; i++ {
power = (power * base) % mod
}
var result []int
for i := 0; i <= n-m; i++ {
if current == target && text[i:i+m] == pattern {
result = append(result, i)
}
if i < n-m {
current = (current - int(text[i])*power%mod + mod) % mod
current = (current*base + int(text[i+m])) % mod
}
}
return result
}