Rolling hash search computes a hash over a fixed size window and updates it efficiently as the window moves. It allows substring search in linear time with low constant factors.
You use it when scanning large text for patterns or when building string matching algorithms such as Rabin Karp.
Problem
Given a text of length and a pattern of length , find all indices such that:
Model
Define a polynomial hash for a string:
For a sliding window, the hash updates as:
This avoids recomputing the hash from scratch.
Algorithm
rolling_hash_search(T, P):
n = length(T)
m = length(P)
target = hash(P)
current = hash(T[0..m-1])
for i from 0 to n - m:
if current == target:
if T[i..i+m-1] == P:
report match at i
if i < n - m:
current = update_hash(current, T[i], T[i+m])
return all matchesExample
Let:
Sliding windows:
| index | substring | hash match |
|---|---|---|
| 0 | abra | yes |
| 1 | brac | no |
| 2 | raca | no |
| 3 | acad | no |
| 4 | cada | no |
| 5 | adab | no |
| 6 | dabr | no |
| 7 | abra | yes |
Matches at indices 0 and 7.
Correctness
The rolling hash ensures that every substring of length is assigned a hash value. When the hash matches the pattern hash, a direct comparison verifies the match.
The direct comparison eliminates false positives caused by hash collisions.
Complexity
| operation | time |
|---|---|
| preprocessing | |
| search | |
| verification | depends on matches |
Total time is in expectation.
Space
Only a few integer values are stored.
When to Use
Rolling hash search is appropriate when:
- substring matching is required
- repeated hash computation must be avoided
- large texts are processed
- approximate matching is combined with verification
- multiple pattern checks are needed
It is widely used in string matching, plagiarism detection, deduplication, and substring indexing.
Implementation
def rolling_hash_search(text, pattern):
n, m = len(text), len(pattern)
if m > n:
return []
B = 256
M = 10**9 + 7
def hash_str(s):
h = 0
for c in s:
h = (h * B + ord(c)) % M
return h
target = hash_str(pattern)
current = hash_str(text[:m])
power = pow(B, m - 1, M)
result = []
for i in range(n - m + 1):
if current == target:
if text[i:i+m] == pattern:
result.append(i)
if i < n - m:
current = (current - ord(text[i]) * power) % M
current = (current * B + ord(text[i + m])) % M
return resultfunc RollingHashSearch(text, pattern string) []int {
n, m := len(text), len(pattern)
if m > n {
return nil
}
const B = 256
const M = 1000000007
hashStr := func(s string) int {
h := 0
for i := 0; i < len(s); i++ {
h = (h*B + int(s[i])) % M
}
return h
}
target := hashStr(pattern)
current := hashStr(text[:m])
power := 1
for i := 0; i < m-1; i++ {
power = (power * B) % M
}
var result []int
for i := 0; i <= n-m; i++ {
if current == target && text[i:i+m] == pattern {
result = append(result, i)
}
if i < n-m {
current = (current - int(text[i])*power%M + M) % M
current = (current*B + int(text[i+m])) % M
}
}
return result
}