Rabin Karp Search

Rabin Karp search is a string matching algorithm that uses a rolling hash to find candidate matches of a pattern in a text. It compares hash values first, then verifies equal-hash windows by direct character comparison.

This separates fast candidate detection from exact verification. It is especially useful when searching for many patterns or when rolling hashes are reused in a larger text processing pipeline.

Problem

Given a text $T$ of length $n$ and a pattern $P$ of length $m$ , find all indices $i$ such that:

T[i \dots i+m-1] = P

Model

Use a polynomial rolling hash:

H(s_0s_1\dots s_{m-1}) = \left( s_0B^{m-1} + s_1B^{m-2} + \dots + s_{m-1} \right) \bmod M

where:

$B$ is the base
$M$ is the modulus
characters are treated as integer codes

The next window hash is updated from the previous one in constant time:

H_{i+1} = \left( (H_i - T[i]B^{m-1})B + T[i+m] \right) \bmod M

Algorithm

rabin_karp_search(T, P):
    n = length(T)
    m = length(P)

    if m > n:
        return empty list

    target = hash(P)
    current = hash(T[0..m-1])

    for i from 0 to n - m:
        if current == target:
            if T[i..i+m-1] == P:
                report i

        if i < n - m:
            current = roll(current, T[i], T[i+m])

    return reported indices

Example

Let:

T = "bananaban"

and:

P = "ana"

The algorithm checks all length 3 windows:

index	window	hash comparison	exact match
0	ban	no	no
1	ana	yes	yes
2	nan	no	no
3	ana	yes	yes
4	nab	no	no
5	aba	no	no
6	ban	no	no

The matches are:

[1, 3]

Correctness

Rabin Karp examines every substring of length $m$ in the text. If a substring equals the pattern, then its hash equals the pattern hash, so it becomes a candidate and is verified by direct comparison.

If the algorithm reports an index, it has checked that the corresponding substring equals the pattern. Therefore every reported index is correct.

Hash collisions may create extra candidates, but verification prevents false matches.

Complexity

case	time
expected	$O(n + m)$
worst	$O(nm)$

The worst case occurs when many windows have the same hash as the pattern and require full verification.

Space

O(1)

The algorithm stores only hash values, constants, and the output list.

When to Use

Rabin Karp search is appropriate when:

rolling hash infrastructure already exists
many patterns are searched against the same text
substring candidates must be generated quickly
occasional hash collisions are acceptable because verification is used

For one exact pattern, Knuth Morris Pratt or Boyer Moore variants usually give stronger deterministic guarantees.

Implementation

def rabin_karp_search(text, pattern):
    n, m = len(text), len(pattern)
    if m > n:
        return []

    base = 256
    mod = 1_000_000_007

    def build_hash(s):
        h = 0
        for c in s:
            h = (h * base + ord(c)) % mod
        return h

    target = build_hash(pattern)
    current = build_hash(text[:m])
    power = pow(base, m - 1, mod)

    result = []

    for i in range(n - m + 1):
        if current == target and text[i:i + m] == pattern:
            result.append(i)

        if i < n - m:
            current = (current - ord(text[i]) * power) % mod
            current = (current * base + ord(text[i + m])) % mod

    return result

func RabinKarpSearch(text, pattern string) []int {
	n, m := len(text), len(pattern)
	if m > n {
		return nil
	}

	const base = 256
	const mod = 1000000007

	hashString := func(s string) int {
		h := 0
		for i := 0; i < len(s); i++ {
			h = (h*base + int(s[i])) % mod
		}
		return h
	}

	target := hashString(pattern)
	current := hashString(text[:m])

	power := 1
	for i := 0; i < m-1; i++ {
		power = (power * base) % mod
	}

	var result []int

	for i := 0; i <= n-m; i++ {
		if current == target && text[i:i+m] == pattern {
			result = append(result, i)
		}

		if i < n-m {
			current = (current - int(text[i])*power%mod + mod) % mod
			current = (current*base + int(text[i+m])) % mod
		}
	}

	return result
}

index	window	hash comparison	exact match
0	ban	no	no
1	ana	yes	yes
2	nan	no	no
3	ana	yes	yes
4	nab	no	no
5	aba	no	no
6	ban	no	no

index	window	hash comparison	exact match
0	ban	no	no
1	ana	yes	yes
2	nan	no	no
3	ana	yes	yes
4	nab	no	no
5	aba	no	no
6	ban	no	no

index	window	hash comparison	exact match
0	ban	no	no
1	ana	yes	yes
2	nan	no	no
3	ana	yes	yes
4	nab	no	no
5	aba	no	no
6	ban	no	no