# Rabin Karp Search

# Rabin Karp Search

Rabin Karp search is a string matching algorithm that uses a rolling hash to find candidate matches of a pattern in a text. It compares hash values first, then verifies equal-hash windows by direct character comparison.

This separates fast candidate detection from exact verification. It is especially useful when searching for many patterns or when rolling hashes are reused in a larger text processing pipeline.

## Problem

Given a text $T$ of length $n$ and a pattern $P$ of length $m$, find all indices $i$ such that:

$$
T[i \dots i+m-1] = P
$$

## Model

Use a polynomial rolling hash:

$$
H(s_0s_1\dots s_{m-1}) =
\left(
s_0B^{m-1} + s_1B^{m-2} + \dots + s_{m-1}
\right) \bmod M
$$

where:

* $B$ is the base
* $M$ is the modulus
* characters are treated as integer codes

The next window hash is updated from the previous one in constant time:

$$
H_{i+1} =
\left(
(H_i - T[i]B^{m-1})B + T[i+m]
\right) \bmod M
$$

## Algorithm

```text id="z5ypnb"
rabin_karp_search(T, P):
    n = length(T)
    m = length(P)

    if m > n:
        return empty list

    target = hash(P)
    current = hash(T[0..m-1])

    for i from 0 to n - m:
        if current == target:
            if T[i..i+m-1] == P:
                report i

        if i < n - m:
            current = roll(current, T[i], T[i+m])

    return reported indices
```

## Example

Let:

$$
T = "bananaban"
$$

and:

$$
P = "ana"
$$

The algorithm checks all length 3 windows:

| index | window | hash comparison | exact match |
| ----: | ------ | --------------- | ----------- |
|     0 | ban    | no              | no          |
|     1 | ana    | yes             | yes         |
|     2 | nan    | no              | no          |
|     3 | ana    | yes             | yes         |
|     4 | nab    | no              | no          |
|     5 | aba    | no              | no          |
|     6 | ban    | no              | no          |

The matches are:

$$
[1, 3]
$$

## Correctness

Rabin Karp examines every substring of length $m$ in the text. If a substring equals the pattern, then its hash equals the pattern hash, so it becomes a candidate and is verified by direct comparison.

If the algorithm reports an index, it has checked that the corresponding substring equals the pattern. Therefore every reported index is correct.

Hash collisions may create extra candidates, but verification prevents false matches.

## Complexity

| case     | time       |
| -------- | ---------- |
| expected | $O(n + m)$ |
| worst    | $O(nm)$    |

The worst case occurs when many windows have the same hash as the pattern and require full verification.

## Space

$$
O(1)
$$

The algorithm stores only hash values, constants, and the output list.

## When to Use

Rabin Karp search is appropriate when:

* rolling hash infrastructure already exists
* many patterns are searched against the same text
* substring candidates must be generated quickly
* occasional hash collisions are acceptable because verification is used

For one exact pattern, Knuth Morris Pratt or Boyer Moore variants usually give stronger deterministic guarantees.

## Implementation

```python id="ptxyv9"
def rabin_karp_search(text, pattern):
    n, m = len(text), len(pattern)
    if m > n:
        return []

    base = 256
    mod = 1_000_000_007

    def build_hash(s):
        h = 0
        for c in s:
            h = (h * base + ord(c)) % mod
        return h

    target = build_hash(pattern)
    current = build_hash(text[:m])
    power = pow(base, m - 1, mod)

    result = []

    for i in range(n - m + 1):
        if current == target and text[i:i + m] == pattern:
            result.append(i)

        if i < n - m:
            current = (current - ord(text[i]) * power) % mod
            current = (current * base + ord(text[i + m])) % mod

    return result
```

```go id="jvf9tr"
func RabinKarpSearch(text, pattern string) []int {
	n, m := len(text), len(pattern)
	if m > n {
		return nil
	}

	const base = 256
	const mod = 1000000007

	hashString := func(s string) int {
		h := 0
		for i := 0; i < len(s); i++ {
			h = (h*base + int(s[i])) % mod
		}
		return h
	}

	target := hashString(pattern)
	current := hashString(text[:m])

	power := 1
	for i := 0; i < m-1; i++ {
		power = (power * base) % mod
	}

	var result []int

	for i := 0; i <= n-m; i++ {
		if current == target && text[i:i+m] == pattern {
			result = append(result, i)
		}

		if i < n-m {
			current = (current - int(text[i])*power%mod + mod) % mod
			current = (current*base + int(text[i+m])) % mod
		}
	}

	return result
}
```

