16.1 Exact Matching
Given a text string and a pattern string, determine whether the pattern occurs in the text and, if so, find all positions where the match begins.
16.1 Exact Matching
Problem
Given a text string and a pattern string, determine whether the pattern occurs in the text and, if so, find all positions where the match begins.
This is one of the oldest and most fundamental problems in computer science. Search engines look for query terms inside documents. Compilers search for keywords inside source code. Text editors perform find-and-replace operations. DNA analysis systems search for gene sequences within genomes. Log analysis tools look for signatures inside large event streams.
The exact matching problem can be stated formally:
Given:
Text = T[0..n-1]
Pattern = P[0..m-1]
Find every position i such that:
T[i + j] = P[j]
for every:
0 ≤ j < m
A valid match requires every character to be identical.
Solution
The simplest solution examines every possible starting position in the text and compares characters one by one.
Text: ABCABCABC
Pattern: ABC
Position 0 → Match
Position 1 → Fail
Position 2 → Fail
Position 3 → Match
Position 4 → Fail
Position 5 → Fail
Position 6 → Match
This approach is called naive exact matching.
Lean Implementation
def exactMatchAt (text : String) (pattern : String) (start : Nat) : Bool :=
if start + pattern.length > text.length then
false
else
let rec loop (i : Nat) :=
if h : i = pattern.length then
true
else
let idx := start + i
if text.get ⟨idx, by omega⟩ =
pattern.get ⟨i, by omega⟩ then
loop (i + 1)
else
false
loop 0
The function verifies whether a match exists beginning at a specific position.
A complete search iterates through every possible starting position.
def findMatches (text pattern : String) : List Nat :=
let rec search (pos : Nat) (acc : List Nat) :=
if pos + pattern.length > text.length then
acc.reverse
else
if exactMatchAt text pattern pos then
search (pos + 1) (pos :: acc)
else
search (pos + 1) acc
search 0 []
Example
Consider:
Text = BANANABANANA
Pattern = ANA
Possible starting positions:
| Position | Comparison | Result |
|---|---|---|
| 0 | BAN | Fail |
| 1 | ANA | Match |
| 2 | NAN | Fail |
| 3 | ANA | Match |
| 4 | NAB | Fail |
| 5 | ABA | Fail |
| 6 | BAN | Fail |
| 7 | ANA | Match |
| 8 | NAN | Fail |
| 9 | ANA | Match |
Output:
[1, 3, 7, 9]
Notice that matches may overlap.
BANANABANANA
ANA
ANA
ANA
ANA
Many practical systems must support overlapping matches.
Understanding the Search Space
Suppose:
Text length = n
Pattern length = m
There are:
n - m + 1
possible starting positions.
For each position we may compare up to:
m
characters.
Worst-case work becomes:
(n - m + 1) × m
which simplifies to:
O(nm)
This complexity becomes problematic when both strings are large.
Worst Case Example
Consider:
Text = AAAAAAAAAAAAAAAAA
Pattern = AAAAAAAAB
At each position:
AAAAAAAA
AAAAAAAB
The algorithm compares many characters before discovering failure at the last character.
Every shift repeats nearly the same work.
For text length:
n = 1,000,000
and pattern length:
m = 1,000
the algorithm may perform nearly one billion character comparisons.
This inefficiency motivates the more advanced algorithms presented later in the chapter.
Correctness
The algorithm examines every legal starting position.
For each position:
- Every pattern character is compared.
- Any mismatch rejects the position.
- If all characters match, the position is reported.
Since every candidate position is examined and every reported match satisfies the definition of exact matching, the algorithm finds all valid occurrences and produces no false matches.
Complexity Analysis
| Operation | Complexity |
|---|---|
| Single position check | O(m) |
| Full search | O(nm) |
| Extra memory | O(1) |
Where:
| Symbol | Meaning |
|---|---|
| n | Text length |
| m | Pattern length |
When the Naive Algorithm Is Enough
Despite its theoretical weakness, naive matching remains useful when:
- Patterns are very short.
- Texts are small.
- Searches occur infrequently.
- Simplicity matters more than maximum performance.
- Preprocessing cost would dominate execution time.
Many production systems still begin with a naive implementation because it is easy to verify, easy to maintain, and often surprisingly competitive on small inputs.
Recipe
Use naive exact matching when:
- You need a baseline implementation.
- Input sizes are modest.
- Correctness is more important than optimization.
- Pattern preprocessing is undesirable.
When repeated searches, large texts, or long patterns become common, switch to algorithms such as KMP, Z Algorithm, Rabin-Karp, suffix arrays, or Aho-Corasick, which avoid much of the redundant work performed by the naive approach.