16.1 Exact Matching

Given a text string and a pattern string, determine whether the pattern occurs in the text and, if so, find all positions where the match begins.

16.1 Exact Matching

Problem

Given a text string and a pattern string, determine whether the pattern occurs in the text and, if so, find all positions where the match begins.

This is one of the oldest and most fundamental problems in computer science. Search engines look for query terms inside documents. Compilers search for keywords inside source code. Text editors perform find-and-replace operations. DNA analysis systems search for gene sequences within genomes. Log analysis tools look for signatures inside large event streams.

The exact matching problem can be stated formally:

Given:

Text    = T[0..n-1]
Pattern = P[0..m-1]

Find every position i such that:

T[i + j] = P[j]

for every:

0 ≤ j < m

A valid match requires every character to be identical.

Solution

The simplest solution examines every possible starting position in the text and compares characters one by one.

Text:    ABCABCABC
Pattern: ABC

Position 0 → Match
Position 1 → Fail
Position 2 → Fail
Position 3 → Match
Position 4 → Fail
Position 5 → Fail
Position 6 → Match

This approach is called naive exact matching.

Lean Implementation

def exactMatchAt (text : String) (pattern : String) (start : Nat) : Bool :=
  if start + pattern.length > text.length then
    false
  else
    let rec loop (i : Nat) :=
      if h : i = pattern.length then
        true
      else
        let idx := start + i
        if text.get ⟨idx, by omega⟩ =
           pattern.get ⟨i, by omega⟩ then
          loop (i + 1)
        else
          false
    loop 0

The function verifies whether a match exists beginning at a specific position.

A complete search iterates through every possible starting position.

def findMatches (text pattern : String) : List Nat :=
  let rec search (pos : Nat) (acc : List Nat) :=
    if pos + pattern.length > text.length then
      acc.reverse
    else
      if exactMatchAt text pattern pos then
        search (pos + 1) (pos :: acc)
      else
        search (pos + 1) acc

  search 0 []

Example

Consider:

Text    = BANANABANANA
Pattern = ANA

Possible starting positions:

Position Comparison Result
0 BAN Fail
1 ANA Match
2 NAN Fail
3 ANA Match
4 NAB Fail
5 ABA Fail
6 BAN Fail
7 ANA Match
8 NAN Fail
9 ANA Match

Output:

[1, 3, 7, 9]

Notice that matches may overlap.

BANANABANANA
 ANA
   ANA
       ANA
         ANA

Many practical systems must support overlapping matches.

Understanding the Search Space

Suppose:

Text length    = n
Pattern length = m

There are:

n - m + 1

possible starting positions.

For each position we may compare up to:

m

characters.

Worst-case work becomes:

(n - m + 1) × m

which simplifies to:

O(nm)

This complexity becomes problematic when both strings are large.

Worst Case Example

Consider:

Text    = AAAAAAAAAAAAAAAAA
Pattern = AAAAAAAAB

At each position:

AAAAAAAA
AAAAAAAB

The algorithm compares many characters before discovering failure at the last character.

Every shift repeats nearly the same work.

For text length:

n = 1,000,000

and pattern length:

m = 1,000

the algorithm may perform nearly one billion character comparisons.

This inefficiency motivates the more advanced algorithms presented later in the chapter.

Correctness

The algorithm examines every legal starting position.

For each position:

  1. Every pattern character is compared.
  2. Any mismatch rejects the position.
  3. If all characters match, the position is reported.

Since every candidate position is examined and every reported match satisfies the definition of exact matching, the algorithm finds all valid occurrences and produces no false matches.

Complexity Analysis

Operation Complexity
Single position check O(m)
Full search O(nm)
Extra memory O(1)

Where:

Symbol Meaning
n Text length
m Pattern length

When the Naive Algorithm Is Enough

Despite its theoretical weakness, naive matching remains useful when:

  • Patterns are very short.
  • Texts are small.
  • Searches occur infrequently.
  • Simplicity matters more than maximum performance.
  • Preprocessing cost would dominate execution time.

Many production systems still begin with a naive implementation because it is easy to verify, easy to maintain, and often surprisingly competitive on small inputs.

Recipe

Use naive exact matching when:

  1. You need a baseline implementation.
  2. Input sizes are modest.
  3. Correctness is more important than optimization.
  4. Pattern preprocessing is undesirable.

When repeated searches, large texts, or long patterns become common, switch to algorithms such as KMP, Z Algorithm, Rabin-Karp, suffix arrays, or Aho-Corasick, which avoid much of the redundant work performed by the naive approach.