16.3 Z Algorithm

Many string algorithms need to answer the question: > How many characters match between a string prefix and a substring starting at a particular position?

16.3 Z Algorithm

Problem

Many string algorithms need to answer the question:

How many characters match between a string prefix and a substring starting at a particular position?

Consider:

ABABABCA

At position 2:

ABABABCA
  ABABCA

The substring beginning at position 2 shares four characters with the prefix:

ABAB

A naive solution compares characters one by one for every position.

For a string of length n, this may require:

O(n²)

comparisons.

The challenge is to compute all prefix matches in linear time.

Solution

The Z Algorithm computes an array:

Z[i]

where:

Z[i] = length of the longest substring
       starting at i that matches
       the prefix of the string

The resulting array captures significant structural information about the string and can be used for:

  • Exact pattern matching
  • Repeated substring detection
  • String compression
  • Prefix analysis
  • Periodicity detection
  • Border computation

The algorithm runs in:

O(n)

time.

Understanding the Z Array

Consider:

S = AABCAABXAAY

Indexing:

Index Character
0 A
1 A
2 B
3 C
4 A
5 A
6 B
7 X
8 A
9 A
10 Y

The Z values are:

Index Z
0 0
1 1
2 0
3 0
4 3
5 1
6 0
7 0
8 2
9 1
10 0

At position 4:

AABCAABXAAY
AAB

Three characters match the prefix.

Therefore:

Z[4] = 3

The Z Box

The key optimization is maintaining a region:

[L, R]

called the Z-box.

Inside this interval:

S[L..R]

matches:

S[0..R-L]

This means previous work can be reused.

Example:

A B A B A B C A
    |-----|
      Z-box

Any position inside the box inherits information already computed.

Instead of starting comparisons from scratch, the algorithm copies information from a corresponding position in the prefix.

This avoids redundant character comparisons.

Naive Construction

A direct implementation:

def zValue
  (s : Array Char)
  (start : Nat)
  : Nat :=

  let rec loop (k : Nat) :=
    if start + k < s.size
      && s[k]! = s[start + k]! then
      loop (k + 1)
    else
      k

  loop 0

Applying this at every position yields:

O(n²)

time.

The real Z Algorithm eliminates this inefficiency.

Linear-Time Construction

The algorithm keeps track of:

L = left boundary
R = right boundary

of the current Z-box.

For each position:

Case 1: Outside the Z-box

i > R

No previous information exists.

Perform direct comparison.

AABCAABXAAY
    ^

Expand character by character.

Case 2: Inside the Z-box

i ≤ R

Reuse previously computed information.

Suppose:

k = i - L

Then:

Z[k]

already contains information about matching prefixes.

Two subcases arise:

Fully Contained

Z[k] < R - i + 1

Then:

Z[i] = Z[k]

No further work required.

Touching Boundary

Z[k] ≥ R - i + 1

The match may continue beyond R.

Start expansion from the boundary.

Current box
[L----------R]
          ^
          i

Only new characters are compared.

Lean Implementation

def buildZ
  (s : Array Char)
  : Array Nat :=

  Id.run do
    let n := s.size

    let mut z := Array.mkArray n 0
    let mut l := 0
    let mut r := 0

    for i in [1:n] do

      if i <= r then
        let k := i - l
        let value :=
          Nat.min z[k]! (r - i + 1)
        z := z.set! i value

      let mut current := z[i]!

      while i + current < n
        && s[current]! =
           s[i + current]! do
        current := current + 1

      z := z.set! i current

      if i + current - 1 > r then
        l := i
        r := i + current - 1

    return z

Every character contributes to at most a constant amount of expansion work.

Therefore total runtime remains linear.

Pattern Matching with Z

The Z Algorithm becomes a pattern matcher through a simple transformation.

Construct:

Pattern + "$" + Text

Example:

Pattern = ABA
Text    = ABABABA

Combined string:

ABA$ABABABA

Compute the Z array.

Whenever:

Z[i] = Pattern.length

a complete match exists.

Example

Pattern:

ABA

Text:

ABABABA

Combined:

ABA$ABABABA

Z values:

Position Z
0 0
1 0
2 1
3 0
4 3
5 0
6 3
7 0
8 3
9 0
10 1

Matches occur where:

Z[i] = 3

Positions:

0
2
4

inside the original text.

Detecting Repeated Patterns

The Z array naturally reveals repetition.

Example:

ABABABAB

Z values:

Position Z
0 0
1 0
2 6
3 0
4 4
5 0
6 2
7 0

Large values indicate repeated prefix structures.

This property is useful for:

  • Compression
  • DNA analysis
  • Pattern mining
  • Period detection

Correctness

The algorithm maintains the invariant:

S[L..R]

matches:

S[0..R-L]

at every step.

For positions inside the current Z-box, previously computed information remains valid because the matched regions are identical.

Whenever the copied information reaches the boundary, direct expansion verifies whether the match extends further.

Therefore every Z value equals the exact length of the longest prefix match beginning at that position.

Complexity Analysis

Operation Complexity
Z construction O(n)
Pattern matching O(n + m)
Extra memory O(n)

Where:

Symbol Meaning
n Text length
m Pattern length

KMP vs Z Algorithm

Feature KMP Z Algorithm
Preprocessing target Pattern Entire string
Matching complexity O(n + m) O(n + m)
Failure table LPS Z-array
Pattern matching Excellent Excellent
Prefix analysis Moderate Excellent
Repetition detection Moderate Excellent
Conceptual simplicity Moderate High

Both algorithms achieve linear performance.

KMP views matching from the perspective of the pattern and its prefixes.

The Z Algorithm views matching from the perspective of every position in the string and how strongly it resembles the prefix.

This prefix-centric perspective makes the Z Algorithm one of the most elegant and versatile tools in string processing.