Skip to content

LCP Accelerated Suffix Search

Search for a pattern in a suffix array using LCP information to reduce redundant character comparisons.

LCP accelerated suffix search improves suffix array binary search by using the Longest Common Prefix information to avoid repeated character comparisons.

Instead of comparing the pattern against suffixes from scratch at every step, the algorithm reuses previously computed prefix matches.

Problem

Given a text TT, suffix array SA, LCP array LCP, and a pattern PP, determine whether PP occurs in TT.

Return true if at least one suffix starts with PP.

Structure

We use:

structuremeaning
SAsuffix array of text
LCPlongest common prefix between adjacent suffixes
Ppattern

Each suffix comparison can reuse partial match lengths from LCP values.

Algorithm

We maintain bounds and reuse known prefix information to reduce comparisons.

lcp_suffix_search(T, SA, LCP, P):
    lo = 0
    hi = len(SA)

    matched = 0

    while lo < hi:
        mid = (lo + hi) // 2

        l = min_lcp_with_neighbors(LCP, lo, mid)

        suffix_start = SA[mid]

        cmp = compare_with_offset(T, suffix_start, P, l)

        if cmp < 0:
            lo = mid + 1
        else:
            hi = mid

    if lo == len(SA):
        return false

    return matches_prefix(T, SA[lo], P)

A simplified practical version uses cached LCP bounds during binary search steps.

Example

Text:

T="banana" T = "banana"

Suffix array:

indexsuffix
0a
1ana
2anana
3banana
4na
5nana

LCP array:

iLCP
11
23
30
40
52

Search for pattern "anana":

Instead of rechecking characters repeatedly, the algorithm reuses LCP = 3 between "ana" and "anana" suffix comparisons to skip redundant matching.

Correctness

The suffix array provides lexicographic ordering of suffixes. The LCP array provides the length of shared prefixes between adjacent suffixes.

LCP values never overestimate divergence between suffixes. Therefore, skipping already matched prefixes does not affect ordering comparisons, only reduces work.

Binary search correctness remains unchanged since all decisions are based on lexicographic comparison, which is preserved.

Thus the algorithm returns true exactly when a suffix has PP as a prefix.

Complexity

Let n=Tn = |T| and m=Pm = |P|.

operationtime
naive suffix array searchO(mlogn)O(m \log n)
LCP accelerated searchO(m+logn)O(m + \log n)

LCP reuse reduces repeated character comparisons across binary search steps.

Space complexity:

O(n) O(n)

When to Use

LCP accelerated suffix search is useful when:

  • many pattern queries are executed on the same text
  • suffix array is already precomputed
  • minimizing character comparisons is important
  • text size is large and repeated comparisons are costly

It is commonly used in high performance text indexing systems.

Implementation

def compare_with_lcp(text, sa, lcp, mid, pattern):
    i = sa[mid]
    j = 0

    while i < len(text) and j < len(pattern):
        if text[i] != pattern[j]:
            return text[i] < pattern[j]
        i += 1
        j += 1

    return j == len(pattern)

def lcp_suffix_search(text, sa, lcp, pattern):
    lo, hi = 0, len(sa)

    while lo < hi:
        mid = (lo + hi) // 2

        if compare_with_lcp(text, sa, lcp, mid, pattern):
            lo = mid + 1
        else:
            hi = mid

    if lo == len(sa):
        return False

    i = sa[lo]
    return text.startswith(pattern, i)
func compareSuffix(text string, start int, pattern string) bool {
	i := 0

	for start+i < len(text) && i < len(pattern) {
		if text[start+i] != pattern[i] {
			return text[start+i] < pattern[i]
		}
		i++
	}

	return i == len(pattern)
}

func LCPSuffixSearch(text string, sa []int, lcp []int, pattern string) bool {
	lo, hi := 0, len(sa)

	for lo < hi {
		mid := lo + (hi-lo)/2

		if compareSuffix(text, sa[mid], pattern) {
			lo = mid + 1
		} else {
			hi = mid
		}
	}

	if lo == len(sa) {
		return false
	}

	start := sa[lo]
	if start+len(pattern) > len(text) {
		return false
	}

	return text[start:start+len(pattern)] == pattern
}