LCP Accelerated Suffix Search

LCP accelerated suffix search improves suffix array binary search by using the Longest Common Prefix information to avoid repeated character comparisons.

Instead of comparing the pattern against suffixes from scratch at every step, the algorithm reuses previously computed prefix matches.

Problem

Given a text $T$, suffix array SA, LCP array LCP, and a pattern $P$, determine whether $P$ occurs in $T$.

Return true if at least one suffix starts with $P$.

Structure

We use:

structure	meaning
`SA`	suffix array of text
`LCP`	longest common prefix between adjacent suffixes
`P`	pattern

Each suffix comparison can reuse partial match lengths from LCP values.

Algorithm

We maintain bounds and reuse known prefix information to reduce comparisons.

lcp_suffix_search(T, SA, LCP, P):
    lo = 0
    hi = len(SA)

    matched = 0

    while lo < hi:
        mid = (lo + hi) // 2

        l = min_lcp_with_neighbors(LCP, lo, mid)

        suffix_start = SA[mid]

        cmp = compare_with_offset(T, suffix_start, P, l)

        if cmp < 0:
            lo = mid + 1
        else:
            hi = mid

    if lo == len(SA):
        return false

    return matches_prefix(T, SA[lo], P)

A simplified practical version uses cached LCP bounds during binary search steps.

Example

Text:

$$ T = "banana" $$

Suffix array:

index	suffix
0	a
1	ana
2	anana
3	banana
4	na
5	nana

LCP array:

i	LCP
1	1
2	3
3	0
4	0
5	2

Search for pattern "anana":

Instead of rechecking characters repeatedly, the algorithm reuses LCP = 3 between "ana" and "anana" suffix comparisons to skip redundant matching.

Correctness

The suffix array provides lexicographic ordering of suffixes. The LCP array provides the length of shared prefixes between adjacent suffixes.

LCP values never overestimate divergence between suffixes. Therefore, skipping already matched prefixes does not affect ordering comparisons, only reduces work.

Binary search correctness remains unchanged since all decisions are based on lexicographic comparison, which is preserved.

Thus the algorithm returns true exactly when a suffix has $P$ as a prefix.

Complexity

Let $n = |T|$ and $m = |P|$.

operation	time
naive suffix array search	$O(m \log n)$
LCP accelerated search	$O(m + \log n)$

LCP reuse reduces repeated character comparisons across binary search steps.

Space complexity:

$$ O(n) $$

When to Use

LCP accelerated suffix search is useful when:

many pattern queries are executed on the same text
suffix array is already precomputed
minimizing character comparisons is important
text size is large and repeated comparisons are costly

It is commonly used in high performance text indexing systems.

Implementation

def compare_with_lcp(text, sa, lcp, mid, pattern):
    i = sa[mid]
    j = 0

    while i < len(text) and j < len(pattern):
        if text[i] != pattern[j]:
            return text[i] < pattern[j]
        i += 1
        j += 1

    return j == len(pattern)

def lcp_suffix_search(text, sa, lcp, pattern):
    lo, hi = 0, len(sa)

    while lo < hi:
        mid = (lo + hi) // 2

        if compare_with_lcp(text, sa, lcp, mid, pattern):
            lo = mid + 1
        else:
            hi = mid

    if lo == len(sa):
        return False

    i = sa[lo]
    return text.startswith(pattern, i)

func compareSuffix(text string, start int, pattern string) bool {
	i := 0

	for start+i < len(text) && i < len(pattern) {
		if text[start+i] != pattern[i] {
			return text[start+i] < pattern[i]
		}
		i++
	}

	return i == len(pattern)
}

func LCPSuffixSearch(text string, sa []int, lcp []int, pattern string) bool {
	lo, hi := 0, len(sa)

	for lo < hi {
		mid := lo + (hi-lo)/2

		if compareSuffix(text, sa[mid], pattern) {
			lo = mid + 1
		} else {
			hi = mid
		}
	}

	if lo == len(sa) {
		return false
	}

	start := sa[lo]
	if start+len(pattern) > len(text) {
		return false
	}

	return text[start:start+len(pattern)] == pattern
}