Search for a pattern in a suffix array using LCP information to reduce redundant character comparisons.
LCP accelerated suffix search improves suffix array binary search by using the Longest Common Prefix information to avoid repeated character comparisons.
Instead of comparing the pattern against suffixes from scratch at every step, the algorithm reuses previously computed prefix matches.
Problem
Given a text , suffix array SA, LCP array LCP, and a pattern , determine whether occurs in .
Return true if at least one suffix starts with .
Structure
We use:
| structure | meaning |
|---|---|
SA | suffix array of text |
LCP | longest common prefix between adjacent suffixes |
P | pattern |
Each suffix comparison can reuse partial match lengths from LCP values.
Algorithm
We maintain bounds and reuse known prefix information to reduce comparisons.
lcp_suffix_search(T, SA, LCP, P):
lo = 0
hi = len(SA)
matched = 0
while lo < hi:
mid = (lo + hi) // 2
l = min_lcp_with_neighbors(LCP, lo, mid)
suffix_start = SA[mid]
cmp = compare_with_offset(T, suffix_start, P, l)
if cmp < 0:
lo = mid + 1
else:
hi = mid
if lo == len(SA):
return false
return matches_prefix(T, SA[lo], P)A simplified practical version uses cached LCP bounds during binary search steps.
Example
Text:
Suffix array:
| index | suffix |
|---|---|
| 0 | a |
| 1 | ana |
| 2 | anana |
| 3 | banana |
| 4 | na |
| 5 | nana |
LCP array:
| i | LCP |
|---|---|
| 1 | 1 |
| 2 | 3 |
| 3 | 0 |
| 4 | 0 |
| 5 | 2 |
Search for pattern "anana":
Instead of rechecking characters repeatedly, the algorithm reuses LCP = 3 between "ana" and "anana" suffix comparisons to skip redundant matching.
Correctness
The suffix array provides lexicographic ordering of suffixes. The LCP array provides the length of shared prefixes between adjacent suffixes.
LCP values never overestimate divergence between suffixes. Therefore, skipping already matched prefixes does not affect ordering comparisons, only reduces work.
Binary search correctness remains unchanged since all decisions are based on lexicographic comparison, which is preserved.
Thus the algorithm returns true exactly when a suffix has as a prefix.
Complexity
Let and .
| operation | time |
|---|---|
| naive suffix array search | |
| LCP accelerated search |
LCP reuse reduces repeated character comparisons across binary search steps.
Space complexity:
When to Use
LCP accelerated suffix search is useful when:
- many pattern queries are executed on the same text
- suffix array is already precomputed
- minimizing character comparisons is important
- text size is large and repeated comparisons are costly
It is commonly used in high performance text indexing systems.
Implementation
def compare_with_lcp(text, sa, lcp, mid, pattern):
i = sa[mid]
j = 0
while i < len(text) and j < len(pattern):
if text[i] != pattern[j]:
return text[i] < pattern[j]
i += 1
j += 1
return j == len(pattern)
def lcp_suffix_search(text, sa, lcp, pattern):
lo, hi = 0, len(sa)
while lo < hi:
mid = (lo + hi) // 2
if compare_with_lcp(text, sa, lcp, mid, pattern):
lo = mid + 1
else:
hi = mid
if lo == len(sa):
return False
i = sa[lo]
return text.startswith(pattern, i)func compareSuffix(text string, start int, pattern string) bool {
i := 0
for start+i < len(text) && i < len(pattern) {
if text[start+i] != pattern[i] {
return text[start+i] < pattern[i]
}
i++
}
return i == len(pattern)
}
func LCPSuffixSearch(text string, sa []int, lcp []int, pattern string) bool {
lo, hi := 0, len(sa)
for lo < hi {
mid := lo + (hi-lo)/2
if compareSuffix(text, sa[mid], pattern) {
lo = mid + 1
} else {
hi = mid
}
}
if lo == len(sa) {
return false
}
start := sa[lo]
if start+len(pattern) > len(text) {
return false
}
return text[start:start+len(pattern)] == pattern
}