Search for a pattern in a text using a suffix tree in time proportional to the pattern length.
Suffix tree search finds whether a pattern occurs in a text by traversing a compressed trie of all suffixes of the text.
A suffix tree stores every suffix of a string. Each edge represents a substring of the original text. This allows pattern matching in time proportional to the pattern length, independent of the text size.
Problem
Given a suffix tree built from a text and a pattern string , determine whether occurs in .
Return true if is a substring of , otherwise false.
Structure
A suffix tree is a compressed trie of all suffixes of .
Each edge stores:
| field | meaning |
|---|---|
start | start index in text |
end | end index in text |
child | next node |
Leaves represent suffixes of .
Algorithm
Start at the root and match the pattern against edge labels. Each edge label corresponds to a substring of the text.
suffix_tree_search(root, T, P):
node = root
i = 0
while i < length(P):
c = P[i]
if c not in node.edges:
return false
edge = node.edges[c]
j = edge.start
while j <= edge.end and i < length(P):
if T[j] != P[i]:
return false
j = j + 1
i = i + 1
node = edge.child
return trueExample
Text:
Suffixes:
| suffix |
|---|
| banana |
| anana |
| nana |
| ana |
| na |
| a |
Search for pattern "ana":
| step | pattern part | action |
|---|---|---|
| 1 | ana | match edge starting with a |
| 2 | na | continue along edge |
| end | - | pattern consumed |
Since the pattern is fully matched, it exists in the text.
Correctness
The suffix tree contains all suffixes of the text. Therefore every substring of the text corresponds to a prefix of some suffix.
Searching for a pattern is equivalent to checking whether the pattern matches a path from the root. If the traversal consumes all characters of the pattern, then the pattern is a prefix of some suffix, hence a substring of the text.
If at any point no matching edge exists or characters mismatch, then no suffix has that prefix, so the pattern does not occur.
Complexity
Let .
| operation | time |
|---|---|
| search |
The algorithm compares each character of the pattern at most once.
Space complexity:
Suffix tree size is:
where .
Search uses:
additional space.
When to Use
Suffix tree search is useful when:
- many substring queries are performed on the same text
- preprocessing cost is acceptable
- linear time pattern matching is required
It is commonly used in string matching, bioinformatics, and text indexing.
Implementation
class Edge:
def __init__(self, start, end, child):
self.start = start
self.end = end
self.child = child
class Node:
def __init__(self):
self.edges = {}
def suffix_tree_search(root, text, pattern):
node = root
i = 0
while i < len(pattern):
c = pattern[i]
if c not in node.edges:
return False
edge = node.edges[c]
j = edge.start
while j <= edge.end and i < len(pattern):
if text[j] != pattern[i]:
return False
j += 1
i += 1
node = edge.child
return Truetype Edge struct {
Start int
End int
Child *Node
}
type Node struct {
Edges map[rune]*Edge
}
func SuffixTreeSearch(root *Node, text string, pattern string) bool {
node := root
i := 0
for i < len(pattern) {
c := rune(pattern[i])
edge, ok := node.Edges[c]
if !ok {
return false
}
j := edge.Start
for j <= edge.End && i < len(pattern) {
if text[j] != pattern[i] {
return false
}
j++
i++
}
node = edge.Child
}
return true
}