Skip to content

Suffix Tree Search

Search for a pattern in a text using a suffix tree in time proportional to the pattern length.

Suffix tree search finds whether a pattern occurs in a text by traversing a compressed trie of all suffixes of the text.

A suffix tree stores every suffix of a string. Each edge represents a substring of the original text. This allows pattern matching in time proportional to the pattern length, independent of the text size.

Problem

Given a suffix tree built from a text TT and a pattern string PP, determine whether PP occurs in TT.

Return true if PP is a substring of TT, otherwise false.

Structure

A suffix tree is a compressed trie of all suffixes of TT.

Each edge stores:

fieldmeaning
startstart index in text
endend index in text
childnext node

Leaves represent suffixes of TT.

Algorithm

Start at the root and match the pattern against edge labels. Each edge label corresponds to a substring of the text.

suffix_tree_search(root, T, P):
    node = root
    i = 0

    while i < length(P):
        c = P[i]

        if c not in node.edges:
            return false

        edge = node.edges[c]
        j = edge.start

        while j <= edge.end and i < length(P):
            if T[j] != P[i]:
                return false
            j = j + 1
            i = i + 1

        node = edge.child

    return true

Example

Text:

T="banana" T = "banana"

Suffixes:

suffix
banana
anana
nana
ana
na
a

Search for pattern "ana":

steppattern partaction
1anamatch edge starting with a
2nacontinue along edge
end-pattern consumed

Since the pattern is fully matched, it exists in the text.

Correctness

The suffix tree contains all suffixes of the text. Therefore every substring of the text corresponds to a prefix of some suffix.

Searching for a pattern is equivalent to checking whether the pattern matches a path from the root. If the traversal consumes all characters of the pattern, then the pattern is a prefix of some suffix, hence a substring of the text.

If at any point no matching edge exists or characters mismatch, then no suffix has that prefix, so the pattern does not occur.

Complexity

Let m=Pm = |P|.

operationtime
searchO(m)O(m)

The algorithm compares each character of the pattern at most once.

Space complexity:

Suffix tree size is:

O(n) O(n)

where n=Tn = |T|.

Search uses:

O(1) O(1)

additional space.

When to Use

Suffix tree search is useful when:

  • many substring queries are performed on the same text
  • preprocessing cost is acceptable
  • linear time pattern matching is required

It is commonly used in string matching, bioinformatics, and text indexing.

Implementation

class Edge:
    def __init__(self, start, end, child):
        self.start = start
        self.end = end
        self.child = child

class Node:
    def __init__(self):
        self.edges = {}

def suffix_tree_search(root, text, pattern):
    node = root
    i = 0

    while i < len(pattern):
        c = pattern[i]

        if c not in node.edges:
            return False

        edge = node.edges[c]
        j = edge.start

        while j <= edge.end and i < len(pattern):
            if text[j] != pattern[i]:
                return False
            j += 1
            i += 1

        node = edge.child

    return True
type Edge struct {
	Start int
	End   int
	Child *Node
}

type Node struct {
	Edges map[rune]*Edge
}

func SuffixTreeSearch(root *Node, text string, pattern string) bool {
	node := root
	i := 0

	for i < len(pattern) {
		c := rune(pattern[i])

		edge, ok := node.Edges[c]
		if !ok {
			return false
		}

		j := edge.Start

		for j <= edge.End && i < len(pattern) {
			if text[j] != pattern[i] {
				return false
			}
			j++
			i++
		}

		node = edge.Child
	}

	return true
}