16.6 Trie Matching
Suppose you need to search for many patterns simultaneously.
16.6 Trie Matching
Problem
Suppose you need to search for many patterns simultaneously.
Consider a dictionary:
he
her
hers
hero
help
hello
and a text:
hello hero and her helpers
A straightforward approach searches for each word independently.
search("he")
search("her")
search("hers")
search("hero")
search("help")
search("hello")
If there are thousands of patterns, the same text characters may be scanned repeatedly. The cost grows with both the text length and the number of patterns.
The challenge is to search for many strings at once while sharing as much work as possible.
Solution
A trie stores strings by sharing common prefixes.
Instead of storing:
he
her
hers
hero
help
hello
as six independent strings, a trie stores the common prefix:
he
only once.
Conceptually:
(root)
|
h
|
e
/|\\n r l *
/ \ \\ns o p
|
s
Each path from the root to a terminal node represents a word.
This organization transforms repeated prefix comparisons into a single traversal.
Understanding the Structure
Consider the words:
cat
car
care
cart
The trie becomes:
(root)
|
c
|
a
/ \\n t r
/ \\n e t
Notice that:
ca
appears only once.
The trie stores shared structure rather than duplicated characters.
This principle is responsible for both its efficiency and its memory characteristics.
Building a Trie
Each node contains:
- A collection of child edges.
- A marker indicating whether a word ends at that node.
A simple Lean representation might look like:
structure TrieNode where
terminal : Bool
children : Std.HashMap Char TrieNode
A trie itself is simply the root node.
abbrev Trie := TrieNode
The root does not correspond to a character. It serves as the entry point for every search.
Inserting a Word
To insert:
hero
start at the root.
Process characters one at a time:
h
e
r
o
Create missing nodes as needed.
Mark the final node as terminal.
After insertion:
(root)
|
h
|
e
|
r
|
o*
The asterisk indicates the end of a valid word.
Lean Implementation
A recursive insertion routine follows the path defined by the word.
partial def insert
(node : TrieNode)
(chars : List Char)
: TrieNode :=
match chars with
| [] =>
{ node with terminal := true }
| c :: rest =>
let child :=
node.children.findD c
{ terminal := false
children := {} }
let updated :=
insert child rest
{ node with
children :=
node.children.insert c updated }
The implementation is less important than the invariant:
Every word corresponds to exactly one root-to-terminal path.
Searching for a Word
Searching follows the same path construction.
To search for:
hero
traverse:
root
→ h
→ e
→ r
→ o
The search succeeds only if:
- Every edge exists.
- The final node is terminal.
This distinction matters.
Suppose the trie contains:
hero
Searching for:
her
reaches a node, but not necessarily a terminal node.
Therefore:
hero → present
her → absent
unless both words were inserted.
Prefix Queries
One of the most useful properties of a trie is efficient prefix search.
Suppose a user types:
hel
We traverse:
root
→ h
→ e
→ l
If that path exists, all descendants represent valid completions.
hel
├── hello
├── help
├── helper
└── helmet
This capability forms the foundation of:
- Search suggestions
- Code completion
- Dictionary lookup
- Predictive text systems
Matching Within Text
Now consider:
Text = "herohelpedher"
Starting at each position:
herohelpedher
^
herohelpedher
^
herohelpedher
^
we walk through the trie until no edge exists.
Whenever a terminal node is reached, a match is reported.
For example:
herohelpedher
^^^^
hero
and:
herohelpedher
^^^^
help
can be discovered during traversal.
The trie allows many patterns to be searched simultaneously because all words share a common structure.
Example
Dictionary:
he
her
hero
help
Text:
hero helped her
Matches:
he
her
hero
help
he
her
All are discovered using the same trie.
No separate search for each word is required.
Complexity Analysis
Let:
| Symbol | Meaning |
|---|---|
| n | Text length |
| m | Pattern length |
| k | Number of patterns |
Insert
O(m)
per word.
Lookup
O(m)
per query.
Prefix Search
O(m)
to reach the prefix node.
Storage
O(total characters)
across all patterns.
Because prefixes are shared, the actual memory consumption is often substantially smaller than storing every word independently.
Advantages
Tries excel when:
- Many strings share prefixes.
- Prefix queries are frequent.
- Autocomplete is required.
- Dictionary membership must be fast.
- Large vocabularies are searched repeatedly.
Examples include:
- Search engines
- IDE code completion
- Spell checkers
- Routing tables
- Natural language processing
Limitations
Tries also have tradeoffs.
A sparse trie may contain many nodes with few children.
For example:
aardvark
zebra
share almost nothing.
The resulting structure contains many partially populated nodes.
In memory-constrained environments, compressed variants are often preferable.
Common alternatives include:
- Radix trees
- Patricia tries
- Ternary search trees
- Finite-state automata
From Tries to Automata
A trie solves dictionary lookup and prefix matching elegantly, but matching every pattern at every text position can still be expensive.
The next step is to augment the trie with failure transitions that allow matching to continue without restarting from the root.
This idea leads directly to the Aho-Corasick algorithm, one of the most important multiple-pattern matching algorithms ever developed. It combines trie structure with automaton techniques to search for thousands of patterns simultaneously in linear time.