8.12 Tries
Many search problems involve prefixes.
8.12 Tries
Many search problems involve prefixes.
Examples include:
- Autocomplete systems
- Spell checkers
- Search suggestions
- Dictionary lookups
- URL routing
- IP routing tables
- DNA sequence matching
A straightforward solution stores strings in a hash table and performs exact lookups. This works well when the entire key is known.
However, prefix-based operations are different.
Suppose a dictionary contains:
apple
application
apply
apt
banana
Given the prefix:
app
we want:
apple
application
apply
Hash tables do not naturally support this operation.
A Trie is a tree specifically designed for prefix queries.
Instead of storing complete strings in a single location, a Trie stores characters along paths. Strings with common prefixes share nodes, making prefix operations efficient and intuitive.
Problem
You need fast prefix lookups, prefix counting, or autocomplete operations on large collections of strings.
Solution
Represent each character as an edge in a tree.
Consider these words:
cat
car
care
dog
The Trie looks like:
root
├── c
│ └── a
│ ├── t
│ └── r
│ └── e
└── d
└── o
└── g
The prefix:
ca
appears only once.
The strings:
cat
car
care
share the same initial path.
A simple implementation:
package main
type TrieNode struct {
Children map[rune]*TrieNode
End bool
}
func NewNode() *TrieNode {
return &TrieNode{
Children: make(map[rune]*TrieNode),
}
}
The root node:
root := NewNode()
contains no character itself.
Characters live on edges between nodes.
Discussion
A Trie can be viewed as a tree representation of a set of strings.
Instead of:
"apple"
being stored as one object, it becomes:
a → p → p → l → e
Every node represents a prefix.
For example:
app
corresponds to an actual node inside the structure.
This property makes prefix operations efficient.
Inserting Words
Insertion walks the path and creates missing nodes.
func Insert(
root *TrieNode,
word string,
) {
current := root
for _, ch := range word {
child, exists :=
current.Children[ch]
if !exists {
child = NewNode()
current.Children[ch] = child
}
current = child
}
current.End = true
}
Usage:
Insert(root, "cat")
Insert(root, "car")
Insert(root, "care")
The shared prefix:
ca
is stored once.
Only the diverging suffixes require additional nodes.
Exact Search
Searching follows the same path.
func Search(
root *TrieNode,
word string,
) bool {
current := root
for _, ch := range word {
child, exists :=
current.Children[ch]
if !exists {
return false
}
current = child
}
return current.End
}
Examples:
Search(root, "cat")
returns:
true
while:
Search(root, "cab")
returns:
false
The traversal follows characters directly.
No subtree exploration is required.
Prefix Search
A major advantage of Tries is prefix lookup.
func HasPrefix(
root *TrieNode,
prefix string,
) bool {
current := root
for _, ch := range prefix {
child, exists :=
current.Children[ch]
if !exists {
return false
}
current = child
}
return true
}
Example:
HasPrefix(root, "ca")
returns:
true
because:
cat
car
care
all begin with:
ca
This operation requires only traversal of the prefix itself.
Autocomplete
Once the prefix node is located, every descendant represents a valid completion.
Suppose:
app
points to:
app
├── l
│ ├── e
│ └── y
└── lication
The completions are:
apple
apply
application
Autocomplete therefore becomes:
- Find prefix node.
- DFS through descendants.
- Collect words.
The tree structure makes this process natural.
Counting Prefixes
Many applications need counts.
For example:
How many words begin with "app"?
Store a counter in every node.
type TrieNode struct {
Children map[rune]*TrieNode
End bool
Count int
}
During insertion:
current.Count++
Now:
prefix count
becomes:
O(length of prefix)
instead of scanning all words.
Fixed Alphabet Tries
Maps are flexible but have overhead.
For lowercase English letters:
type TrieNode struct {
Children [26]*TrieNode
End bool
}
Index calculation:
index := ch - 'a'
Advantages:
- Faster access
- Better cache locality
- Less allocation overhead
Disadvantages:
- Wastes memory when branching is sparse
This tradeoff appears frequently in systems programming.
Memory Considerations
Tries trade memory for speed.
Consider:
apple
banana
orange
A hash table stores:
3 strings
A Trie stores:
many nodes
Each node contains child references.
Large dictionaries may require millions of nodes.
The memory cost is often the primary limitation of Trie-based systems.
Compression techniques exist to reduce this overhead.
Compressed Tries
A standard Trie may contain long chains.
Example:
root
|
a
|
p
|
p
|
l
|
e
Many nodes have only one child.
A compressed Trie merges these chains.
root
|
apple
This significantly reduces memory usage.
Compressed Tries appear in:
- Search engines
- Routing tables
- Large dictionaries
The idea remains the same.
Only the representation changes.
Tries and Routing
Internet routers perform longest-prefix matching.
Example:
192.168.0.0/16
192.168.1.0/24
192.168.1.128/25
A binary Trie stores bits instead of characters.
0
1
0
1
...
The longest matching path determines routing behavior.
This application demonstrates that Tries are not limited to text.
Any sequence can be stored.
Tries and Dynamic Programming
Many string algorithms combine Tries with DFS or dynamic programming.
Examples:
- Word Break
- Dictionary segmentation
- Multi-pattern matching
- Text correction
The Trie provides fast prefix discovery.
Dynamic programming handles combinatorial choices.
Together they form powerful text-processing systems.
Comparing Tries and Hash Tables
| Operation | Hash Table | Trie |
|---|---|---|
| Exact search | O(L) | O(L) |
| Insert | O(L) | O(L) |
| Prefix search | O(N) | O(L) |
| Autocomplete | Poor | Excellent |
| Memory usage | Lower | Higher |
Here:
L= string lengthN= number of stored words
Hash tables excel at exact lookup.
Tries excel at prefix operations.
Complexity Analysis
Let:
L= string length
Insertion:
| Operation | Complexity |
|---|---|
| Insert | O(L) |
| Search | O(L) |
| Prefix lookup | O(L) |
Autocomplete:
| Operation | Complexity |
|---|---|
| Locate prefix | O(L) |
| Enumerate results | O(output size) |
Memory:
| Structure | Complexity |
|---|---|
| Trie | O(total characters) |
The exact memory cost depends on branching representation.
Common Mistakes
A common mistake is forgetting the distinction between:
Path exists
and:
Word exists
Suppose only:
care
is inserted.
The path:
car
exists.
But:
Search("car")
should return:
false
unless the terminal marker is set.
Another mistake is using maps when performance is critical and the alphabet is fixed. Arrays often provide substantially better throughput.
A third mistake is storing very large alphabets directly in arrays. Unicode contains many possible characters, making sparse representations preferable.
Finally, some implementations recursively enumerate autocomplete results without limiting output size. A prefix with millions of descendants can generate unexpectedly large responses.
Takeaway
A Trie stores strings as paths in a tree. Shared prefixes become shared branches, making prefix operations efficient and natural. While Tries consume more memory than hash tables, they provide powerful capabilities that hash tables cannot easily support, including autocomplete, prefix counting, longest-prefix matching, and efficient dictionary traversal. The structure appears throughout search systems, networking, language processing, and large-scale text indexing.