16.6 Trie Matching

Suppose you need to search for many patterns simultaneously.

16.6 Trie Matching

Problem

Suppose you need to search for many patterns simultaneously.

Consider a dictionary:

he
her
hers
hero
help
hello

and a text:

hello hero and her helpers

A straightforward approach searches for each word independently.

search("he")
search("her")
search("hers")
search("hero")
search("help")
search("hello")

If there are thousands of patterns, the same text characters may be scanned repeatedly. The cost grows with both the text length and the number of patterns.

The challenge is to search for many strings at once while sharing as much work as possible.

Solution

A trie stores strings by sharing common prefixes.

Instead of storing:

he
her
hers
hero
help
hello

as six independent strings, a trie stores the common prefix:

he

only once.

Conceptually:

(root)
   |
   h
   |
   e
  /|\\n r l *
/ \ \\ns o p
|
s

Each path from the root to a terminal node represents a word.

This organization transforms repeated prefix comparisons into a single traversal.

Understanding the Structure

Consider the words:

cat
car
care
cart

The trie becomes:

(root)
   |
   c
   |
   a
  / \\n t   r
    / \\n   e   t

Notice that:

ca

appears only once.

The trie stores shared structure rather than duplicated characters.

This principle is responsible for both its efficiency and its memory characteristics.

Building a Trie

Each node contains:

  1. A collection of child edges.
  2. A marker indicating whether a word ends at that node.

A simple Lean representation might look like:

structure TrieNode where
  terminal : Bool
  children : Std.HashMap Char TrieNode

A trie itself is simply the root node.

abbrev Trie := TrieNode

The root does not correspond to a character. It serves as the entry point for every search.

Inserting a Word

To insert:

hero

start at the root.

Process characters one at a time:

h
e
r
o

Create missing nodes as needed.

Mark the final node as terminal.

After insertion:

(root)
  |
  h
  |
  e
  |
  r
  |
  o*

The asterisk indicates the end of a valid word.

Lean Implementation

A recursive insertion routine follows the path defined by the word.

partial def insert
  (node : TrieNode)
  (chars : List Char)
  : TrieNode :=

  match chars with
  | [] =>
      { node with terminal := true }

  | c :: rest =>
      let child :=
        node.children.findD c
          { terminal := false
            children := {} }

      let updated :=
        insert child rest

      { node with
          children :=
            node.children.insert c updated }

The implementation is less important than the invariant:

Every word corresponds to exactly one root-to-terminal path.

Searching for a Word

Searching follows the same path construction.

To search for:

hero

traverse:

root
 → h
 → e
 → r
 → o

The search succeeds only if:

  1. Every edge exists.
  2. The final node is terminal.

This distinction matters.

Suppose the trie contains:

hero

Searching for:

her

reaches a node, but not necessarily a terminal node.

Therefore:

hero  → present
her   → absent

unless both words were inserted.

Prefix Queries

One of the most useful properties of a trie is efficient prefix search.

Suppose a user types:

hel

We traverse:

root
 → h
 → e
 → l

If that path exists, all descendants represent valid completions.

hel
├── hello
├── help
├── helper
└── helmet

This capability forms the foundation of:

  • Search suggestions
  • Code completion
  • Dictionary lookup
  • Predictive text systems

Matching Within Text

Now consider:

Text = "herohelpedher"

Starting at each position:

herohelpedher
^

herohelpedher
 ^

herohelpedher
  ^

we walk through the trie until no edge exists.

Whenever a terminal node is reached, a match is reported.

For example:

herohelpedher
^^^^
hero

and:

herohelpedher
    ^^^^
    help

can be discovered during traversal.

The trie allows many patterns to be searched simultaneously because all words share a common structure.

Example

Dictionary:

he
her
hero
help

Text:

hero helped her

Matches:

he
her
hero
help
he
her

All are discovered using the same trie.

No separate search for each word is required.

Complexity Analysis

Let:

Symbol Meaning
n Text length
m Pattern length
k Number of patterns

Insert

O(m)

per word.

Lookup

O(m)

per query.

O(m)

to reach the prefix node.

Storage

O(total characters)

across all patterns.

Because prefixes are shared, the actual memory consumption is often substantially smaller than storing every word independently.

Advantages

Tries excel when:

  • Many strings share prefixes.
  • Prefix queries are frequent.
  • Autocomplete is required.
  • Dictionary membership must be fast.
  • Large vocabularies are searched repeatedly.

Examples include:

  • Search engines
  • IDE code completion
  • Spell checkers
  • Routing tables
  • Natural language processing

Limitations

Tries also have tradeoffs.

A sparse trie may contain many nodes with few children.

For example:

aardvark
zebra

share almost nothing.

The resulting structure contains many partially populated nodes.

In memory-constrained environments, compressed variants are often preferable.

Common alternatives include:

  • Radix trees
  • Patricia tries
  • Ternary search trees
  • Finite-state automata

From Tries to Automata

A trie solves dictionary lookup and prefix matching elegantly, but matching every pattern at every text position can still be expensive.

The next step is to augment the trie with failure transitions that allow matching to continue without restarting from the root.

This idea leads directly to the Aho-Corasick algorithm, one of the most important multiple-pattern matching algorithms ever developed. It combines trie structure with automaton techniques to search for thousands of patterns simultaneously in linear time.