20.1 Building an Autocomplete Engine

Design an autocomplete engine that provides search suggestions while a user types.

20.1 Building an Autocomplete Engine

Problem

Design an autocomplete engine that provides search suggestions while a user types. The system must return relevant completions with low latency, support millions of phrases, rank results by popularity, and scale as the dataset grows.

Autocomplete appears in search engines, e-commerce websites, code editors, IDEs, messaging applications, and command-line interfaces. Although the user experiences a simple dropdown of suggestions, the underlying system combines multiple algorithms and data structures to achieve fast response times.

The challenge is to answer prefix queries efficiently. Given the input:

alg

the system might return:

algorithm
algorithm design
algorithm cookbook
algorithm visualization
algorithm complexity

A naive solution scans every phrase and checks whether the phrase begins with the requested prefix. This approach becomes impractical once the dataset contains hundreds of thousands or millions of entries.

Understanding the Requirements

Before selecting algorithms, define the operational requirements.

Functional requirements:

  • Return completions for a prefix.
  • Rank suggestions by popularity.
  • Support insertion of new phrases.
  • Support phrase updates.
  • Handle duplicate searches.

Performance requirements:

  • Query latency under 50 milliseconds.
  • Millions of searchable phrases.
  • Frequent updates.
  • Predictable memory consumption.

The first requirement immediately suggests that exact string matching is insufficient. The system must answer prefix queries.

Naive Baseline

Assume the dataset contains:

algorithm
algorithm design
algorithm cookbook
array algorithms
binary search
bread recipe

A simple implementation performs:

def autocomplete(prefix, phrases):
    results = []

    for phrase in phrases:
        if phrase.startswith(prefix):
            results.append(phrase)

    return results

Complexity:

Operation Complexity
Query O(N × L)
Insert O(1)
Memory O(N × L)

Where:

  • N = number of phrases
  • L = average phrase length

For one million phrases, scanning the entire dataset for every keystroke becomes unacceptable.

If phrases are stored in sorted order:

algorithm
algorithm cookbook
algorithm design
array algorithms
binary search
bread recipe

Binary search can locate the first phrase beginning with a prefix.

For prefix:

alg

Search identifies the first matching position and then scans forward until the prefix no longer matches.

Complexity:

Operation Complexity
Query O(log N + K)
Insert O(N)
Memory O(N)

Where K is the number of returned matches.

This approach works well for mostly static datasets but becomes expensive when updates are frequent.

Choosing a Trie

The classical autocomplete data structure is the trie.

Each node represents a character.

Dataset:

algorithm
array
art

Produces:

root
 ├─ a
      ├─ l
      │    └─ g
      │         ...
      └─ r
           ├─ r
           └─ t

Each edge corresponds to one character.

Searching for a prefix becomes traversal of the character path.

Query:

alg

Traversal:

root -> a -> l -> g

Once the final node is reached, all descendants represent valid completions.

Complexity:

Operation Complexity
Insert O(L)
Search O(L)
Memory O(total characters)

This performance is independent of the number of phrases.

For autocomplete workloads, this property is extremely valuable.

Trie Node Design

A basic node stores:

class TrieNode:
    def __init__(self):
        self.children = {}
        self.is_word = False

Insertion:

def insert(root, word):
    node = root

    for ch in word:
        if ch not in node.children:
            node.children[ch] = TrieNode()

        node = node.children[ch]

    node.is_word = True

Searching:

def find_prefix(root, prefix):
    node = root

    for ch in prefix:
        if ch not in node.children:
            return None

        node = node.children[ch]

    return node

Collecting Suggestions

After locating the prefix node, perform DFS.

def collect(node, prefix, results):
    if node.is_word:
        results.append(prefix)

    for ch, child in node.children.items():
        collect(child, prefix + ch, results)

Query:

node = find_prefix(root, "alg")

results = []
collect(node, "alg", results)

Output:

algorithm
algorithm cookbook
algorithm design

Ranking Results

Users expect useful suggestions rather than alphabetical suggestions.

Store popularity counts.

class TrieNode:
    def __init__(self):
        self.children = {}
        self.is_word = False
        self.score = 0

Every search increments popularity.

Example:

Phrase Score
algorithm 12000
algorithm design 5000
algorithm cookbook 2000

Ranking becomes:

algorithm
algorithm design
algorithm cookbook

instead of alphabetical order.

Limiting Search Cost

A large subtree may contain thousands of completions.

Example:

a...

could match millions of phrases.

Performing DFS across the entire subtree on every query would be expensive.

A common optimization stores top suggestions at every node.

Node:

class TrieNode:
    def __init__(self):
        self.children = {}
        self.top = []

Example:

Prefix: alg

Top:
1. algorithm
2. algorithm design
3. algorithm cookbook

When a query reaches node "alg", suggestions are already available.

Complexity becomes:

Operation Complexity
Query O(L)
Insert O(L log K)
Memory Higher

This tradeoff is often worthwhile.

Handling Typographical Errors

Users frequently type:

algoritm

instead of:

algorithm

Approximate matching techniques improve usability.

Common approaches:

  • Edit distance
  • BK-trees
  • N-gram indexes
  • SymSpell
  • Levenshtein automata

Edit distance measures the number of insertions, deletions, and substitutions required to transform one string into another.

For example:

algoritm
algorithm

Distance:

1

A practical system often combines trie search with edit-distance ranking.

Caching Frequent Queries

Autocomplete workloads exhibit strong repetition.

Popular prefixes:

a
al
alg
algo

are requested repeatedly.

Cache results:

cache["alg"] = [
    "algorithm",
    "algorithm design",
    "algorithm cookbook"
]

Subsequent requests avoid tree traversal entirely.

Common cache structures:

  • Hash tables
  • LRU caches
  • LFU caches

A cache frequently reduces latency by an order of magnitude.

Memory Optimization

Standard tries consume substantial memory.

Consider:

algorithm
algorithms
algorithmic

Many prefixes repeat.

A compressed trie merges chains of single-child nodes.

Instead of:

a -> l -> g -> o ...

store:

algorithm

as one edge.

Benefits:

  • Lower memory usage
  • Better cache locality
  • Faster traversal

Large-scale systems often use:

  • Radix trees
  • Patricia tries
  • Finite-state transducers (FSTs)

instead of naive tries.

Distributed Architecture

At scale, autocomplete becomes a distributed service.

Architecture:

User
  |
Load Balancer
  |
Autocomplete Service
  |
Trie / FST Index
  |
Analytics Pipeline

Analytics pipeline collects:

Search query
Timestamp
Clicks
Conversions
Region

These signals continuously update rankings.

Popular searches gradually move upward without manual intervention.

Complexity Summary

Operation Complexity
Trie insert O(L)
Trie search O(L)
DFS completion O(M)
Cached lookup O(1)
Binary-search approach O(log N + K)

Where:

  • L = prefix length
  • M = subtree size
  • N = number of phrases
  • K = number of returned suggestions

Recipe

When building an autocomplete engine:

  1. Start with a trie for prefix lookup.
  2. Store popularity scores for ranking.
  3. Maintain top-k suggestions at each node.
  4. Add caching for common prefixes.
  5. Compress the trie when memory becomes significant.
  6. Introduce approximate matching for typo tolerance.
  7. Continuously update rankings using user behavior.
  8. Benchmark latency under realistic workloads.

The final system combines string algorithms, trees, hashing, caching, ranking, and analytics. Autocomplete therefore serves as an excellent example of how multiple algorithmic techniques cooperate inside a production application rather than appearing as isolated textbook exercises.