16.11 Palindromic Trees (Eertrees)

Many string algorithms focus on prefixes, suffixes, or arbitrary substrings.

16.11 Palindromic Trees (Eertrees)

Problem

Many string algorithms focus on prefixes, suffixes, or arbitrary substrings. Palindromes introduce a different type of structure.

A palindrome reads the same forward and backward.

Examples:

a
aa
aba
abba
racecar

Given a string, you may want to answer questions such as:

  • What palindromes occur in the text?
  • How many distinct palindromes exist?
  • How often does each palindrome appear?
  • What is the longest palindromic substring?
  • What palindromes end at a given position?

A brute-force approach examines every substring and checks whether it is a palindrome.

For a string of length:

n

there are:

n(n+1)/2

substrings.

Checking each substring independently quickly becomes impractical.

The challenge is to represent all palindromic substrings efficiently while supporting fast updates and queries.

Solution

A Palindromic Tree, often called an Eertree, stores every distinct palindrome in a string.

Unlike suffix arrays and suffix automata, which represent all substrings, an Eertree focuses exclusively on palindromes.

For:

ababa

the distinct palindromes are:

a
b
aba
bab
ababa

The Eertree stores exactly these palindromes and the relationships between them.

Remarkably, the structure requires only:

O(n)

space and can be built in:

O(n)

time.

Core Observation

Suppose you process a string from left to right.

After reading:

ababa

every new palindrome must end at the current position.

For example:

a
ab
aba
abab
ababa

When the final:

a

is added, the new palindromes are:

a
aba
ababa

All of them end at the newest character.

This observation allows the structure to grow incrementally.

Nodes

Each node represents one distinct palindrome.

A node stores:

  1. The palindrome length.
  2. A suffix link.
  3. Character transitions.
  4. Optional occurrence statistics.

A simplified representation:

structure EertreeNode where
  length : Int
  suffixLink : Nat
  next : Std.HashMap Char Nat

The structure resembles a suffix automaton, but the meaning of the links is different.

Two Special Roots

Every Eertree begins with two root nodes.

Root -1

Length:

-1

Represents an imaginary palindrome.

Root 0

Length:

0

Represents the empty string.

Conceptually:

(-1 root)
     ↓
 (0 root)

These roots simplify construction and eliminate edge cases.

Without them, handling the first few characters becomes awkward.

Suppose a node represents:

ababa

Its longest proper palindromic suffix is:

aba

Therefore:

ababa
  ↓
 aba

The suffix link points from:

ababa

to:

aba

Similarly:

aba
 ↓
 a

These links form a hierarchy of nested palindromes.

ababa
  ↓
 aba
  ↓
  a
  ↓
 ""

Many algorithms operate by traversing these links.

Building the Tree

Characters are processed one at a time.

Example:

ababa

Step 1

Read:

a

New palindrome:

a

Step 2

Read:

b

New palindrome:

b

Step 3

Read:

a

New palindrome:

aba

Step 4

Read:

b

New palindrome:

bab

Step 5

Read:

a

New palindrome:

ababa

At every step, only one new distinct palindrome can be created.

This fact is one of the reasons linear construction is possible.

Finding the Correct Parent

Suppose we have processed:

abab

and now append:

a

The longest palindromic suffix before insertion is:

bab

To determine whether a new palindrome exists, we test whether:

a + bab + a

forms a palindrome.

It does:

ababa

Therefore a new node is created.

If the extension fails, we follow suffix links until a valid extension is found.

This process resembles the failure-link traversal in Aho-Corasick.

Example Structure

For:

ababa

the nodes become:

a
b
aba
bab
ababa

Suffix links:

ababa → aba
aba   → a
bab   → b
a     → empty
b     → empty

The resulting graph captures every distinct palindrome.

Lean Sketch

A construction operation typically looks like:

def addCharacter
  (tree : Eertree)
  (c : Char)
  : Eertree :=
by
  -- Find longest suffix palindrome.
  -- Attempt extension.
  -- Create node if needed.
  -- Update suffix links.
  sorry

Building the entire structure becomes:

def buildEertree
  (text : String)
  : Eertree :=
by
  -- Add each character.
  sorry

The implementation is more straightforward than suffix automata because cloning is unnecessary.

Longest Palindromic Substring

Every node stores its palindrome length.

Therefore:

Longest Palindrome
=
Maximum Node Length

For:

forgeeksskeegfor

the largest node corresponds to:

geeksskeeg

Length:

10

No additional search is required.

Counting Distinct Palindromes

The number of non-root nodes equals the number of distinct palindromic substrings.

For:

ababa

nodes:

a
b
aba
bab
ababa

Count:

5

This information is available immediately after construction.

Occurrence Counts

Each time a palindrome appears as the longest suffix, its counter increases.

After construction, counts propagate upward through suffix links.

Example:

ababa

Occurrences:

Palindrome Count
a 3
b 2
aba 2
bab 1
ababa 1

This makes frequency analysis straightforward.

Enumerating Palindromes

Unlike Manacher's algorithm, which focuses on lengths, an Eertree explicitly stores every distinct palindrome.

Enumeration becomes trivial.

for node in tree:
    output palindrome

Applications include:

  • Text analysis
  • Bioinformatics
  • Pattern discovery
  • Compression research

Complexity Analysis

Let:

Symbol Meaning
n String length
Operation Complexity
Construction O(n)
Distinct palindrome count O(1) after build
Longest palindrome O(1) after build
Occurrence counting O(n)
Memory O(n)

Each character creates at most one new node.

Therefore the total number of nodes remains linear.

Eertree vs Other Structures

Structure Represents
Suffix Array All suffixes
Suffix Automaton All substrings
Trie Pattern prefixes
Aho-Corasick Pattern dictionary
Eertree All distinct palindromes

The Eertree occupies a specialized niche.

It sacrifices general substring support in exchange for highly efficient palindrome operations.

Correctness

Every node corresponds to a unique palindrome.

Whenever a new palindrome appears, construction discovers it by extending the longest valid palindromic suffix.

Suffix links ensure that all candidate palindromes are considered.

Since each palindrome is created exactly once and every extension is verified, the resulting structure contains every distinct palindromic substring and no duplicates.

Common Pitfalls

Do not confuse palindrome occurrences with distinct palindromes.

For:

aaaa

the distinct palindromes are:

a
aa
aaa
aaaa

only four nodes.

The total number of occurrences is much larger.

Remember that suffix links connect palindromic suffixes, not arbitrary substrings.

Be careful with the two root nodes. Most implementation bugs arise from incorrect root initialization.

Takeaway

The Eertree is one of the most specialized and elegant structures in string processing. Where suffix arrays organize suffixes and suffix automata organize substrings, the Eertree organizes palindromes. By exploiting the nesting structure of palindromic suffixes, it provides linear-time construction and direct access to information that would otherwise require expensive substring analysis. For palindrome-centric problems, it is often the most natural and efficient representation available.