16.11 Palindromic Trees (Eertrees)
Many string algorithms focus on prefixes, suffixes, or arbitrary substrings.
16.11 Palindromic Trees (Eertrees)
Problem
Many string algorithms focus on prefixes, suffixes, or arbitrary substrings. Palindromes introduce a different type of structure.
A palindrome reads the same forward and backward.
Examples:
a
aa
aba
abba
racecar
Given a string, you may want to answer questions such as:
- What palindromes occur in the text?
- How many distinct palindromes exist?
- How often does each palindrome appear?
- What is the longest palindromic substring?
- What palindromes end at a given position?
A brute-force approach examines every substring and checks whether it is a palindrome.
For a string of length:
n
there are:
n(n+1)/2
substrings.
Checking each substring independently quickly becomes impractical.
The challenge is to represent all palindromic substrings efficiently while supporting fast updates and queries.
Solution
A Palindromic Tree, often called an Eertree, stores every distinct palindrome in a string.
Unlike suffix arrays and suffix automata, which represent all substrings, an Eertree focuses exclusively on palindromes.
For:
ababa
the distinct palindromes are:
a
b
aba
bab
ababa
The Eertree stores exactly these palindromes and the relationships between them.
Remarkably, the structure requires only:
O(n)
space and can be built in:
O(n)
time.
Core Observation
Suppose you process a string from left to right.
After reading:
ababa
every new palindrome must end at the current position.
For example:
a
ab
aba
abab
ababa
When the final:
a
is added, the new palindromes are:
a
aba
ababa
All of them end at the newest character.
This observation allows the structure to grow incrementally.
Nodes
Each node represents one distinct palindrome.
A node stores:
- The palindrome length.
- A suffix link.
- Character transitions.
- Optional occurrence statistics.
A simplified representation:
structure EertreeNode where
length : Int
suffixLink : Nat
next : Std.HashMap Char Nat
The structure resembles a suffix automaton, but the meaning of the links is different.
Two Special Roots
Every Eertree begins with two root nodes.
Root -1
Length:
-1
Represents an imaginary palindrome.
Root 0
Length:
0
Represents the empty string.
Conceptually:
(-1 root)
↓
(0 root)
These roots simplify construction and eliminate edge cases.
Without them, handling the first few characters becomes awkward.
Suffix Links
Suppose a node represents:
ababa
Its longest proper palindromic suffix is:
aba
Therefore:
ababa
↓
aba
The suffix link points from:
ababa
to:
aba
Similarly:
aba
↓
a
These links form a hierarchy of nested palindromes.
ababa
↓
aba
↓
a
↓
""
Many algorithms operate by traversing these links.
Building the Tree
Characters are processed one at a time.
Example:
ababa
Step 1
Read:
a
New palindrome:
a
Step 2
Read:
b
New palindrome:
b
Step 3
Read:
a
New palindrome:
aba
Step 4
Read:
b
New palindrome:
bab
Step 5
Read:
a
New palindrome:
ababa
At every step, only one new distinct palindrome can be created.
This fact is one of the reasons linear construction is possible.
Finding the Correct Parent
Suppose we have processed:
abab
and now append:
a
The longest palindromic suffix before insertion is:
bab
To determine whether a new palindrome exists, we test whether:
a + bab + a
forms a palindrome.
It does:
ababa
Therefore a new node is created.
If the extension fails, we follow suffix links until a valid extension is found.
This process resembles the failure-link traversal in Aho-Corasick.
Example Structure
For:
ababa
the nodes become:
a
b
aba
bab
ababa
Suffix links:
ababa → aba
aba → a
bab → b
a → empty
b → empty
The resulting graph captures every distinct palindrome.
Lean Sketch
A construction operation typically looks like:
def addCharacter
(tree : Eertree)
(c : Char)
: Eertree :=
by
-- Find longest suffix palindrome.
-- Attempt extension.
-- Create node if needed.
-- Update suffix links.
sorry
Building the entire structure becomes:
def buildEertree
(text : String)
: Eertree :=
by
-- Add each character.
sorry
The implementation is more straightforward than suffix automata because cloning is unnecessary.
Longest Palindromic Substring
Every node stores its palindrome length.
Therefore:
Longest Palindrome
=
Maximum Node Length
For:
forgeeksskeegfor
the largest node corresponds to:
geeksskeeg
Length:
10
No additional search is required.
Counting Distinct Palindromes
The number of non-root nodes equals the number of distinct palindromic substrings.
For:
ababa
nodes:
a
b
aba
bab
ababa
Count:
5
This information is available immediately after construction.
Occurrence Counts
Each time a palindrome appears as the longest suffix, its counter increases.
After construction, counts propagate upward through suffix links.
Example:
ababa
Occurrences:
| Palindrome | Count |
|---|---|
| a | 3 |
| b | 2 |
| aba | 2 |
| bab | 1 |
| ababa | 1 |
This makes frequency analysis straightforward.
Enumerating Palindromes
Unlike Manacher's algorithm, which focuses on lengths, an Eertree explicitly stores every distinct palindrome.
Enumeration becomes trivial.
for node in tree:
output palindrome
Applications include:
- Text analysis
- Bioinformatics
- Pattern discovery
- Compression research
Complexity Analysis
Let:
| Symbol | Meaning |
|---|---|
| n | String length |
| Operation | Complexity |
|---|---|
| Construction | O(n) |
| Distinct palindrome count | O(1) after build |
| Longest palindrome | O(1) after build |
| Occurrence counting | O(n) |
| Memory | O(n) |
Each character creates at most one new node.
Therefore the total number of nodes remains linear.
Eertree vs Other Structures
| Structure | Represents |
|---|---|
| Suffix Array | All suffixes |
| Suffix Automaton | All substrings |
| Trie | Pattern prefixes |
| Aho-Corasick | Pattern dictionary |
| Eertree | All distinct palindromes |
The Eertree occupies a specialized niche.
It sacrifices general substring support in exchange for highly efficient palindrome operations.
Correctness
Every node corresponds to a unique palindrome.
Whenever a new palindrome appears, construction discovers it by extending the longest valid palindromic suffix.
Suffix links ensure that all candidate palindromes are considered.
Since each palindrome is created exactly once and every extension is verified, the resulting structure contains every distinct palindromic substring and no duplicates.
Common Pitfalls
Do not confuse palindrome occurrences with distinct palindromes.
For:
aaaa
the distinct palindromes are:
a
aa
aaa
aaaa
only four nodes.
The total number of occurrences is much larger.
Remember that suffix links connect palindromic suffixes, not arbitrary substrings.
Be careful with the two root nodes. Most implementation bugs arise from incorrect root initialization.
Takeaway
The Eertree is one of the most specialized and elegant structures in string processing. Where suffix arrays organize suffixes and suffix automata organize substrings, the Eertree organizes palindromes. By exploiting the nesting structure of palindromic suffixes, it provides linear-time construction and direct access to information that would otherwise require expensive substring analysis. For palindrome-centric problems, it is often the most natural and efficient representation available.