8.22 Succinct Trees
Most tree representations focus on speed.
8.22 Succinct Trees
Most tree representations focus on speed.
Examples include:
- Pointer-based trees
- Adjacency lists
- Segment Trees
- Tries
- Link-Cut Trees
These structures prioritize fast updates and queries, often at the cost of memory.
For small trees, this tradeoff is acceptable.
For massive datasets, memory becomes a primary concern.
Consider:
1 billion nodes
Even storing a few pointers per node can require tens of gigabytes of memory.
In many applications, the tree structure is static.
Examples include:
- XML documents
- Syntax trees
- Biological taxonomies
- Search indexes
- Compressed dictionaries
- Web graphs
In these situations, we can sacrifice update flexibility in exchange for extremely compact storage.
Succinct Trees represent tree structure using close to the theoretical minimum number of bits while still supporting efficient navigation.
This area sits at the intersection of data structures, information theory, and compression.
Problem
You need to store very large static trees while minimizing memory usage.
Traditional pointer-based representations consume too much space.
Solution
Represent tree topology as a bit sequence.
Consider:
A
/ \\n B C
/
D
Perform a depth-first traversal.
When entering a node:
1
When leaving a node:
0
The traversal becomes:
A enter
B enter
B leave
C enter
D enter
D leave
C leave
A leave
Bit sequence:
11011000
This representation is called a balanced-parentheses encoding.
Every node contributes:
10
Therefore:
2n bits
represent an entire tree with:
n nodes
Remarkably, this is close to the information-theoretic minimum.
Discussion
At first glance:
11011000
appears useless.
Where are the nodes?
Where are the edges?
Where are the parent pointers?
The surprising answer is that all of this information is still present.
The challenge is learning how to navigate the encoded structure efficiently.
Balanced Parentheses Representation
The same sequence can be written as parentheses.
( ( ) ( ( ) ) )
Removing spaces:
(()(()))
Every node corresponds to a matching pair.
For example:
()
represents a leaf.
Nested pairs represent descendants.
The structure emerges from containment relationships.
Why 2n Bits Matters
Consider a pointer-based tree.
type Node struct {
Parent *Node
Child *Node
Next *Node
}
On a 64-bit machine:
3 pointers
=
24 bytes
=
192 bits
per node.
Balanced parentheses require:
2 bits
per node.
The difference is enormous.
For large static trees, memory savings can be measured in orders of magnitude.
Navigating the Structure
Suppose:
(()(()))
Positions:
0 1 2 3 4 5 6 7
( ( ) ( ( ) ) )
Each opening parenthesis identifies a node.
Each matching closing parenthesis identifies the end of its subtree.
Finding the matching parenthesis allows subtree boundaries to be determined.
This becomes the basis for navigation.
Parent Relationships
Consider:
(()(()))
The node at:
position 4
corresponds to:
(
Its matching parenthesis occurs later.
The nearest enclosing pair represents its parent.
Thus:
parent(node)
can be derived directly from the bit sequence.
No explicit pointer is required.
Subtree Boundaries
Suppose a node begins at:
position i
and ends at:
position j
Everything between:
i
and
j
belongs to its subtree.
Subtree extraction becomes:
Interval lookup
inside the bit sequence.
This is one reason succinct trees integrate well with compressed indexing structures.
Rank Operations
Efficient navigation relies on two primitives.
Rank
Count occurrences up to a position.
Example:
Bit sequence:
11011000
Question:
rank1(5)
Answer:
4
because four ones occur before position five.
Implementation:
rank(bit, position)
is a fundamental operation in succinct data structures.
Select Operations
Select is the inverse of rank.
Question:
Where is
the fourth 1?
Answer:
Position 4
Operation:
select1(4)
Efficient rank and select structures allow trees to be navigated directly on compressed representations.
Node Numbering
Every opening parenthesis corresponds to a node.
Example:
(()(()))
Opening positions:
0
1
3
4
Node identifiers become:
rank1(position)
This mapping allows algorithms to reference nodes without storing explicit IDs.
Child Navigation
Suppose:
(
appears at position:
i
The first child appears immediately after it.
Subsequent children can be located using matching-parenthesis operations.
This allows:
firstChild(node)
nextSibling(node)
to be implemented without pointers.
The structure behaves like a normal tree despite being stored as bits.
Depth Computation
Depth corresponds to balance.
Interpret:
(
=
+1
)
=
-1
For:
(()(()))
running balance:
1
2
1
2
3
2
1
0
The balance at a node equals its depth.
This property enables depth queries directly from the encoding.
Lowest Common Ancestor
Surprisingly, even LCA queries can be supported.
The balanced-parentheses representation can be transformed into:
Depth sequence
Then:
LCA
becomes a Range Minimum Query problem.
Many succinct tree structures support LCA in constant time.
This is one of the reasons succinct representations are so powerful.
LOUDS Encoding
Another popular representation is LOUDS.
Meaning:
Level-Order Unary Degree Sequence
Example tree:
A
/ \\n B C
/
D
Children counts:
A = 2
B = 0
C = 1
D = 0
Encode each count as:
1110
style unary values.
Result:
11001000
This representation supports efficient level-order navigation.
LOUDS is widely used in compressed tries and search indexes.
Succinct Tries
Search engines often store billions of strings.
Traditional trie nodes may require:
tens of bytes
per node.
Succinct encodings dramatically reduce memory usage.
Examples include:
- Web search dictionaries
- Autocomplete systems
- Spell checkers
- DNA indexes
The reduction in memory often improves cache performance as well.
Wavelet Trees
Wavelet Trees are closely related to succinct structures.
They combine:
- Trees
- Bit vectors
- Rank
- Select
to support compressed sequence queries.
Examples:
k-th smallest
frequency
range counting
Many modern indexing systems combine Wavelet Trees with succinct tree representations.
XML Processing
XML documents naturally form trees.
Large XML databases may contain millions of nodes.
Storing explicit pointers becomes expensive.
Succinct trees allow:
parent
child
ancestor
subtree
queries while using far less memory.
This application helped drive much of the research in succinct data structures.
Information-Theoretic Perspective
How many bits are actually necessary to represent a tree?
For a tree with:
n nodes
the answer is approximately:
2n bits
Ignoring lower-order terms.
Balanced-parentheses encodings approach this limit.
This means:
Very little space
is wasted.
Few practical data structures operate so close to theoretical optimality.
Static vs Dynamic Tradeoffs
Succinct trees are primarily designed for static data.
Updates are difficult.
Pointer-based structures excel at:
Insert
Delete
Modify
Succinct trees excel at:
Store
Navigate
Query
Choosing between them depends on workload requirements.
Complexity Analysis
Let:
n = number of nodes
Storage:
| Representation | Space |
|---|---|
| Pointer tree | O(n pointers) |
| Balanced parentheses | 2n bits |
| LOUDS | O(n) bits |
Queries with rank/select support:
| Operation | Complexity |
|---|---|
| Parent | O(1) |
| Child | O(1) |
| Depth | O(1) |
| LCA | O(1) |
Construction:
| Operation | Complexity |
|---|---|
| Encoding | O(n) |
The constant factors depend on the underlying rank/select implementation.
Common Mistakes
A common mistake is viewing succinct trees as compressed text.
They are not compression formats.
They are navigable data structures.
Another mistake is ignoring rank and select support. Without these operations, navigation becomes inefficient.
A third mistake is choosing succinct trees for highly dynamic workloads. Updates are often significantly more expensive than in pointer-based structures.
Finally, many newcomers focus only on memory savings. In practice, improved cache locality is often just as important as reduced storage size.
Takeaway
Succinct Trees represent tree topology using close to the minimum number of bits required by information theory. Techniques such as balanced-parentheses encoding and LOUDS allow efficient navigation without explicit pointers, supporting operations such as parent lookup, child traversal, depth computation, and Lowest Common Ancestor queries. While best suited for static trees, succinct representations provide extraordinary memory efficiency and form the foundation of many modern indexing, search, and compressed-data systems.