8.22 Succinct Trees

Most tree representations focus on speed.

8.22 Succinct Trees

Most tree representations focus on speed.

Examples include:

  • Pointer-based trees
  • Adjacency lists
  • Segment Trees
  • Tries
  • Link-Cut Trees

These structures prioritize fast updates and queries, often at the cost of memory.

For small trees, this tradeoff is acceptable.

For massive datasets, memory becomes a primary concern.

Consider:

1 billion nodes

Even storing a few pointers per node can require tens of gigabytes of memory.

In many applications, the tree structure is static.

Examples include:

  • XML documents
  • Syntax trees
  • Biological taxonomies
  • Search indexes
  • Compressed dictionaries
  • Web graphs

In these situations, we can sacrifice update flexibility in exchange for extremely compact storage.

Succinct Trees represent tree structure using close to the theoretical minimum number of bits while still supporting efficient navigation.

This area sits at the intersection of data structures, information theory, and compression.

Problem

You need to store very large static trees while minimizing memory usage.

Traditional pointer-based representations consume too much space.

Solution

Represent tree topology as a bit sequence.

Consider:

      A
     / \\n    B   C
       /
      D

Perform a depth-first traversal.

When entering a node:

1

When leaving a node:

0

The traversal becomes:

A enter
B enter
B leave
C enter
D enter
D leave
C leave
A leave

Bit sequence:

11011000

This representation is called a balanced-parentheses encoding.

Every node contributes:

10

Therefore:

2n bits

represent an entire tree with:

n nodes

Remarkably, this is close to the information-theoretic minimum.

Discussion

At first glance:

11011000

appears useless.

Where are the nodes?

Where are the edges?

Where are the parent pointers?

The surprising answer is that all of this information is still present.

The challenge is learning how to navigate the encoded structure efficiently.

Balanced Parentheses Representation

The same sequence can be written as parentheses.

( ( ) ( ( ) ) )

Removing spaces:

(()(()))

Every node corresponds to a matching pair.

For example:

()

represents a leaf.

Nested pairs represent descendants.

The structure emerges from containment relationships.

Why 2n Bits Matters

Consider a pointer-based tree.

type Node struct {
    Parent *Node
    Child  *Node
    Next   *Node
}

On a 64-bit machine:

3 pointers
=
24 bytes
=
192 bits

per node.

Balanced parentheses require:

2 bits

per node.

The difference is enormous.

For large static trees, memory savings can be measured in orders of magnitude.

Suppose:

(()(()))

Positions:

0 1 2 3 4 5 6 7
( ( ) ( ( ) ) )

Each opening parenthesis identifies a node.

Each matching closing parenthesis identifies the end of its subtree.

Finding the matching parenthesis allows subtree boundaries to be determined.

This becomes the basis for navigation.

Parent Relationships

Consider:

(()(()))

The node at:

position 4

corresponds to:

(

Its matching parenthesis occurs later.

The nearest enclosing pair represents its parent.

Thus:

parent(node)

can be derived directly from the bit sequence.

No explicit pointer is required.

Subtree Boundaries

Suppose a node begins at:

position i

and ends at:

position j

Everything between:

i
and
j

belongs to its subtree.

Subtree extraction becomes:

Interval lookup

inside the bit sequence.

This is one reason succinct trees integrate well with compressed indexing structures.

Rank Operations

Efficient navigation relies on two primitives.

Rank

Count occurrences up to a position.

Example:

Bit sequence:

11011000

Question:

rank1(5)

Answer:

4

because four ones occur before position five.

Implementation:

rank(bit, position)

is a fundamental operation in succinct data structures.

Select Operations

Select is the inverse of rank.

Question:

Where is
the fourth 1?

Answer:

Position 4

Operation:

select1(4)

Efficient rank and select structures allow trees to be navigated directly on compressed representations.

Node Numbering

Every opening parenthesis corresponds to a node.

Example:

(()(()))

Opening positions:

0
1
3
4

Node identifiers become:

rank1(position)

This mapping allows algorithms to reference nodes without storing explicit IDs.

Child Navigation

Suppose:

(

appears at position:

i

The first child appears immediately after it.

Subsequent children can be located using matching-parenthesis operations.

This allows:

firstChild(node)
nextSibling(node)

to be implemented without pointers.

The structure behaves like a normal tree despite being stored as bits.

Depth Computation

Depth corresponds to balance.

Interpret:

(
=
+1

)
=
-1

For:

(()(()))

running balance:

1
2
1
2
3
2
1
0

The balance at a node equals its depth.

This property enables depth queries directly from the encoding.

Lowest Common Ancestor

Surprisingly, even LCA queries can be supported.

The balanced-parentheses representation can be transformed into:

Depth sequence

Then:

LCA

becomes a Range Minimum Query problem.

Many succinct tree structures support LCA in constant time.

This is one of the reasons succinct representations are so powerful.

LOUDS Encoding

Another popular representation is LOUDS.

Meaning:

Level-Order Unary Degree Sequence

Example tree:

      A
     / \\n    B   C
       /
      D

Children counts:

A = 2
B = 0
C = 1
D = 0

Encode each count as:

1110

style unary values.

Result:

11001000

This representation supports efficient level-order navigation.

LOUDS is widely used in compressed tries and search indexes.

Succinct Tries

Search engines often store billions of strings.

Traditional trie nodes may require:

tens of bytes

per node.

Succinct encodings dramatically reduce memory usage.

Examples include:

  • Web search dictionaries
  • Autocomplete systems
  • Spell checkers
  • DNA indexes

The reduction in memory often improves cache performance as well.

Wavelet Trees

Wavelet Trees are closely related to succinct structures.

They combine:

  • Trees
  • Bit vectors
  • Rank
  • Select

to support compressed sequence queries.

Examples:

k-th smallest
frequency
range counting

Many modern indexing systems combine Wavelet Trees with succinct tree representations.

XML Processing

XML documents naturally form trees.

Large XML databases may contain millions of nodes.

Storing explicit pointers becomes expensive.

Succinct trees allow:

parent
child
ancestor
subtree

queries while using far less memory.

This application helped drive much of the research in succinct data structures.

Information-Theoretic Perspective

How many bits are actually necessary to represent a tree?

For a tree with:

n nodes

the answer is approximately:

2n bits

Ignoring lower-order terms.

Balanced-parentheses encodings approach this limit.

This means:

Very little space
is wasted.

Few practical data structures operate so close to theoretical optimality.

Static vs Dynamic Tradeoffs

Succinct trees are primarily designed for static data.

Updates are difficult.

Pointer-based structures excel at:

Insert
Delete
Modify

Succinct trees excel at:

Store
Navigate
Query

Choosing between them depends on workload requirements.

Complexity Analysis

Let:

n = number of nodes

Storage:

Representation Space
Pointer tree O(n pointers)
Balanced parentheses 2n bits
LOUDS O(n) bits

Queries with rank/select support:

Operation Complexity
Parent O(1)
Child O(1)
Depth O(1)
LCA O(1)

Construction:

Operation Complexity
Encoding O(n)

The constant factors depend on the underlying rank/select implementation.

Common Mistakes

A common mistake is viewing succinct trees as compressed text.

They are not compression formats.

They are navigable data structures.

Another mistake is ignoring rank and select support. Without these operations, navigation becomes inefficient.

A third mistake is choosing succinct trees for highly dynamic workloads. Updates are often significantly more expensive than in pointer-based structures.

Finally, many newcomers focus only on memory savings. In practice, improved cache locality is often just as important as reduced storage size.

Takeaway

Succinct Trees represent tree topology using close to the minimum number of bits required by information theory. Techniques such as balanced-parentheses encoding and LOUDS allow efficient navigation without explicit pointers, supporting operations such as parent lookup, child traversal, depth computation, and Lowest Common Ancestor queries. While best suited for static trees, succinct representations provide extraordinary memory efficiency and form the foundation of many modern indexing, search, and compressed-data systems.