8.16 Persistent Trees
Most data structures answer questions about the present.
8.16 Persistent Trees
Most data structures answer questions about the present.
A value changes, the old value disappears, and the structure reflects only the latest state.
Many real systems, however, need access to the past.
Examples include:
- Version control systems
- Database snapshots
- Financial ledgers
- Historical analytics
- Time-travel debugging
- Undo functionality
- Immutable functional data structures
Suppose a value changes:
Version 1:
[1 2 3 4]
Version 2:
[1 2 10 4]
Version 3:
[1 2 10 7]
A traditional Segment Tree stores only Version 3.
A Persistent Tree preserves all versions simultaneously.
Queries can be executed against any historical state without copying the entire structure.
This capability is surprisingly powerful and appears frequently in advanced algorithmic systems.
Problem
You need to support updates while preserving access to previous versions of the data.
Copying the entire structure after every modification is too expensive.
Solution
Reuse unchanged portions of the tree.
Consider a Segment Tree:
A
/ \\n B C
/ \ / \\n D E F G
Suppose an update affects only node:
F
A naive copy duplicates the entire tree.
A Persistent Tree duplicates only the nodes along the update path.
Version 1
A
/ \\n B C
/ \ / \\n D E F G
Version 2
A'
/ \\n B C'
/ \ / \\n D E F' G
The unchanged nodes:
B
D
E
G
are shared.
Only:
A
C
F
are duplicated.
This technique is called path copying.
Discussion
Persistence relies on immutability.
Instead of modifying a node:
node.Value = 42
create a new node:
newNode := &Node{
Value: 42,
}
and reuse every unaffected subtree.
Each update produces a new root.
The old root remains valid.
The structure naturally forms a version tree.
Persistent Segment Tree Node
A pointer-based representation is common.
type Node struct {
Sum int
Left *Node
Right *Node
}
Unlike array-based Segment Trees, persistence benefits from explicit pointers because individual nodes are copied selectively.
Building the Initial Version
Construction resembles an ordinary Segment Tree.
func build(
values []int,
left int,
right int,
) *Node {
if left == right {
return &Node{
Sum: values[left],
}
}
mid := (left + right) / 2
leftChild :=
build(values, left, mid)
rightChild :=
build(
values,
mid+1,
right,
)
return &Node{
Sum:
leftChild.Sum +
rightChild.Sum,
Left: leftChild,
Right: rightChild,
}
}
This produces Version 0.
Every future version begins from this root.
Persistent Updates
The key operation is update.
Instead of modifying existing nodes, create replacements along the update path.
func update(
node *Node,
left int,
right int,
index int,
value int,
) *Node {
if left == right {
return &Node{
Sum: value,
}
}
mid := (left + right) / 2
if index <= mid {
newLeft := update(
node.Left,
left,
mid,
index,
value,
)
return &Node{
Sum:
newLeft.Sum +
node.Right.Sum,
Left: newLeft,
Right: node.Right,
}
}
newRight := update(
node.Right,
mid+1,
right,
index,
value,
)
return &Node{
Sum:
node.Left.Sum +
newRight.Sum,
Left: node.Left,
Right: newRight,
}
}
Notice the sharing.
One child remains unchanged and is reused directly.
Only the path from root to leaf is recreated.
Storing Versions
Each update produces a new root.
versions := []*Node{}
root0 := build(values, 0, n-1)
versions =
append(versions, root0)
root1 :=
update(
root0,
0,
n-1,
3,
100,
)
versions =
append(versions, root1)
Now:
versions[0]
and:
versions[1]
represent different historical states.
Both remain available.
Historical Queries
Queries work exactly as before.
The only difference is choosing the version.
func query(
node *Node,
left int,
right int,
ql int,
qr int,
) int
Example:
query(
versions[0],
0,
n-1,
2,
5,
)
queries the original state.
query(
versions[7],
0,
n-1,
2,
5,
)
queries Version 7.
No special logic is required.
The version is determined entirely by the chosen root.
Visualizing Sharing
Suppose:
Version 0:
[1 2 3 4]
Update:
Index 2 = 10
Version 1:
[1 2 10 4]
Only one path changes.
Root
└── Right
└── Left
Everything else is shared.
Memory usage grows much more slowly than full copying.
This efficiency is the reason persistence is practical.
Why Persistence Works
A traditional update modifies:
old state
into:
new state
destroying historical information.
Persistence changes the model.
old state
\\n new state
Both versions continue to exist.
Updates become:
Create new path
Reuse old subtrees
rather than:
Overwrite existing nodes
This distinction is fundamental.
Persistent Arrays
Conceptually, a Persistent Segment Tree acts like an immutable array.
Example:
Version 0:
[1 2 3 4]
Version 1:
[1 2 5 4]
Version 2:
[1 2 5 9]
Each update creates a new array version.
The difference is efficiency.
Naively copying arrays costs:
O(n)
per update.
Persistence costs:
O(log n)
per update.
K-th Smallest Queries
One of the most famous applications uses persistent frequency trees.
Suppose:
Array:
5 2 7 1 9
Build versions:
Version 0:
empty
Version 1:
5
Version 2:
5 2
Version 3:
5 2 7
...
Each version represents a prefix.
Comparing two versions reveals frequencies inside a range.
This enables:
k-th smallest element
in interval [L,R]
queries.
The technique appears frequently in competitive programming and database indexing.
Persistent Trees and Functional Programming
Persistence is natural in functional languages.
Languages such as:
- Haskell
- OCaml
- F#
- Clojure
favor immutable structures.
Updating data naturally creates new versions.
Persistent trees fit perfectly into this model.
Many functional collections use similar ideas internally.
Time Travel Queries
Persistence enables queries that would otherwise be difficult.
Examples:
What was the value
before yesterday's update?
What did the hierarchy
look like last month?
What was the state
before commit 500?
No reconstruction is necessary.
The historical version already exists.
The query simply starts from the corresponding root.
Partial Persistence vs Full Persistence
Most persistent segment trees are partially persistent.
Meaning:
Old versions:
read only
Newest version:
modifiable
Full persistence allows updates to any historical version.
Version 3
\\n Version 8
Version 3
\\n Version 9
The versions form a branching tree.
This resembles source-control systems such as Git.
The implementation becomes slightly more complex, but the underlying ideas remain the same.
Persistence and Memory
Persistence trades memory for historical access.
Each update copies:
O(log n)
nodes.
After:
m updates
memory usage becomes:
O(n + m log n)
This is dramatically smaller than:
O(mn)
which would result from copying the entire structure.
The sharing of unchanged subtrees is essential.
Complexity Analysis
Let:
n= number of elementsm= number of updates
Building:
| Operation | Complexity |
|---|---|
| Initial tree | O(n) |
Updates:
| Operation | Complexity |
|---|---|
| Persistent update | O(log n) |
Queries:
| Operation | Complexity |
|---|---|
| Range query | O(log n) |
Memory:
| Structure | Complexity |
|---|---|
| Initial tree | O(n) |
| Each update | O(log n) |
| Total | O(n + m log n) |
This efficiency makes persistence practical even for large systems.
Common Mistakes
A common mistake is accidentally modifying shared nodes.
For example:
node.Left = ...
changes every version that references that node.
Persistent structures must treat existing nodes as immutable.
Another mistake is copying entire subtrees unnecessarily.
Only nodes on the update path require duplication.
Everything else should be shared.
A third mistake is losing version roots.
Without the root pointer:
Version 17
becomes unreachable.
Persistent systems typically maintain an explicit version list.
Finally, many implementations underestimate memory usage. While persistence is efficient, millions of updates still create millions of additional nodes.
Takeaway
Persistent Trees preserve historical versions while supporting efficient updates and queries. By copying only the nodes along an update path and sharing unchanged subtrees, they provide time-travel capabilities with logarithmic update costs. This technique forms the foundation of immutable data structures, historical analytics, version-control systems, and many advanced range-query algorithms.