8.21 Tree Serialization
A tree in memory is not directly portable.
8.21 Tree Serialization
A tree in memory is not directly portable. It may contain pointers, slices, maps, parent links, or language-specific objects. To store it in a file, send it over a network, cache it, compare it, or reconstruct it later, you need a serialized representation.
Serialization converts a tree into a linear form.
Deserialization converts that linear form back into a tree.
This recipe shows common tree serialization patterns and explains where each one fits.
Problem
You need to encode a tree so that it can be stored, transmitted, compared, or reconstructed.
Solution
Choose a traversal order and include enough information to recover structure.
For binary trees, preorder traversal with null markers is a common representation.
Consider:
1
/ \\n 2 3
/
4
A preorder serialization with null markers is:
1 2 # # 3 4 # # #
The # marker represents an empty child.
Implementation in Go:
package main
import (
"strconv"
"strings"
)
type Node struct {
Value int
Left *Node
Right *Node
}
func serialize(root *Node) string {
parts := []string{}
var walk func(*Node)
walk = func(node *Node) {
if node == nil {
parts = append(parts, "#")
return
}
parts = append(parts, strconv.Itoa(node.Value))
walk(node.Left)
walk(node.Right)
}
walk(root)
return strings.Join(parts, " ")
}
This representation is compact enough for small systems and precise enough to reconstruct the original binary tree.
Deserialization
Deserialization consumes tokens in the same order.
func deserialize(data string) *Node {
tokens := strings.Fields(data)
index := 0
var build func() *Node
build = func() *Node {
if index >= len(tokens) {
return nil
}
token := tokens[index]
index++
if token == "#" {
return nil
}
value, err := strconv.Atoi(token)
if err != nil {
panic(err)
}
return &Node{
Value: value,
Left: build(),
Right: build(),
}
}
return build()
}
The important detail is that the reader and writer must agree on the traversal order and null marker convention.
Discussion
Serialization is not just a formatting problem. It is a structural problem.
If you serialize only the node values:
1 2 3 4
you lose the shape of the tree.
Many different trees can produce the same value sequence.
For example:
1
/
2
/
3
and:
1
\\n 2
\\n 3
both contain the same values but have different structures.
A correct serialization must preserve both values and topology.
Preorder with Null Markers
Preorder with null markers is usually the simplest fully reconstructible binary-tree encoding.
The traversal order is:
node
left subtree
right subtree
The null markers tell the deserializer when a subtree ends.
For this tree:
1
/
2
serialization is:
1 2 # # #
That means:
1
left = 2
2.left = nil
2.right = nil
1.right = nil
No ambiguity remains.
Level-Order Serialization
Breadth-first serialization is also common, especially in coding interview platforms and JSON-like APIs.
For the tree:
1
/ \\n 2 3
/
4
level-order serialization may be:
1 2 3 # # 4 #
A simple serializer:
func serializeLevel(root *Node) string {
if root == nil {
return ""
}
queue := []*Node{root}
parts := []string{}
for len(queue) > 0 {
node := queue[0]
queue = queue[1:]
if node == nil {
parts = append(parts, "#")
continue
}
parts = append(parts, strconv.Itoa(node.Value))
queue = append(queue, node.Left)
queue = append(queue, node.Right)
}
return strings.Join(parts, " ")
}
This representation mirrors the way complete binary trees are stored, but it may contain many null markers for sparse trees.
Trimming Redundant Nulls
Level-order encodings often end with unnecessary null markers.
Example:
1 2 3 # # # #
can be shortened to:
1 2 3
if the format defines missing trailing children as null.
Trimming reduces output size:
for len(parts) > 0 &&
parts[len(parts)-1] == "#" {
parts = parts[:len(parts)-1]
}
This is useful for transport formats, but it must be documented. A deserializer that expects explicit nulls may reject the shortened form.
N-ary Tree Serialization
For general rooted trees, each node may have any number of children.
One common encoding stores:
value child_count children...
Example tree:
1
├── 2
├── 3
│ └── 5
└── 4
Serialization:
1 3 2 0 3 1 5 0 4 0
Meaning:
1 has 3 children
2 has 0 children
3 has 1 child
5 has 0 children
4 has 0 children
Implementation:
type NNode struct {
Value int
Children []*NNode
}
func serializeNary(root *NNode) string {
parts := []string{}
var walk func(*NNode)
walk = func(node *NNode) {
if node == nil {
return
}
parts = append(
parts,
strconv.Itoa(node.Value),
strconv.Itoa(len(node.Children)),
)
for _, child := range node.Children {
walk(child)
}
}
walk(root)
return strings.Join(parts, " ")
}
This format is compact and unambiguous.
Deserializing N-ary Trees
The reader consumes a value, then a child count, then recursively builds that many children.
func deserializeNary(data string) *NNode {
tokens := strings.Fields(data)
index := 0
var build func() *NNode
build = func() *NNode {
value, err := strconv.Atoi(tokens[index])
if err != nil {
panic(err)
}
index++
count, err := strconv.Atoi(tokens[index])
if err != nil {
panic(err)
}
index++
node := &NNode{
Value: value,
Children: make([]*NNode, 0, count),
}
for i := 0; i < count; i++ {
node.Children = append(node.Children, build())
}
return node
}
if len(tokens) == 0 {
return nil
}
return build()
}
The child count replaces the need for null markers.
Parent Array Serialization
For indexed trees, a parent array is often the simplest format.
Example:
parent[0] = -1
parent[1] = 0
parent[2] = 0
parent[3] = 1
parent[4] = 1
This encodes:
0
├── 1
│ ├── 3
│ └── 4
└── 2
Parent arrays are excellent when nodes already have stable integer IDs.
They are less suitable when nodes are pointer-based objects without natural IDs.
Edge List Serialization
For unrooted trees, store edges.
0 1
0 2
1 3
1 4
This representation is common in graph input formats.
It preserves connectivity but not a chosen root.
If the algorithm needs a rooted tree, choose a root during deserialization and run DFS or BFS to orient the edges.
JSON Serialization
Production systems often use JSON.
A binary tree might be encoded as:
{
"value": 1,
"left": {
"value": 2,
"left": null,
"right": null
},
"right": {
"value": 3,
"left": null,
"right": null
}
}
This is verbose but readable.
A compact N-ary tree might use:
{
"value": 1,
"children": [
{ "value": 2, "children": [] },
{ "value": 3, "children": [] }
]
}
JSON is a good interchange format when readability and interoperability matter more than minimal size.
Binary Formats
For large trees, text formats may become too expensive.
Binary formats can reduce:
storage size
parse time
allocation cost
network bandwidth
A binary encoding typically stores:
node value
child count
child payloads
or:
node table
edge table
metadata table
The design depends on whether fast streaming, random access, or compact storage matters most.
Serialization and Canonicalization
Serialization preserves a particular representation.
Canonicalization preserves a normalized representation.
The distinction matters.
Two isomorphic trees may serialize differently if child order differs.
Canonical encoding sorts child encodings before writing them.
Plain serialization usually preserves the original child order.
Use serialization when reconstruction matters.
Use canonicalization when comparison matters.
Complexity Analysis
Let n be the number of nodes.
| Operation | Complexity |
|---|---|
| Preorder serialization | O(n) |
| Preorder deserialization | O(n) |
| Level-order serialization | O(n) |
| N-ary serialization | O(n) |
| Parent array reconstruction | O(n) |
| Edge list reconstruction | O(n) |
Memory usage is also O(n) for the output and reconstructed structure.
Common Mistakes
A common mistake is serializing values without structure. Traversal order alone usually cannot reconstruct a tree.
Another mistake is using a delimiter that can appear inside values. For strings, either escape values carefully or use a structured format such as JSON.
A third mistake is failing to validate input during deserialization. Malformed data can cause index errors, nil dereferences, or infinite recursion.
A fourth mistake is assuming recursive deserialization is safe for extremely deep trees. For adversarial inputs, prefer iterative parsing or enforce depth limits.
Finally, do not confuse stable node IDs with traversal positions. A preorder index changes if the tree changes. A persistent ID should be stored explicitly.
Takeaway
Tree serialization converts hierarchical structure into a linear representation. Binary trees often use preorder traversal with null markers or level-order traversal. N-ary trees commonly store child counts. Indexed trees can use parent arrays or edge lists. The best format depends on whether you need compact storage, easy reconstruction, human readability, stable IDs, or canonical comparison.