Serialize an N-ary tree into a string and reconstruct the same tree using preorder traversal with child counts.
Problem Restatement
We are given an N-ary tree.
An N-ary tree is a rooted tree where each node can have zero or more children.
We need to design two methods:
| Method | Meaning |
|---|---|
serialize(root) | Convert the tree into a string |
deserialize(data) | Convert the string back into the original tree |
There is no required serialization format. We can choose any format as long as it can rebuild the same tree structure.
The solution should be stateless. We should not use class-level, global, or static variables to store decoding state. The problem also states that N is in the range [1, 1000].
Input and Output
| Item | Meaning |
|---|---|
Input to serialize | Root of an N-ary tree |
Output from serialize | A string representation of the tree |
Input to deserialize | The serialized string |
Output from deserialize | Root of the reconstructed N-ary tree |
| Empty tree | Serialize as an empty string and deserialize back to None |
The node class usually looks like this:
class Node:
def __init__(self, val=None, children=None):
self.val = val
self.children = children if children is not None else []Examples
Consider this tree:
1
/ | \
3 2 4
/ \
5 6One valid serialization is:
1 3 3 2 5 0 6 0 2 0 4 0This means:
1 has 3 children
3 has 2 children
5 has 0 children
6 has 0 children
2 has 0 children
4 has 0 childrenThe exact format does not matter. What matters is that deserialize(serialize(root)) returns the same tree.
For an empty tree:
root = Nonewe serialize it as:
and deserialize it back to:
NoneFirst Thought: Store Values Only
A first attempt might store only preorder values:
1 3 5 6 2 4This loses structure.
From these values alone, we cannot know whether 5 and 6 are children of 3, or whether they are siblings of 3.
So each node must store two pieces of information:
| Data | Why |
|---|---|
| node value | To reconstruct the node |
| number of children | To know how many recursive child subtrees to read |
Key Insight
Preorder traversal is enough if we store the child count with every node.
For each node, serialize:
value child_countThen recursively serialize each child.
For the example tree:
1
/ | \
3 2 4
/ \
5 6we write:
1 3
3 2
5 0
6 0
2 0
4 0Flattened into one string:
1 3 3 2 5 0 6 0 2 0 4 0During deserialization, we read tokens from left to right.
When we read:
value child_countwe create a node, then recursively read exactly child_count children.
This is what makes the format unambiguous.
Algorithm
For serialization:
- If
rootisNone, return an empty string. - Run preorder DFS.
- For each node, append:
- node value
- number of children
- Join all tokens with spaces.
For deserialization:
- If
datais empty, returnNone. - Split the string into tokens.
- Use an index to read tokens from left to right.
- Read one node:
- first token is value
- second token is child count
- Recursively read that many children.
- Return the node.
The decoding index is local to deserialize, so the solution remains stateless.
Correctness
Serialization writes every node before its children. For each node, it also writes the number of children.
During deserialization, when we read a node value and child count, the child count tells us exactly how many child subtrees belong to that node.
Because serialization writes child subtrees immediately after the parent, deserialization reads them in the same order. The first child written becomes the first child reconstructed, the second child written becomes the second child reconstructed, and so on.
A leaf node has child count 0, so deserialization creates the node and returns immediately. An internal node has positive child count, so deserialization recursively reconstructs exactly that many children.
By induction over the preorder sequence, every serialized subtree is reconstructed with the same root value and the same ordered list of children. Therefore, deserialize(serialize(root)) reconstructs the original N-ary tree.
Complexity
| Metric | Value | Why |
|---|---|---|
| Time | O(n) | Each node is serialized once and deserialized once |
| Space | O(n) | The serialized token list stores all nodes |
| Recursion stack | O(h) | h is the height of the tree |
Here, n is the number of nodes.
Implementation
class Codec:
def serialize(self, root: 'Node') -> str:
if root is None:
return ""
tokens = []
def dfs(node: 'Node') -> None:
tokens.append(str(node.val))
tokens.append(str(len(node.children)))
for child in node.children:
dfs(child)
dfs(root)
return " ".join(tokens)
def deserialize(self, data: str) -> 'Node':
if not data:
return None
tokens = data.split()
index = 0
def build() -> 'Node':
nonlocal index
val = int(tokens[index])
index += 1
child_count = int(tokens[index])
index += 1
node = Node(val, [])
for _ in range(child_count):
node.children.append(build())
return node
return build()Code Explanation
The serializer handles the empty tree first:
if root is None:
return ""For a non-empty tree, we store tokens in a list:
tokens = []The DFS writes the current node before its children:
tokens.append(str(node.val))
tokens.append(str(len(node.children)))This pair is enough to decode the node later.
Then we serialize every child in order:
for child in node.children:
dfs(child)Finally, we join all tokens:
return " ".join(tokens)The deserializer first handles the empty string:
if not data:
return NoneThen it splits the string:
tokens = data.split()The local variable index tells us which token to read next.
Inside build, we read the node value:
val = int(tokens[index])
index += 1Then we read the number of children:
child_count = int(tokens[index])
index += 1Now we can create the node:
node = Node(val, [])Because child_count tells us exactly how many children to read, we recursively build that many child subtrees:
for _ in range(child_count):
node.children.append(build())The method returns the reconstructed root.
Testing
We can test by serializing a tree, deserializing it, then comparing both trees structurally.
class Node:
def __init__(self, val=None, children=None):
self.val = val
self.children = children if children is not None else []
def same_tree(a, b):
if a is None or b is None:
return a is b
if a.val != b.val:
return False
if len(a.children) != len(b.children):
return False
for x, y in zip(a.children, b.children):
if not same_tree(x, y):
return False
return True
def run_tests():
codec = Codec()
root = Node(1, [
Node(3, [Node(5), Node(6)]),
Node(2),
Node(4),
])
data = codec.serialize(root)
restored = codec.deserialize(data)
assert same_tree(root, restored)
single = Node(10)
assert same_tree(single, codec.deserialize(codec.serialize(single)))
assert codec.serialize(None) == ""
assert codec.deserialize("") is None
chain = Node(1, [Node(2, [Node(3, [Node(4)])])])
assert same_tree(chain, codec.deserialize(codec.serialize(chain)))
print("all tests passed")
run_tests()Test meaning:
| Test | Why |
|---|---|
| Multi-child tree | Checks ordinary N-ary structure |
| Single node | Checks leaf root |
| Empty tree | Checks None behavior |
| Deep chain | Checks recursive nesting |
| Structural comparison | Confirms values and child order are preserved |