7.18 Skip Lists
Balanced trees provide logarithmic performance through carefully maintained structure.
7.18 Skip Lists
Balanced trees provide logarithmic performance through carefully maintained structure. AVL trees track heights. Red-black trees track colors. B-trees manage node occupancy. All of these structures require specialized balancing logic.
Skip lists achieve similar performance with a radically different idea.
Instead of maintaining a balanced tree, a skip list maintains multiple linked lists layered on top of one another. Higher levels contain progressively fewer elements, allowing searches to skip large portions of the data.
The result is a surprisingly simple probabilistic data structure that offers expected logarithmic performance while being much easier to implement than many balanced trees.
Skip lists are widely used in databases, caches, distributed systems, concurrent data structures, and storage engines.
Problem
Suppose values are stored in a sorted linked list.
10 -> 20 -> 30 -> 40 -> 50 -> 60 -> 70
Searching for:
60
requires:
10
20
30
40
50
60
Six comparisons.
Complexity:
O(n)
The list preserves ordering but provides no efficient way to skip ahead.
We need a structure that maintains linked-list simplicity while improving search performance.
Solution
Add shortcut layers.
Level 0 contains every value.
10 -> 20 -> 30 -> 40 -> 50 -> 60 -> 70
Level 1 contains selected values.
10 ------> 30 ------> 50 ------> 70
Level 2 contains even fewer values.
10 -----------------> 50
Searching starts at the highest level.
Large portions of the list are skipped immediately.
The structure resembles express lanes on a highway.
Visualizing a Skip List
Example:
Level 2:
10 ----------------------> 50
Level 1:
10 ------> 30 ------> 50 ------> 70
Level 0:
10 -> 20 -> 30 -> 40 -> 50 -> 60 -> 70
Search for:
60
Path:
10
50
60
Rather than visiting every node, the search jumps across large sections.
This produces logarithmic expected behavior.
Node Structure
A skip-list node stores multiple forward pointers.
type Node struct {
Key int
Forward []*Node
}
Example:
Node 50
Level 2 -> nil
Level 1 -> 70
Level 0 -> 60
Each node participates in one or more levels.
The number of levels varies per node.
Levels and Probability
When a new node is inserted, its height is chosen randomly.
Example probabilities:
Level 1: 100%
Level 2: 50%
Level 3: 25%
Level 4: 12.5%
Each additional level is obtained by flipping a coin.
Heads -> add another level
Tails -> stop
This creates a pyramid-like structure naturally.
Most nodes appear only at the bottom.
A few appear at higher levels.
Very few reach the top.
Why Random Heights Work
Suppose:
1,000,000 nodes
Expected counts:
Level 1: 1,000,000
Level 2: 500,000
Level 3: 250,000
Level 4: 125,000
and so on.
Each level contains roughly half as many nodes as the previous one.
This mirrors the search-space reduction achieved by balanced trees.
The structure emerges automatically from randomness rather than explicit balancing.
Searching
Search begins at the highest level.
Consider:
Level 2:
10 -----------------> 50
Level 1:
10 ------> 30 ------> 50 ------> 70
Level 0:
10 -> 20 -> 30 -> 40 -> 50 -> 60 -> 70
Search for:
60
Start at:
10
Move right until moving further would exceed the target.
50
Drop one level.
50
Move right.
60
Found.
The search repeatedly moves:
right
down
right
down
until the target is located.
Search Algorithm
Simplified implementation:
func Search(
head *Node,
key int,
maxLevel int,
) bool {
current := head
for level := maxLevel - 1;
level >= 0;
level-- {
for current.Forward[level] != nil &&
current.Forward[level].Key < key {
current =
current.Forward[level]
}
}
current = current.Forward[0]
return current != nil &&
current.Key == key
}
The algorithm resembles traversing a grid.
Horizontal movement skips values.
Vertical movement refines the search.
Insertion
Insertion has two phases.
Phase 1
Locate insertion positions at every level.
Example:
10 -> 20 -> 40 -> 50
Insert:
30
Search determines where the new node belongs.
Phase 2
Generate a random level.
Example:
Level 2
Insert the node into:
Level 0
Level 1
Pointer updates connect the new node into each level.
No rotations are required.
No balance factors are computed.
No colors are tracked.
Example Insertion
Initial:
Level 1:
10 ------> 50
Level 0:
10 -> 20 -> 50
Insert:
30
Randomly assigned:
Level 2
Result:
Level 1:
10 -> 30 -> 50
Level 0:
10 -> 20 -> 30 -> 50
The structure remains ordered automatically.
Deletion
Deletion is similarly straightforward.
Locate the node.
Update pointers at every level where the node appears.
Example:
Level 1:
10 -> 30 -> 50
Level 0:
10 -> 20 -> 30 -> 50
Delete:
30
Result:
Level 1:
10 ------> 50
Level 0:
10 -> 20 -> 50
No rebalancing is required.
Randomization handles long-term structure.
Skip Lists and Binary Search
A useful intuition is that each higher level acts as a progressively coarser index.
Level 3
Level 2
Level 1
Level 0
This resembles repeatedly halving the search space.
A balanced tree does this vertically.
A skip list does it horizontally.
Both achieve:
O(log n)
expected search performance.
Concurrent Systems
One reason skip lists are popular is concurrency.
Rotations in balanced trees can be difficult to coordinate across multiple threads.
Skip lists require mostly local pointer updates.
This makes lock-free and highly concurrent implementations easier.
Several high-performance systems exploit this property.
Examples include:
- Concurrent key-value stores
- In-memory databases
- Scheduling systems
- Distributed coordination services
Skip Lists in Databases
Several storage engines use skip lists as in-memory indexes.
Reasons include:
- Simple implementation
- Good cache behavior
- Easy concurrency support
- Expected logarithmic performance
A skip list often serves as the mutable in-memory layer while persistent data resides in B-trees or sorted files.
This combination appears in modern LSM-tree storage engines.
Skip Lists Versus AVL Trees
| Feature | Skip List | AVL Tree |
|---|---|---|
| Balance | Randomized | Deterministic |
| Rotations | None | Required |
| Search | Expected O(log n) | O(log n) |
| Insert | Expected O(log n) | O(log n) |
| Delete | Expected O(log n) | O(log n) |
| Implementation | Simple | Moderate |
AVL trees provide stronger guarantees.
Skip lists provide simpler code.
Skip Lists Versus Red-Black Trees
| Feature | Skip List | Red-Black Tree |
|---|---|---|
| Balancing | Randomized | Color rules |
| Height maintenance | Automatic | Explicit |
| Worst-case guarantee | No | Yes |
| Concurrency | Excellent | More difficult |
| Code complexity | Lower | Higher |
Many engineers choose skip lists specifically because of implementation simplicity.
Skip Lists Versus B-Trees
| Feature | Skip List | B-Tree |
|---|---|---|
| Storage optimization | Moderate | Excellent |
| Memory usage | Higher | Lower |
| Disk access efficiency | Moderate | Excellent |
| Range scans | Good | Excellent |
| In-memory workloads | Excellent | Good |
B-trees dominate storage indexes.
Skip lists excel as in-memory structures.
Expected Complexity
With probability:
p = 0.5
expected height becomes:
O(log n)
Expected operations:
| Operation | Complexity |
|---|---|
| Search | O(log n) |
| Insert | O(log n) |
| Delete | O(log n) |
Space complexity:
O(n)
with a modest constant-factor overhead for additional pointers.
Common Mistakes
A common misconception is that skip lists are balanced trees. They are not trees at all. They are layered linked lists.
Another mistake is expecting strict worst-case guarantees. Skip lists provide expected performance through probability rather than deterministic balancing.
A third mistake is using poor random-number generation. The level distribution directly affects performance.
Finally, some implementations choose a maximum level that is too small. The upper levels must be large enough to support the expected collection size.
Takeaway
Skip lists achieve expected logarithmic performance through probabilistic layering rather than explicit balancing. By augmenting a sorted linked list with progressively sparser shortcut levels, they create a structure that behaves similarly to a balanced search tree while remaining conceptually simple. Their ease of implementation, strong practical performance, and suitability for concurrent environments have made them a popular alternative to traditional balanced trees in modern systems software.