19.7 Skip Lists Revisited
Balanced search trees achieve logarithmic performance by carefully maintaining structural invariants.
19.7 Skip Lists Revisited
Balanced search trees achieve logarithmic performance by carefully maintaining structural invariants. AVL trees perform rotations to preserve height balance. Red-black trees maintain coloring rules. B-trees redistribute keys across nodes. These structures provide excellent asymptotic guarantees, but the implementation complexity can be substantial.
Skip lists take a different approach. Instead of maintaining strict balance through deterministic rules, they use randomness. A skip list is essentially a hierarchy of linked lists where randomly selected elements appear in higher levels. This simple idea produces expected logarithmic performance while avoiding rotations, recoloring, and many of the implementation challenges associated with balanced trees.
The result is one of the most elegant examples of randomized data structure design.
Problem
Maintain a collection of ordered values supporting:
- Search
- Insert
- Delete
- Range queries
Desired complexity:
Search O(log n)
Insert O(log n)
Delete O(log n)
A sorted linked list provides efficient insertion and deletion:
Insert O(1)
Delete O(1)
after locating the position.
Unfortunately, search remains:
O(n)
because traversal proceeds one node at a time.
Can randomness improve search performance without requiring complex balancing logic?
Solution
Add shortcuts.
Suppose we start with a sorted linked list:
1 → 3 → 5 → 7 → 9 → 11 → 13
Searching for:
11
requires scanning multiple nodes.
Now add a second level:
1 --------→ 7 --------→ 13
| | |
1 → 3 → 5 → 7 → 9 → 11 → 13
The upper level allows larger jumps.
Searching for:
11
becomes much faster.
A skip list generalizes this idea by adding multiple levels.
Structure of a Skip List
Each node has:
class Node:
def __init__(self, value, level):
self.value = value
self.forward = [None] * level
A node may appear in:
- Level 1
- Levels 1 and 2
- Levels 1, 2, and 3
- Higher levels
The height is chosen randomly.
Most nodes remain short.
A few nodes become tall.
Very few become extremely tall.
Example Layout
Consider:
1 3 5 7 9 11 13 15
A possible skip list:
Level 3
1 --------------------------→ 15
Level 2
1 --------→ 7 ------------→ 15
Level 1
1 → 3 → 5 → 7 → 9 → 11 → 13 → 15
Each higher level acts as an express lane.
Searches move horizontally when possible and vertically when necessary.
Searching
Suppose we search for:
11
Begin at the highest level.
1 --------------------------→ 15
Moving directly to:
15
would overshoot.
Drop down one level.
1 --------→ 7 ------------→ 15
Move to:
7
The next jump overshoots.
Drop down again.
7 → 9 → 11
Target found.
Instead of scanning every node, we traverse only a few levels.
Search Algorithm
def search(self, value):
current = self.head
for level in reversed(range(self.levels)):
while (
current.forward[level]
and current.forward[level].value < value
):
current = current.forward[level]
current = current.forward[0]
return (
current is not None
and current.value == value
)
The algorithm repeatedly:
- Move right.
- Drop down.
- Continue.
This resembles navigating a highway system.
Random Height Assignment
The most important component is level selection.
A common strategy:
import random
def random_level():
level = 1
while random.random() < 0.5:
level += 1
return level
Each level promotion occurs with probability:
1/2
Expected distribution:
| Height | Fraction of Nodes |
|---|---|
| 1 | 1/2 |
| 2 | 1/4 |
| 3 | 1/8 |
| 4 | 1/16 |
| 5 | 1/32 |
The number of nodes decreases exponentially with height.
This creates a natural hierarchy.
Why Random Heights Work
Suppose:
n = 1,000,000
Expected nodes at each level:
| Level | Expected Nodes |
|---|---|
| 1 | 1,000,000 |
| 2 | 500,000 |
| 3 | 250,000 |
| 4 | 125,000 |
| 5 | 62,500 |
| 10 | 1,953 |
| 20 | 1 |
Only a handful of nodes reach the highest levels.
Those nodes provide long-distance shortcuts.
The resulting structure behaves similarly to a balanced search tree.
Insertion
Insertion consists of two phases.
Phase 1
Locate insertion positions.
Example:
Insert 10
Search identifies predecessor nodes at every level.
Phase 2
Generate a random height.
Example:
Height = 3
Create a new node.
Update pointers.
The node becomes part of:
Level 1
Level 2
Level 3
No rotations are required.
No balancing operations occur.
Randomness maintains structure quality automatically.
Deletion
Deletion mirrors insertion.
Locate predecessors.
Remove references at each level.
for level in range(node_height):
update[level].forward[level] = (
node.forward[level]
)
The operation affects only local pointers.
Again, no rebalancing is required.
Expected Complexity
The key result:
| Operation | Expected Cost |
|---|---|
| Search | O(log n) |
| Insert | O(log n) |
| Delete | O(log n) |
These bounds are expected values.
Occasionally, random heights produce a poor structure.
Such events become increasingly unlikely as the data structure grows.
Intuition Behind Logarithmic Search
Suppose half the nodes appear at level 2.
A quarter appear at level 3.
An eighth appear at level 4.
Each level reduces the number of nodes that must be examined.
The search path therefore resembles binary search.
Instead of dividing an array in half repeatedly:
n
n/2
n/4
n/8
...
the skip list divides the search space through increasingly sparse levels.
The result is logarithmic behavior.
Memory Usage
A node stores:
Value
Forward pointers
Expected number of pointers per node:
2
when promotion probability equals:
1/2
Total expected memory:
O(n)
This is comparable to balanced trees.
Comparison with AVL Trees
| Property | AVL Tree | Skip List |
|---|---|---|
| Search | O(log n) | Expected O(log n) |
| Insert | O(log n) | Expected O(log n) |
| Delete | O(log n) | Expected O(log n) |
| Rotations | Required | None |
| Implementation | Complex | Simple |
| Deterministic | Yes | No |
Skip lists exchange deterministic guarantees for implementation simplicity.
Comparison with Red-Black Trees
| Property | Red-Black Tree | Skip List |
|---|---|---|
| Balancing | Explicit | Probabilistic |
| Rotations | Yes | No |
| Coloring Rules | Yes | No |
| Search | O(log n) | Expected O(log n) |
| Code Complexity | Moderate | Low |
This simplicity explains their popularity in some systems.
Real-World Applications
Skip lists appear in several production systems.
Databases
Ordered indexes.
In-Memory Stores
Fast ordered maps.
Distributed Systems
Membership structures.
Concurrent Data Structures
Lock-free skip lists.
Concurrent implementations are often easier than balanced tree equivalents because local pointer updates are simpler than tree rotations.
Probabilistic Analysis
Let:
p
be the promotion probability.
Expected nodes at level:
k
equal:
n × p^(k−1)
The tallest level therefore has height approximately:
log₁₍₁⁄p₎ n
For:
p = 1/2
height becomes:
log₂ n
which explains the logarithmic search complexity.
Common Mistakes
Choosing Heights Deterministically
Random heights are essential.
Without randomness, the structure can degenerate.
Using Excessive Heights
Very tall nodes increase memory without improving performance significantly.
Poor Random Generators
Biased height distributions can affect balance quality.
Ignoring Concurrent Updates
Pointer updates must be coordinated carefully in multithreaded implementations.
Discussion
Skip lists demonstrate a recurring theme in randomized algorithms: randomness can replace complex maintenance logic. Balanced trees achieve logarithmic performance through carefully enforced invariants. Skip lists achieve similar expected performance by allowing randomness to construct those invariants implicitly.
The elegance of the design lies in its simplicity. Search follows a straightforward pattern. Insertions and deletions update only local pointers. No rotations, recoloring operations, or restructuring passes are required. Yet the resulting data structure behaves much like a balanced tree.
This combination of simplicity and performance makes skip lists one of the classic examples of probabilistic algorithm design. The next section examines randomized hashing, where randomness is used not to build search structures but to reduce collisions and improve expected lookup performance in hash-based data structures.