7.18 Skip Lists

Balanced trees provide logarithmic performance through carefully maintained structure.

7.18 Skip Lists

Balanced trees provide logarithmic performance through carefully maintained structure. AVL trees track heights. Red-black trees track colors. B-trees manage node occupancy. All of these structures require specialized balancing logic.

Skip lists achieve similar performance with a radically different idea.

Instead of maintaining a balanced tree, a skip list maintains multiple linked lists layered on top of one another. Higher levels contain progressively fewer elements, allowing searches to skip large portions of the data.

The result is a surprisingly simple probabilistic data structure that offers expected logarithmic performance while being much easier to implement than many balanced trees.

Skip lists are widely used in databases, caches, distributed systems, concurrent data structures, and storage engines.

Problem

Suppose values are stored in a sorted linked list.

10 -> 20 -> 30 -> 40 -> 50 -> 60 -> 70

Searching for:

60

requires:

10
20
30
40
50
60

Six comparisons.

Complexity:

O(n)

The list preserves ordering but provides no efficient way to skip ahead.

We need a structure that maintains linked-list simplicity while improving search performance.

Solution

Add shortcut layers.

Level 0 contains every value.

10 -> 20 -> 30 -> 40 -> 50 -> 60 -> 70

Level 1 contains selected values.

10 ------> 30 ------> 50 ------> 70

Level 2 contains even fewer values.

10 -----------------> 50

Searching starts at the highest level.

Large portions of the list are skipped immediately.

The structure resembles express lanes on a highway.

Visualizing a Skip List

Example:

Level 2:
10 ----------------------> 50

Level 1:
10 ------> 30 ------> 50 ------> 70

Level 0:
10 -> 20 -> 30 -> 40 -> 50 -> 60 -> 70

Search for:

60

Path:

10
50
60

Rather than visiting every node, the search jumps across large sections.

This produces logarithmic expected behavior.

Node Structure

A skip-list node stores multiple forward pointers.

type Node struct {
	Key     int
	Forward []*Node
}

Example:

Node 50

Level 2 -> nil
Level 1 -> 70
Level 0 -> 60

Each node participates in one or more levels.

The number of levels varies per node.

Levels and Probability

When a new node is inserted, its height is chosen randomly.

Example probabilities:

Level 1: 100%
Level 2: 50%
Level 3: 25%
Level 4: 12.5%

Each additional level is obtained by flipping a coin.

Heads -> add another level
Tails -> stop

This creates a pyramid-like structure naturally.

Most nodes appear only at the bottom.

A few appear at higher levels.

Very few reach the top.

Why Random Heights Work

Suppose:

1,000,000 nodes

Expected counts:

Level 1: 1,000,000
Level 2:   500,000
Level 3:   250,000
Level 4:   125,000

and so on.

Each level contains roughly half as many nodes as the previous one.

This mirrors the search-space reduction achieved by balanced trees.

The structure emerges automatically from randomness rather than explicit balancing.

Searching

Search begins at the highest level.

Consider:

Level 2:
10 -----------------> 50

Level 1:
10 ------> 30 ------> 50 ------> 70

Level 0:
10 -> 20 -> 30 -> 40 -> 50 -> 60 -> 70

Search for:

60

Start at:

10

Move right until moving further would exceed the target.

50

Drop one level.

50

Move right.

60

Found.

The search repeatedly moves:

right
down
right
down

until the target is located.

Search Algorithm

Simplified implementation:

func Search(
	head *Node,
	key int,
	maxLevel int,
) bool {

	current := head

	for level := maxLevel - 1;
		level >= 0;
		level-- {

		for current.Forward[level] != nil &&
			current.Forward[level].Key < key {

			current =
				current.Forward[level]
		}
	}

	current = current.Forward[0]

	return current != nil &&
		current.Key == key
}

The algorithm resembles traversing a grid.

Horizontal movement skips values.

Vertical movement refines the search.

Insertion

Insertion has two phases.

Phase 1

Locate insertion positions at every level.

Example:

10 -> 20 -> 40 -> 50

Insert:

30

Search determines where the new node belongs.

Phase 2

Generate a random level.

Example:

Level 2

Insert the node into:

Level 0
Level 1

Pointer updates connect the new node into each level.

No rotations are required.

No balance factors are computed.

No colors are tracked.

Example Insertion

Initial:

Level 1:
10 ------> 50

Level 0:
10 -> 20 -> 50

Insert:

30

Randomly assigned:

Level 2

Result:

Level 1:
10 -> 30 -> 50

Level 0:
10 -> 20 -> 30 -> 50

The structure remains ordered automatically.

Deletion

Deletion is similarly straightforward.

Locate the node.

Update pointers at every level where the node appears.

Example:

Level 1:
10 -> 30 -> 50

Level 0:
10 -> 20 -> 30 -> 50

Delete:

30

Result:

Level 1:
10 ------> 50

Level 0:
10 -> 20 -> 50

No rebalancing is required.

Randomization handles long-term structure.

A useful intuition is that each higher level acts as a progressively coarser index.

Level 3
Level 2
Level 1
Level 0

This resembles repeatedly halving the search space.

A balanced tree does this vertically.

A skip list does it horizontally.

Both achieve:

O(log n)

expected search performance.

Concurrent Systems

One reason skip lists are popular is concurrency.

Rotations in balanced trees can be difficult to coordinate across multiple threads.

Skip lists require mostly local pointer updates.

This makes lock-free and highly concurrent implementations easier.

Several high-performance systems exploit this property.

Examples include:

  • Concurrent key-value stores
  • In-memory databases
  • Scheduling systems
  • Distributed coordination services

Skip Lists in Databases

Several storage engines use skip lists as in-memory indexes.

Reasons include:

  • Simple implementation
  • Good cache behavior
  • Easy concurrency support
  • Expected logarithmic performance

A skip list often serves as the mutable in-memory layer while persistent data resides in B-trees or sorted files.

This combination appears in modern LSM-tree storage engines.

Skip Lists Versus AVL Trees

Feature Skip List AVL Tree
Balance Randomized Deterministic
Rotations None Required
Search Expected O(log n) O(log n)
Insert Expected O(log n) O(log n)
Delete Expected O(log n) O(log n)
Implementation Simple Moderate

AVL trees provide stronger guarantees.

Skip lists provide simpler code.

Skip Lists Versus Red-Black Trees

Feature Skip List Red-Black Tree
Balancing Randomized Color rules
Height maintenance Automatic Explicit
Worst-case guarantee No Yes
Concurrency Excellent More difficult
Code complexity Lower Higher

Many engineers choose skip lists specifically because of implementation simplicity.

Skip Lists Versus B-Trees

Feature Skip List B-Tree
Storage optimization Moderate Excellent
Memory usage Higher Lower
Disk access efficiency Moderate Excellent
Range scans Good Excellent
In-memory workloads Excellent Good

B-trees dominate storage indexes.

Skip lists excel as in-memory structures.

Expected Complexity

With probability:

p = 0.5

expected height becomes:

O(log n)

Expected operations:

Operation Complexity
Search O(log n)
Insert O(log n)
Delete O(log n)

Space complexity:

O(n)

with a modest constant-factor overhead for additional pointers.

Common Mistakes

A common misconception is that skip lists are balanced trees. They are not trees at all. They are layered linked lists.

Another mistake is expecting strict worst-case guarantees. Skip lists provide expected performance through probability rather than deterministic balancing.

A third mistake is using poor random-number generation. The level distribution directly affects performance.

Finally, some implementations choose a maximum level that is too small. The upper levels must be large enough to support the expected collection size.

Takeaway

Skip lists achieve expected logarithmic performance through probabilistic layering rather than explicit balancing. By augmenting a sorted linked list with progressively sparser shortcut levels, they create a structure that behaves similarly to a balanced search tree while remaining conceptually simple. Their ease of implementation, strong practical performance, and suitability for concurrent environments have made them a popular alternative to traditional balanced trees in modern systems software.