19.7 Skip Lists Revisited

Balanced search trees achieve logarithmic performance by carefully maintaining structural invariants.

19.7 Skip Lists Revisited

Balanced search trees achieve logarithmic performance by carefully maintaining structural invariants. AVL trees perform rotations to preserve height balance. Red-black trees maintain coloring rules. B-trees redistribute keys across nodes. These structures provide excellent asymptotic guarantees, but the implementation complexity can be substantial.

Skip lists take a different approach. Instead of maintaining strict balance through deterministic rules, they use randomness. A skip list is essentially a hierarchy of linked lists where randomly selected elements appear in higher levels. This simple idea produces expected logarithmic performance while avoiding rotations, recoloring, and many of the implementation challenges associated with balanced trees.

The result is one of the most elegant examples of randomized data structure design.

Problem

Maintain a collection of ordered values supporting:

  • Search
  • Insert
  • Delete
  • Range queries

Desired complexity:

Search  O(log n)
Insert  O(log n)
Delete  O(log n)

A sorted linked list provides efficient insertion and deletion:

Insert  O(1)
Delete  O(1)

after locating the position.

Unfortunately, search remains:

O(n)

because traversal proceeds one node at a time.

Can randomness improve search performance without requiring complex balancing logic?

Solution

Add shortcuts.

Suppose we start with a sorted linked list:

1 → 3 → 5 → 7 → 9 → 11 → 13

Searching for:

11

requires scanning multiple nodes.

Now add a second level:

1 --------→ 7 --------→ 13
|           |           |
1 → 3 → 5 → 7 → 9 → 11 → 13

The upper level allows larger jumps.

Searching for:

11

becomes much faster.

A skip list generalizes this idea by adding multiple levels.

Structure of a Skip List

Each node has:

class Node:
    def __init__(self, value, level):
        self.value = value
        self.forward = [None] * level

A node may appear in:

  • Level 1
  • Levels 1 and 2
  • Levels 1, 2, and 3
  • Higher levels

The height is chosen randomly.

Most nodes remain short.

A few nodes become tall.

Very few become extremely tall.

Example Layout

Consider:

1 3 5 7 9 11 13 15

A possible skip list:

Level 3

1 --------------------------→ 15

Level 2

1 --------→ 7 ------------→ 15

Level 1

1 → 3 → 5 → 7 → 9 → 11 → 13 → 15

Each higher level acts as an express lane.

Searches move horizontally when possible and vertically when necessary.

Searching

Suppose we search for:

11

Begin at the highest level.

1 --------------------------→ 15

Moving directly to:

15

would overshoot.

Drop down one level.

1 --------→ 7 ------------→ 15

Move to:

7

The next jump overshoots.

Drop down again.

7 → 9 → 11

Target found.

Instead of scanning every node, we traverse only a few levels.

Search Algorithm

def search(self, value):
    current = self.head

    for level in reversed(range(self.levels)):
        while (
            current.forward[level]
            and current.forward[level].value < value
        ):
            current = current.forward[level]

    current = current.forward[0]

    return (
        current is not None
        and current.value == value
    )

The algorithm repeatedly:

  1. Move right.
  2. Drop down.
  3. Continue.

This resembles navigating a highway system.

Random Height Assignment

The most important component is level selection.

A common strategy:

import random

def random_level():
    level = 1

    while random.random() < 0.5:
        level += 1

    return level

Each level promotion occurs with probability:

1/2

Expected distribution:

Height Fraction of Nodes
1 1/2
2 1/4
3 1/8
4 1/16
5 1/32

The number of nodes decreases exponentially with height.

This creates a natural hierarchy.

Why Random Heights Work

Suppose:

n = 1,000,000

Expected nodes at each level:

Level Expected Nodes
1 1,000,000
2 500,000
3 250,000
4 125,000
5 62,500
10 1,953
20 1

Only a handful of nodes reach the highest levels.

Those nodes provide long-distance shortcuts.

The resulting structure behaves similarly to a balanced search tree.

Insertion

Insertion consists of two phases.

Phase 1

Locate insertion positions.

Example:

Insert 10

Search identifies predecessor nodes at every level.

Phase 2

Generate a random height.

Example:

Height = 3

Create a new node.

Update pointers.

The node becomes part of:

Level 1
Level 2
Level 3

No rotations are required.

No balancing operations occur.

Randomness maintains structure quality automatically.

Deletion

Deletion mirrors insertion.

Locate predecessors.

Remove references at each level.

for level in range(node_height):
    update[level].forward[level] = (
        node.forward[level]
    )

The operation affects only local pointers.

Again, no rebalancing is required.

Expected Complexity

The key result:

Operation Expected Cost
Search O(log n)
Insert O(log n)
Delete O(log n)

These bounds are expected values.

Occasionally, random heights produce a poor structure.

Such events become increasingly unlikely as the data structure grows.

Suppose half the nodes appear at level 2.

A quarter appear at level 3.

An eighth appear at level 4.

Each level reduces the number of nodes that must be examined.

The search path therefore resembles binary search.

Instead of dividing an array in half repeatedly:

n
n/2
n/4
n/8
...

the skip list divides the search space through increasingly sparse levels.

The result is logarithmic behavior.

Memory Usage

A node stores:

Value
Forward pointers

Expected number of pointers per node:

2

when promotion probability equals:

1/2

Total expected memory:

O(n)

This is comparable to balanced trees.

Comparison with AVL Trees

Property AVL Tree Skip List
Search O(log n) Expected O(log n)
Insert O(log n) Expected O(log n)
Delete O(log n) Expected O(log n)
Rotations Required None
Implementation Complex Simple
Deterministic Yes No

Skip lists exchange deterministic guarantees for implementation simplicity.

Comparison with Red-Black Trees

Property Red-Black Tree Skip List
Balancing Explicit Probabilistic
Rotations Yes No
Coloring Rules Yes No
Search O(log n) Expected O(log n)
Code Complexity Moderate Low

This simplicity explains their popularity in some systems.

Real-World Applications

Skip lists appear in several production systems.

Databases

Ordered indexes.

In-Memory Stores

Fast ordered maps.

Distributed Systems

Membership structures.

Concurrent Data Structures

Lock-free skip lists.

Concurrent implementations are often easier than balanced tree equivalents because local pointer updates are simpler than tree rotations.

Probabilistic Analysis

Let:

p

be the promotion probability.

Expected nodes at level:

k

equal:

n × p^(k−1)

The tallest level therefore has height approximately:

log₁₍₁⁄p₎ n

For:

p = 1/2

height becomes:

log₂ n

which explains the logarithmic search complexity.

Common Mistakes

Choosing Heights Deterministically

Random heights are essential.

Without randomness, the structure can degenerate.

Using Excessive Heights

Very tall nodes increase memory without improving performance significantly.

Poor Random Generators

Biased height distributions can affect balance quality.

Ignoring Concurrent Updates

Pointer updates must be coordinated carefully in multithreaded implementations.

Discussion

Skip lists demonstrate a recurring theme in randomized algorithms: randomness can replace complex maintenance logic. Balanced trees achieve logarithmic performance through carefully enforced invariants. Skip lists achieve similar expected performance by allowing randomness to construct those invariants implicitly.

The elegance of the design lies in its simplicity. Search follows a straightforward pattern. Insertions and deletions update only local pointers. No rotations, recoloring operations, or restructuring passes are required. Yet the resulting data structure behaves much like a balanced tree.

This combination of simplicity and performance makes skip lists one of the classic examples of probabilistic algorithm design. The next section examines randomized hashing, where randomness is used not to build search structures but to reduce collisions and improve expected lookup performance in hash-based data structures.