9.2 Adjacency Lists
You need a graph representation that supports efficient traversal, scales to large datasets, and works naturally with algorithms such as depth-first search, breadth-first search, topological sorting,...
9.2 Adjacency Lists
Problem
You need a graph representation that supports efficient traversal, scales to large datasets, and works naturally with algorithms such as depth-first search, breadth-first search, topological sorting, shortest paths, and connected component analysis.
Most real-world graphs are sparse. A road network contains far fewer roads than the total number of possible city-to-city connections. A social network user knows only a tiny fraction of all users. A source file imports only a handful of modules compared to the entire codebase.
In these situations, storing every possible connection wastes memory and often reduces performance.
Solution
Represent each vertex by a collection of its neighboring vertices.
This representation is called an adjacency list.
For a directed graph:
A → B
A → C
B → D
C → D
Store the graph as:
graph = {
"A": ["B", "C"],
"B": ["D"],
"C": ["D"],
"D": []
}
Each key identifies a vertex. The associated list contains all vertices reachable through outgoing edges.
To iterate over the neighbors of a vertex:
for neighbor in graph["A"]:
print(neighbor)
Output:
B
C
This operation runs in time proportional to the number of neighbors rather than the total number of vertices in the graph.
Discussion
The adjacency list is the default representation for most graph algorithms because it stores exactly the information that traversal algorithms need.
Consider a graph with one million vertices and five million edges.
An adjacency matrix would require space for every possible pair of vertices:
1,000,000 × 1,000,000
This requires approximately one trillion entries.
An adjacency list stores only existing edges:
1,000,000 vertices
5,000,000 edges
The difference is enormous.
For sparse graphs, adjacency lists provide both lower memory usage and faster traversal.
Directed Graphs
In a directed graph, store only outgoing edges.
Given:
A → B
A → C
B → D
The representation becomes:
graph = {
"A": ["B", "C"],
"B": ["D"],
"C": [],
"D": []
}
Notice that D appears even though it has no outgoing edges.
This is an important implementation detail. Every vertex should appear in the structure regardless of its degree.
A common bug is accidentally omitting sink vertices:
graph = {
"A": ["B"],
"B": ["C"]
}
Now vertex C exists logically but not physically in the graph representation. Traversal code may fail or produce incomplete results.
A safer construction pattern is:
def add_edge(graph, source, target):
graph.setdefault(source, []).append(target)
graph.setdefault(target, [])
This guarantees that both endpoints exist.
Undirected Graphs
For undirected graphs, store each edge twice.
Given:
A — B
A — C
B — D
Store:
graph = {
"A": ["B", "C"],
"B": ["A", "D"],
"C": ["A"],
"D": ["B"]
}
Each endpoint records the other.
A helper function simplifies insertion:
def add_edge(graph, u, v):
graph.setdefault(u, []).append(v)
graph.setdefault(v, []).append(u)
The duplicated storage increases memory usage slightly but dramatically simplifies traversal algorithms.
Building an Adjacency List from Edge Data
Many datasets arrive as edge pairs.
Suppose the input contains:
edges = [
("A", "B"),
("A", "C"),
("B", "D"),
("C", "D")
]
Convert them into an adjacency list:
graph = {}
for source, target in edges:
graph.setdefault(source, []).append(target)
graph.setdefault(target, [])
Result:
{
"A": ["B", "C"],
"B": ["D"],
"C": ["D"],
"D": []
}
This construction runs in:
O(E)
where E is the number of edges.
No additional passes are required.
Recipe: Graph Traversal with Adjacency Lists
A major advantage of adjacency lists is that traversal becomes straightforward.
Depth-first search:
def dfs(graph, vertex, visited):
if vertex in visited:
return
visited.add(vertex)
for neighbor in graph[vertex]:
dfs(graph, neighbor, visited)
Breadth-first search:
from collections import deque
def bfs(graph, start):
visited = {start}
queue = deque([start])
while queue:
vertex = queue.popleft()
for neighbor in graph[vertex]:
if neighbor not in visited:
visited.add(neighbor)
queue.append(neighbor)
Both algorithms spend time only on vertices and edges that actually exist.
This property leads directly to their familiar complexity:
O(V + E)
where:
Vis the number of verticesEis the number of edges
Weighted Adjacency Lists
Many algorithms require edge weights.
Instead of storing only the destination, store destination-weight pairs.
Suppose:
A → B (5)
A → C (2)
B → D (7)
Represent it as:
graph = {
"A": [("B", 5), ("C", 2)],
"B": [("D", 7)],
"C": [],
"D": []
}
Traversal becomes:
for neighbor, weight in graph["A"]:
print(neighbor, weight)
Output:
B 5
C 2
Algorithms such as Dijkstra, Prim, Bellman-Ford, and A* commonly use this representation.
Complexity Characteristics
Let:
V= number of verticesE= number of edges
An adjacency list provides the following performance characteristics:
| Operation | Complexity |
|---|---|
| Enumerate neighbors | O(degree(v)) |
| Traverse graph | O(V + E) |
| Memory usage | O(V + E) |
| Add edge | O(1) amortized |
| Check edge existence | O(degree(v)) |
The final row deserves attention.
Testing whether an edge exists:
A → B ?
requires scanning the neighbor list of A.
This operation is not constant time.
If edge existence queries dominate the workload, another representation may be preferable.
Common Pitfalls
Do not omit vertices with zero outgoing edges.
Do not forget to insert both directions for undirected graphs.
Do not use lists when repeated edge-existence lookups dominate execution time.
Do not accidentally duplicate edges unless the graph model allows parallel edges.
Do not assume neighbor ordering has meaning. Unless explicitly sorted, insertion order rarely corresponds to any graph property.
Do not modify adjacency lists while traversing them. This often introduces skipped vertices or inconsistent behavior.
Recipe: Adjacency Lists for Large Graphs
When graphs become large, numeric identifiers are usually preferable to strings.
Instead of:
graph = {
"new_york": ["boston", "chicago"]
}
Use:
graph = [
[1, 2],
[0, 3],
[0],
[1]
]
Here the vertex ID is the array index.
Benefits include:
- Lower memory consumption
- Better cache locality
- Faster traversal
- Simpler serialization
Competitive programming systems, graph databases, search engines, and routing engines commonly use integer vertex identifiers internally.
Takeaway
Adjacency lists are the standard representation for sparse graphs because they store only existing edges and support efficient traversal. Most fundamental graph algorithms, including DFS, BFS, topological sorting, shortest paths, connected components, and spanning tree algorithms, are naturally expressed using adjacency lists. Understanding this representation thoroughly is essential because nearly every graph algorithm in the remainder of the book assumes it as the underlying data structure.