9.22 Graph Representation Tradeoffs

You need to choose the right in-memory representation for a graph.

9.22 Graph Representation Tradeoffs

Problem

You need to choose the right in-memory representation for a graph.

Most graph algorithms assume some representation, but the best choice depends on the workload. A graph used for BFS has different needs from a graph used for all-pairs shortest paths. A graph stored on disk has different needs from a graph processed interactively. A sparse graph with billions of edges has different constraints from a dense graph with only a few hundred vertices.

Choosing the wrong representation can turn a clean algorithm into slow, memory-heavy code.

Solution

Match the representation to the operation you perform most often.

The common representations are:

Representation Best Use
Adjacency list Traversal and sparse graphs
Adjacency matrix Dense graphs and fast edge lookup
Edge list Storage, sorting, and edge-centric algorithms
Compressed sparse row Large static sparse graphs
Hash-based adjacency sets Fast neighbor lookup with sparse graphs

There is no universal best representation. Each one optimizes a different access pattern.

Discussion

A graph representation answers three questions:

How do I enumerate neighbors?
How do I test whether an edge exists?
How much memory does the graph require?

Most algorithms care deeply about at least one of these.

DFS and BFS need fast neighbor enumeration.

Floyd-Warshall needs fast matrix-style indexing.

Kruskal needs sorted edges.

Graph databases need storage and lookup.

Recommendation systems often need compressed sparse structures.

The right representation is the one that makes the dominant operation cheap.

Adjacency List

An adjacency list maps each vertex to its outgoing neighbors.

graph = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["D"],
    "D": [],
}

For weighted graphs:

graph = {
    "A": [("B", 5), ("C", 2)],
    "B": [("D", 7)],
    "C": [("D", 1)],
    "D": [],
}

Adjacency lists are the default for most sparse graph algorithms.

Operation Complexity
Enumerate neighbors O(degree(v))
Add edge O(1) amortized
Check edge existence O(degree(v))
Memory usage O(V + E)

Use adjacency lists for DFS, BFS, topological sorting, connected components, shortest paths on sparse graphs, and most practical graph traversal tasks.

Adjacency Matrix

An adjacency matrix stores every possible pair of vertices.

matrix = [
    [False, True,  True,  False],
    [False, False, False, True ],
    [False, False, False, True ],
    [False, False, False, False],
]

For weighted graphs, store weights instead of booleans.

inf = float("inf")

matrix = [
    [0,   5,   2, inf],
    [inf, 0, inf,  7],
    [inf, inf, 0,  1],
    [inf, inf, inf, 0],
]
Operation Complexity
Enumerate neighbors O(V)
Add edge O(1)
Check edge existence O(1)
Memory usage O(V²)

Use matrices for dense graphs, all-pairs algorithms, graph complements, and workloads dominated by edge-existence checks.

Avoid them for huge sparse graphs.

Edge List

An edge list stores edges directly.

edges = [
    ("A", "B"),
    ("A", "C"),
    ("B", "D"),
    ("C", "D"),
]

For weighted graphs:

edges = [
    ("A", "B", 5),
    ("A", "C", 2),
    ("B", "D", 7),
    ("C", "D", 1),
]
Operation Complexity
Enumerate all edges O(E)
Add edge O(1) amortized
Find neighbors O(E)
Check edge existence O(E)
Memory usage O(E)

Use edge lists for input files, serialization, Kruskal's algorithm, Bellman-Ford, batch processing, and graph construction pipelines.

Convert to an adjacency list when traversal becomes important.

Hash-Based Adjacency Sets

If edge-existence checks are frequent but the graph is sparse, store each neighbor collection as a set.

graph = {
    "A": {"B", "C"},
    "B": {"D"},
    "C": {"D"},
    "D": set(),
}

Then:

"B" in graph["A"]

runs in expected constant time.

Operation Complexity
Enumerate neighbors O(degree(v))
Add edge O(1) expected
Check edge existence O(1) expected
Memory usage O(V + E), with higher constants

This representation is useful when you need sparse-graph traversal and frequent edge lookup.

The tradeoff is memory overhead. Sets use more memory than lists.

Compressed Sparse Row

For very large static sparse graphs, adjacency lists stored as Python objects can be too expensive.

Compressed Sparse Row (CSR) stores neighbors in flat arrays.

For example:

A: B C
B: D
C: D
D:

becomes:

offsets = [0, 2, 3, 4, 4]
neighbors = [1, 2, 3, 3]

Vertex v has neighbors in:

neighbors[offsets[v]:offsets[v + 1]]

For vertex 0:

neighbors[offsets[0]:offsets[1]]

returns:

[1, 2]

CSR is compact, cache-friendly, and widely used in graph-processing systems, sparse linear algebra, and machine learning workloads.

It is best for static graphs. Inserting edges dynamically is expensive.

Representation Selection Table

Workload Recommended Representation
DFS or BFS on sparse graph Adjacency list
Fast edge lookup on sparse graph Adjacency sets
Dense graph Adjacency matrix
All-pairs shortest paths Adjacency matrix
Kruskal's algorithm Edge list
Bellman-Ford Edge list or adjacency list
Dijkstra on sparse graph Weighted adjacency list
Static huge graph CSR
Graph input file Edge list
Dynamic edge insertions Adjacency list or sets
Frequent edge deletions Sets or specialized structure
Memory-constrained traversal CSR or compact arrays

Use this table as a starting point, not as an absolute rule.

Directed vs Undirected Storage

Directed edges are stored once.

graph[u].append(v)

Undirected edges are usually stored twice.

graph[u].append(v)
graph[v].append(u)

This affects both memory and algorithms.

An undirected graph with E logical edges stores:

2E adjacency entries

in an adjacency list.

When reporting complexity, E usually refers to logical graph edges. Implementation details may store twice as many neighbor entries.

Weighted vs Unweighted Storage

Unweighted adjacency list:

graph[u].append(v)

Weighted adjacency list:

graph[u].append((v, weight))

The representation should reflect what the algorithm needs.

Do not store weights when the problem is purely unweighted. Extra data increases memory use and cognitive load.

Do not discard weights if the problem asks for cost, distance, time, capacity, or risk. Doing so changes the problem.

Vertex Labels and Indexes

External data often uses strings:

"new_york"
"boston"
"chicago"

Algorithms often run faster with integer IDs:

0
1
2

A mapping layer solves this:

labels = ["new_york", "boston", "chicago"]

index = {
    label: i
    for i, label in enumerate(labels)
}

reverse = {
    i: label
    for label, i in index.items()
}

Internal graph:

graph = [
    [1, 2],
    [0],
    [0],
]

External output:

reverse[0]

This pattern improves memory locality and reduces overhead.

Conversion Recipes

Edge List to Adjacency List

def edge_list_to_adj(edges):
    graph = {}

    for u, v in edges:
        graph.setdefault(u, []).append(v)
        graph.setdefault(v, [])

    return graph

Adjacency List to Edge List

def adj_to_edge_list(graph):
    edges = []

    for u, neighbors in graph.items():
        for v in neighbors:
            edges.append((u, v))

    return edges

Adjacency List to Matrix

def adj_to_matrix(graph, index):
    n = len(index)
    matrix = [[False] * n for _ in range(n)]

    for u, neighbors in graph.items():
        for v in neighbors:
            matrix[index[u]][index[v]] = True

    return matrix

These conversions are often worthwhile when the algorithmic phase changes.

Load as an edge list. Traverse as an adjacency list. Analyze all pairs as a matrix.

Memory Considerations

The asymptotic notation hides important constants.

A Python adjacency list using dictionaries and lists is easy to write, but it has substantial object overhead.

For small and medium graphs, this is fine.

For very large graphs, compact arrays matter.

A rough hierarchy from most convenient to most compact:

Representation Convenience Memory Efficiency
Dict of lists High Medium
Dict of sets High Low
List of lists Medium High
Edge list arrays Medium High
CSR arrays Low Very high
Bit matrix Medium High for dense graphs

Representation choice becomes a systems problem at scale.

Common Pitfalls

Do not default to an adjacency matrix for sparse graphs.

Do not use an edge list for repeated neighbor queries.

Do not use lists when edge-existence checks dominate.

Do not forget isolated vertices during conversion.

Do not lose weights when converting representations.

Do not accidentally double-count undirected edges.

Do not expose integer IDs to users when labels are expected.

Do not optimize representation before understanding the dominant workload.

Takeaway

Graph representation determines the cost of every graph operation. Adjacency lists are the default for sparse traversal, adjacency matrices are best for dense graphs and constant-time edge lookup, edge lists are ideal for storage and edge-oriented algorithms, adjacency sets improve lookup at higher memory cost, and CSR serves large static sparse graphs. Good graph engineering begins by matching the representation to the workload rather than forcing every algorithm into a single structure.