9.1 Graph Models
You need to turn a real problem into a graph before choosing an algorithm.
9.1 Graph Models
Problem
You need to turn a real problem into a graph before choosing an algorithm. The hard part is usually not writing DFS or BFS. The hard part is deciding what the vertices mean, what the edges mean, whether direction matters, whether weights matter, and what special cases the model must preserve.
A poor graph model can make a simple problem look difficult. A good model usually makes the correct algorithm almost obvious.
Solution
Start by naming three things:
- What is a vertex?
- What is an edge?
- What question must the graph answer?
For example, suppose you are building a task planner. A task can depend on another task. Task B cannot start until task A finishes.
A natural graph model is:
| Concept | Graph meaning |
|---|---|
| Task | Vertex |
Dependency A before B |
Directed edge A -> B |
| Valid execution order | Topological ordering |
| Circular dependency | Directed cycle |
In code, you might represent this graph with an adjacency list:
tasks = ["parse", "typecheck", "compile", "test"]
graph = {
"parse": ["typecheck"],
"typecheck": ["compile"],
"compile": ["test"],
"test": [],
}
The edge "parse" -> "typecheck" means that typecheck depends on parse. Once the graph is modeled this way, the required algorithm is no longer mysterious. You need topological sorting, and you need cycle detection.
For a different problem, such as finding whether two users are connected in a social network, the model changes:
| Concept | Graph meaning |
|---|---|
| User | Vertex |
| Friendship | Undirected edge |
| Connected users | Reachability |
| Friend groups | Connected components |
graph = {
"alice": ["bob", "diana"],
"bob": ["alice", "carol"],
"carol": ["bob"],
"diana": ["alice"],
}
Here, the edge between alice and bob works in both directions. If the input gives a friendship pair (alice, bob), the graph should store both alice -> bob and bob -> alice.
Discussion
A graph is a modeling tool. It does not require the data to look like a network at first. Many problems become graph problems only after you choose the right interpretation.
A spreadsheet of flights becomes a directed weighted graph:
| Data field | Graph meaning |
|---|---|
| Airport | Vertex |
| Flight route | Directed edge |
| Ticket price | Edge weight |
| Cheapest itinerary | Shortest path |
A source code project becomes a directed graph:
| Data field | Graph meaning |
|---|---|
| File or module | Vertex |
| Import relation | Directed edge |
| Build order | Topological ordering |
| Circular import | Cycle |
A grid maze becomes a graph:
| Data field | Graph meaning |
|---|---|
| Open cell | Vertex |
| Legal move between cells | Edge |
| Wall | Missing vertex or missing edge |
| Shortest route | BFS shortest path |
The modeling decision controls the algorithm. If edges are unweighted, BFS may be enough for shortest paths. If edges have nonnegative weights, Dijkstra's algorithm is usually the right starting point. If edges can have negative weights, Bellman-Ford becomes relevant. If the graph describes dependencies, topological sort becomes central. If it describes mutual reachability, connected components or strongly connected components matter.
The most common modeling mistake is to put the wrong thing in the vertex set. For example, in a route-planning problem, vertices are usually locations, and edges are trips. But in some scheduling problems, the vertex may need to be a state such as (location, time) or (location, fuel_remaining). Adding state to vertices can convert an apparently impossible problem into an ordinary graph search.
Consider a robot moving on a grid with limited battery. If each vertex is only a cell, the model loses battery information. The search may incorrectly treat two arrivals at the same cell as equivalent, even though arriving with 10 battery units is different from arriving with 1 battery unit.
A better model is:
vertex = (row, column, battery)
edge = one legal move that consumes battery
This larger graph is more expensive, but it preserves the information needed for correctness.
Directed and Undirected Graphs
Use a directed graph when the relationship has orientation.
A -> B
This can mean "A must happen before B", "A links to B", "A can reach B", or "A sends data to B".
Use an undirected graph when the relationship is symmetric.
A -- B
This can mean "A is connected to B", "A is friends with B", or "A can exchange directly with B".
In implementation, an undirected edge is usually stored as two directed adjacency entries:
def add_undirected_edge(graph, a, b):
graph.setdefault(a, []).append(b)
graph.setdefault(b, []).append(a)
For directed edges, store only the actual direction:
def add_directed_edge(graph, a, b):
graph.setdefault(a, []).append(b)
graph.setdefault(b, [])
The second line ensures that b still appears in the graph even if it has no outgoing edges.
Weighted and Unweighted Graphs
Use an unweighted graph when every edge has the same cost.
graph = {
"A": ["B", "C"],
"B": ["D"],
"C": ["D"],
"D": [],
}
Use a weighted graph when edges carry costs, distances, capacities, probabilities, or penalties.
graph = {
"A": [("B", 5), ("C", 2)],
"B": [("D", 4)],
"C": [("D", 7)],
"D": [],
}
The weight must represent the quantity your algorithm optimizes. If you are minimizing travel time, the weight should be time, not physical distance. If you are minimizing cost, the weight should be price. If multiple quantities matter, such as time and price, you may need a multi-objective model or a state-expanded graph.
Sparse and Dense Graphs
A sparse graph has relatively few edges compared with the number of possible edges. Road networks, dependency graphs, and social graphs are usually sparse.
A dense graph has many possible connections. Similarity graphs, complete graphs, and some optimization models can be dense.
This distinction affects representation. Sparse graphs usually fit adjacency lists well. Dense graphs may justify adjacency matrices.
| Representation | Best for | Memory |
|---|---|---|
| Adjacency list | Sparse graphs, traversal | O(V + E) |
| Adjacency matrix | Dense graphs, constant-time edge lookup | O(V²) |
| Edge list | Sorting edges, Kruskal-style algorithms | O(E) |
Use an adjacency list as the default. Switch only when the algorithm needs something else.
Recipe: Build a Graph from Edge Data
Many inputs arrive as a list of edges. Convert that list into an adjacency list before running traversal algorithms.
def build_directed_graph(vertices, edges):
graph = {v: [] for v in vertices}
for source, target in edges:
graph[source].append(target)
return graph
Example:
vertices = ["parse", "typecheck", "compile", "test"]
edges = [
("parse", "typecheck"),
("typecheck", "compile"),
("compile", "test"),
]
graph = build_directed_graph(vertices, edges)
Result:
{
"parse": ["typecheck"],
"typecheck": ["compile"],
"compile": ["test"],
"test": [],
}
For undirected graphs:
def build_undirected_graph(vertices, edges):
graph = {v: [] for v in vertices}
for a, b in edges:
graph[a].append(b)
graph[b].append(a)
return graph
This version intentionally stores both directions.
Recipe: Build a Grid Graph Lazily
For grid problems, you often do not need to build the whole graph. Generate neighbors when the search asks for them.
def neighbors(grid, row, col):
height = len(grid)
width = len(grid[0])
for dr, dc in [(-1, 0), (1, 0), (0, -1), (0, 1)]:
nr = row + dr
nc = col + dc
if 0 <= nr < height and 0 <= nc < width:
if grid[nr][nc] != "#":
yield nr, nc
This models each open cell as a vertex and each legal move as an edge. The graph exists conceptually, but the program does not allocate an adjacency list for every cell.
This approach is usually cleaner for mazes, maps, board states, and search spaces generated by rules.
Correctness Notes
A graph model is correct when it preserves the relationships needed by the target question.
For reachability, every valid move in the original problem must correspond to an edge, and every edge must correspond to a valid move.
For shortest paths, every path cost in the graph must match the true cost in the original problem.
For dependency ordering, every required precedence relation must appear as a directed edge.
For component analysis, the graph must include all vertices, including isolated ones. Omitting isolated vertices is a common bug because the algorithm may return too few components.
Common Pitfalls
Do not treat a directed relationship as undirected. If A imports B, that does not imply B imports A.
Do not drop isolated vertices. A user with no friends, a task with no dependencies, or a city with no roads may still be part of the input and may still affect the answer.
Do not use physical distance as the edge weight when the problem asks for time, price, risk, or energy.
Do not collapse state too aggressively. If two visits to the same location differ in fuel, time, permissions, keys, or inventory, they may need to be different vertices.
Do not choose an adjacency matrix by default. For large sparse graphs, it wastes memory and often makes traversal slower in practice.
Takeaway
A graph model defines the algorithmic shape of the problem. Choose vertices to preserve the state that matters, choose edges to preserve the allowed relationships or transitions, and choose weights to match the quantity being optimized. Once the model is right, most graph algorithms become direct applications of traversal, ordering, connectivity, or shortest-path machinery.