9.1 Graph Models

Problem

You need to turn a real problem into a graph before choosing an algorithm. The hard part is usually not writing DFS or BFS. The hard part is deciding what the vertices mean, what the edges mean, whether direction matters, whether weights matter, and what special cases the model must preserve.

A poor graph model can make a simple problem look difficult. A good model usually makes the correct algorithm almost obvious.

Solution

Start by naming three things:

What is a vertex?
What is an edge?
What question must the graph answer?

For example, suppose you are building a task planner. A task can depend on another task. Task B cannot start until task A finishes.

A natural graph model is:

Concept	Graph meaning
Task	Vertex
Dependency `A before B`	Directed edge `A -> B`
Valid execution order	Topological ordering
Circular dependency	Directed cycle

In code, you might represent this graph with an adjacency list:

tasks = ["parse", "typecheck", "compile", "test"]

graph = {
    "parse": ["typecheck"],
    "typecheck": ["compile"],
    "compile": ["test"],
    "test": [],
}

The edge "parse" -> "typecheck" means that typecheck depends on parse. Once the graph is modeled this way, the required algorithm is no longer mysterious. You need topological sorting, and you need cycle detection.

For a different problem, such as finding whether two users are connected in a social network, the model changes:

Concept	Graph meaning
User	Vertex
Friendship	Undirected edge
Connected users	Reachability
Friend groups	Connected components

graph = {
    "alice": ["bob", "diana"],
    "bob": ["alice", "carol"],
    "carol": ["bob"],
    "diana": ["alice"],
}

Here, the edge between alice and bob works in both directions. If the input gives a friendship pair (alice, bob), the graph should store both alice -> bob and bob -> alice.

Discussion

A graph is a modeling tool. It does not require the data to look like a network at first. Many problems become graph problems only after you choose the right interpretation.

A spreadsheet of flights becomes a directed weighted graph:

Data field	Graph meaning
Airport	Vertex
Flight route	Directed edge
Ticket price	Edge weight
Cheapest itinerary	Shortest path

A source code project becomes a directed graph:

Data field	Graph meaning
File or module	Vertex
Import relation	Directed edge
Build order	Topological ordering
Circular import	Cycle

A grid maze becomes a graph:

Data field	Graph meaning
Open cell	Vertex
Legal move between cells	Edge
Wall	Missing vertex or missing edge
Shortest route	BFS shortest path

The modeling decision controls the algorithm. If edges are unweighted, BFS may be enough for shortest paths. If edges have nonnegative weights, Dijkstra's algorithm is usually the right starting point. If edges can have negative weights, Bellman-Ford becomes relevant. If the graph describes dependencies, topological sort becomes central. If it describes mutual reachability, connected components or strongly connected components matter.

The most common modeling mistake is to put the wrong thing in the vertex set. For example, in a route-planning problem, vertices are usually locations, and edges are trips. But in some scheduling problems, the vertex may need to be a state such as (location, time) or (location, fuel_remaining). Adding state to vertices can convert an apparently impossible problem into an ordinary graph search.

Consider a robot moving on a grid with limited battery. If each vertex is only a cell, the model loses battery information. The search may incorrectly treat two arrivals at the same cell as equivalent, even though arriving with 10 battery units is different from arriving with 1 battery unit.

A better model is:

vertex = (row, column, battery)
edge   = one legal move that consumes battery

This larger graph is more expensive, but it preserves the information needed for correctness.

Directed and Undirected Graphs

Use a directed graph when the relationship has orientation.

A -> B

This can mean "A must happen before B", "A links to B", "A can reach B", or "A sends data to B".

Use an undirected graph when the relationship is symmetric.

A -- B

This can mean "A is connected to B", "A is friends with B", or "A can exchange directly with B".

In implementation, an undirected edge is usually stored as two directed adjacency entries:

def add_undirected_edge(graph, a, b):
    graph.setdefault(a, []).append(b)
    graph.setdefault(b, []).append(a)

For directed edges, store only the actual direction:

def add_directed_edge(graph, a, b):
    graph.setdefault(a, []).append(b)
    graph.setdefault(b, [])

The second line ensures that b still appears in the graph even if it has no outgoing edges.

Weighted and Unweighted Graphs

Use an unweighted graph when every edge has the same cost.

graph = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["D"],
    "D": [],
}

Use a weighted graph when edges carry costs, distances, capacities, probabilities, or penalties.

graph = {
    "A": [("B", 5), ("C", 2)],
    "B": [("D", 4)],
    "C": [("D", 7)],
    "D": [],
}

The weight must represent the quantity your algorithm optimizes. If you are minimizing travel time, the weight should be time, not physical distance. If you are minimizing cost, the weight should be price. If multiple quantities matter, such as time and price, you may need a multi-objective model or a state-expanded graph.

Sparse and Dense Graphs

A sparse graph has relatively few edges compared with the number of possible edges. Road networks, dependency graphs, and social graphs are usually sparse.

A dense graph has many possible connections. Similarity graphs, complete graphs, and some optimization models can be dense.

This distinction affects representation. Sparse graphs usually fit adjacency lists well. Dense graphs may justify adjacency matrices.

Representation	Best for	Memory
Adjacency list	Sparse graphs, traversal	`O(V + E)`
Adjacency matrix	Dense graphs, constant-time edge lookup	`O(V²)`
Edge list	Sorting edges, Kruskal-style algorithms	`O(E)`

Use an adjacency list as the default. Switch only when the algorithm needs something else.

Recipe: Build a Graph from Edge Data

Many inputs arrive as a list of edges. Convert that list into an adjacency list before running traversal algorithms.

def build_directed_graph(vertices, edges):
    graph = {v: [] for v in vertices}

    for source, target in edges:
        graph[source].append(target)

    return graph

Example:

vertices = ["parse", "typecheck", "compile", "test"]
edges = [
    ("parse", "typecheck"),
    ("typecheck", "compile"),
    ("compile", "test"),
]

graph = build_directed_graph(vertices, edges)

Result:

{
    "parse": ["typecheck"],
    "typecheck": ["compile"],
    "compile": ["test"],
    "test": [],
}

For undirected graphs:

def build_undirected_graph(vertices, edges):
    graph = {v: [] for v in vertices}

    for a, b in edges:
        graph[a].append(b)
        graph[b].append(a)

    return graph

This version intentionally stores both directions.

Recipe: Build a Grid Graph Lazily

For grid problems, you often do not need to build the whole graph. Generate neighbors when the search asks for them.

def neighbors(grid, row, col):
    height = len(grid)
    width = len(grid[0])

    for dr, dc in [(-1, 0), (1, 0), (0, -1), (0, 1)]:
        nr = row + dr
        nc = col + dc

        if 0 <= nr < height and 0 <= nc < width:
            if grid[nr][nc] != "#":
                yield nr, nc

This models each open cell as a vertex and each legal move as an edge. The graph exists conceptually, but the program does not allocate an adjacency list for every cell.

This approach is usually cleaner for mazes, maps, board states, and search spaces generated by rules.

Correctness Notes

A graph model is correct when it preserves the relationships needed by the target question.

For reachability, every valid move in the original problem must correspond to an edge, and every edge must correspond to a valid move.

For shortest paths, every path cost in the graph must match the true cost in the original problem.

For dependency ordering, every required precedence relation must appear as a directed edge.

For component analysis, the graph must include all vertices, including isolated ones. Omitting isolated vertices is a common bug because the algorithm may return too few components.

Common Pitfalls

Do not treat a directed relationship as undirected. If A imports B, that does not imply B imports A.

Do not drop isolated vertices. A user with no friends, a task with no dependencies, or a city with no roads may still be part of the input and may still affect the answer.

Do not use physical distance as the edge weight when the problem asks for time, price, risk, or energy.

Do not collapse state too aggressively. If two visits to the same location differ in fuel, time, permissions, keys, or inventory, they may need to be different vertices.

Do not choose an adjacency matrix by default. For large sparse graphs, it wastes memory and often makes traversal slower in practice.

Takeaway

A graph model defines the algorithmic shape of the problem. Choose vertices to preserve the state that matters, choose edges to preserve the allowed relationships or transitions, and choose weights to match the quantity being optimized. Once the model is right, most graph algorithms become direct applications of traversal, ordering, connectivity, or shortest-path machinery.