9.11 Connected Components

You need to identify independent regions within a graph.

9.11 Connected Components

Problem

You need to identify independent regions within a graph.

Many graphs are not fully connected. A social network may contain separate communities. A transportation network may contain isolated islands. A computer network may contain disconnected segments. A dataset may contain several unrelated clusters.

Before solving higher-level problems, it is often necessary to determine which vertices belong together.

You need a way to partition the graph into maximal connected groups.

Solution

Compute the connected components.

A connected component is a maximal set of vertices such that every pair of vertices is connected by a path.

Consider the graph:

A -- B -- C

D -- E

F

The graph contains three connected components:

{A, B, C}
{D, E}
{F}

The standard approach repeatedly launches DFS or BFS from unvisited vertices.

def connected_components(graph):
    visited = set()
    components = []

    def dfs(vertex, component):
        visited.add(vertex)
        component.append(vertex)

        for neighbor in graph[vertex]:
            if neighbor not in visited:
                dfs(neighbor, component)

    for vertex in graph:
        if vertex in visited:
            continue

        component = []

        dfs(vertex, component)

        components.append(component)

    return components

Every DFS discovers exactly one connected component.

Discussion

Connected components are one of the most important structural concepts in graph theory.

Many graph algorithms implicitly assume that the graph is connected. When that assumption fails, the algorithm must be executed separately on each component.

For example:

  • Minimum spanning trees require connectivity.
  • Diameter computations are usually defined per component.
  • Many routing algorithms assume paths exist.
  • Network analysis often begins by identifying disconnected regions.

Connected-component analysis answers a simple but fundamental question:

Which vertices belong to the same network?

Understanding Connectivity

Consider:

A -- B -- C -- D

Starting DFS from A eventually visits:

A
B
C
D

All vertices belong to the same component.

Now consider:

A -- B -- C

D -- E

Starting DFS from A reaches:

A
B
C

but never:

D
E

The traversal has discovered only part of the graph.

Launching a second DFS from an unvisited vertex discovers the next component.

This process continues until every vertex has been assigned to exactly one component.

Why Components Are Maximal

A connected component is not merely a connected set.

It must be maximal.

Consider:

A -- B -- C

The set:

{A, B}

is connected.

However, it is not a connected component because:

C

can be added without breaking connectivity.

The actual component is:

{A, B, C}

A component cannot be expanded further.

This maximality property ensures that components partition the graph cleanly.

Every vertex belongs to exactly one connected component.

Recipe: Count Components

Sometimes only the number of components matters.

def count_components(graph):
    visited = set()

    def dfs(vertex):
        visited.add(vertex)

        for neighbor in graph[vertex]:
            if neighbor not in visited:
                dfs(neighbor)

    count = 0

    for vertex in graph:
        if vertex not in visited:
            dfs(vertex)
            count += 1

    return count

Example:

graph = {
    "A": ["B"],
    "B": ["A"],
    "C": ["D"],
    "D": ["C"],
    "E": [],
}

Result:

3

This pattern appears frequently in interview problems and graph analytics.

Recipe: Label Components

Sometimes component membership must be retained.

Assign an identifier to every vertex.

def component_labels(graph):
    visited = set()
    labels = {}

    def dfs(vertex, component_id):
        visited.add(vertex)

        labels[vertex] = component_id

        for neighbor in graph[vertex]:
            if neighbor not in visited:
                dfs(neighbor, component_id)

    component_id = 0

    for vertex in graph:
        if vertex not in visited:
            dfs(vertex, component_id)
            component_id += 1

    return labels

Example:

{
    "A": 0,
    "B": 0,
    "C": 1,
    "D": 1,
    "E": 2,
}

Vertices with the same label belong to the same component.

This representation is often convenient for downstream processing.

Isolated Vertices

An isolated vertex forms its own component.

Consider:

A -- B

C

The graph contains:

{A, B}
{C}

Even though C has no edges, it remains a valid connected component.

This observation produces a common implementation bug.

Suppose the graph is stored as:

graph = {
    "A": ["B"],
    "B": ["A"],
}

Vertex C has disappeared entirely.

The component count becomes incorrect.

Always ensure isolated vertices are represented explicitly.

graph = {
    "A": ["B"],
    "B": ["A"],
    "C": [],
}

Now the graph structure is complete.

BFS Version

Connected components can also be computed using BFS.

from collections import deque

def connected_components_bfs(graph):
    visited = set()
    result = []

    for start in graph:
        if start in visited:
            continue

        component = []

        queue = deque([start])
        visited.add(start)

        while queue:
            vertex = queue.popleft()

            component.append(vertex)

            for neighbor in graph[vertex]:
                if neighbor not in visited:
                    visited.add(neighbor)
                    queue.append(neighbor)

        result.append(component)

    return result

The complexity remains identical.

The choice between DFS and BFS is largely stylistic for this problem.

Union-Find Approach

When edges arrive incrementally, DFS may not be ideal.

Consider a stream:

(A, B)
(B, C)
(D, E)
(C, D)

Recomputing components after every edge is expensive.

A better approach is Union-Find.

Each edge merges two sets.

for u, v in edges:
    union(u, v)

After processing all edges, vertices in the same set belong to the same connected component.

This approach is discussed in detail in Chapter 11.

For static graphs, DFS and BFS remain simpler.

For dynamic connectivity, Union-Find is often the preferred solution.

Giant Components

Large real-world networks often exhibit a giant component.

Consider a social network:

Millions of users

Most users belong to one enormous connected region.

Smaller components consist of:

  • Isolated users
  • Small communities
  • Test accounts
  • Disconnected data

A common graph-analysis task is identifying the largest component.

components = connected_components(graph)

largest = max(
    components,
    key=len
)

This simple operation frequently appears in network science and graph mining.

Component Size Distribution

Beyond counting components, size distribution often reveals useful information.

Example:

Component 1: 5000 vertices
Component 2: 20 vertices
Component 3: 7 vertices
Component 4: 1 vertex

This distribution suggests:

  • One dominant network
  • Several small isolated groups

Computing sizes is straightforward.

sizes = [
    len(component)
    for component in components
]

The resulting statistics can guide further analysis.

Connectivity Verification

A graph is connected if it contains exactly one component.

def is_connected(graph):
    return (
        count_components(graph)
        == 1
    )

Example:

A -- B -- C

Result:

True

Example:

A -- B

C

Result:

False

This test is often performed before algorithms that assume connectivity.

Complexity Analysis

Let:

  • V = number of vertices
  • E = number of edges

Each vertex is visited exactly once.

Each edge is examined exactly twice in an undirected graph.

Total complexity:

Operation Complexity
Component discovery O(V + E)
Component counting O(V + E)
Component labeling O(V + E)
Largest component O(V + E)
Connectivity test O(V + E)

The algorithm is linear in graph size.

This efficiency makes connected-component analysis a common preprocessing step.

Common Pitfalls

Do not stop after one DFS.

A single DFS discovers only one component.

Do not omit isolated vertices from the graph representation.

Do not assume a graph is connected because it has many edges.

Multiple dense regions may still be disconnected.

Do not confuse weak connectivity and connected components.

Connected components apply to undirected graphs.

Directed graphs require different notions of connectivity.

Do not revisit already explored vertices.

The visited set is essential for both correctness and performance.

Recipe: Largest Connected Component

A frequent practical task is extracting the largest component.

def largest_component(graph):
    components = connected_components(graph)

    return max(
        components,
        key=len
    )

Example:

[
    ["A", "B", "C"],
    ["D", "E"],
    ["F"]
]

Result:

["A", "B", "C"]

Many graph-analysis pipelines restrict later processing to the largest component.

This avoids distortion caused by tiny disconnected regions.

Takeaway

Connected components partition an undirected graph into maximal connected regions. They reveal the large-scale structure of a network, identify isolated groups, validate connectivity assumptions, and provide a foundation for many higher-level graph algorithms. The algorithm itself is remarkably simple: repeatedly launch DFS or BFS from unvisited vertices. Despite that simplicity, connected-component analysis is one of the most useful structural tools in graph theory and graph engineering.