Chapter 51. Network Reliability

Network reliability studies whether a network continues to function when some of its parts fail.

In graph theory, a network is modeled as a graph. Vertices may represent routers, stations, cities, servers, processors, or people. Edges may represent links, roads, cables, communication channels, dependencies, or relationships. Reliability asks whether the remaining graph still has the required connectivity after some vertices or edges are removed.

This chapter connects the deterministic language of connectivity with probabilistic models of failure.

51.1 Deterministic and Probabilistic Reliability

The previous chapters treated connectivity deterministically.

For example, if

\lambda(G)=3,

then at least three edges must be removed before the graph can become disconnected.

This gives a worst-case guarantee. It assumes that failures may occur in the most damaging possible locations.

Network reliability asks a probabilistic question. Instead of assuming an adversarially chosen set of failures, we assign probabilities to failures and ask how likely the network is to remain operational. In standard graph reliability models, vertices or edges have probabilities of being operational, and the graph is considered functional if a specified connectivity condition remains true.

Thus there are two complementary viewpoints:

Viewpoint	Question
Connectivity	How many failures can the graph always tolerate?
Reliability	What is the probability that the graph survives random failures?

Both are useful. Connectivity gives structural guarantees. Reliability gives quantitative risk.

51.2 Edge Failure Model

The simplest reliability model assumes that vertices never fail and edges fail independently.

Let

G=(V,E)

be a connected graph. Each edge is operational with probability $p$ and failed with probability

1-p.

The operational subgraph is obtained by keeping each edge independently with probability $p$.

The all-terminal reliability of $G$ is the probability that the operational subgraph remains connected.

It is often written as

R_G(p).

Thus

R_G(p)=\Pr(G_p \text{ is connected}),

where $G_p$ denotes the random subgraph produced by retaining each edge with probability $p$.

This definition turns connectivity into a probability.

51.3 State Enumeration

A state of the network is a choice of which edges are operational.

If $G$ has $m$ edges, then there are

2^m

possible edge states.

For a state $A\subseteq E$, the edges in $A$ are operational, and the edges in $E\setminus A$ have failed.

If all edges work independently with the same probability $p$, then the probability of state $A$ is

p^{|A|}(1-p)^{m-|A|}.

The network is connected in state $A$ precisely when the spanning subgraph

(V,A)

is connected.

Therefore

R_G(p)= \sum_{\substack{A\subseteq E\\(V,A)\text{ connected}}} p^{|A|}(1-p)^{m-|A|}.

This formula is exact, but it is usually not computationally efficient. The number of states doubles with every additional edge.
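The state sum can be evaluated directly for small graphs. The following is a minimal sketch, assuming vertices are labeled $0,\ldots,n-1$ with $n\ge 1$ and edges are given as pairs; the function names `is_connected` and `all_terminal_reliability` are illustrative choices, not standard library calls.

```python
from itertools import combinations


def is_connected(n, edges):
    """Return True if the graph on vertices 0..n-1 with these edges is connected."""
    adj = {v: [] for v in range(n)}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    seen = {0}
    stack = [0]
    while stack:  # depth-first search from vertex 0
        u = stack.pop()
        for w in adj[u]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == n


def all_terminal_reliability(n, edges, p):
    """Sum p^|A| (1-p)^(m-|A|) over all edge subsets A with (V, A) connected."""
    m = len(edges)
    total = 0.0
    for size in range(m + 1):
        for subset in combinations(edges, size):  # all 2^m states, grouped by size
            if is_connected(n, subset):
                total += p ** size * (1 - p) ** (m - size)
    return total


triangle = [(0, 1), (1, 2), (0, 2)]
print(all_terminal_reliability(3, triangle, 0.9))  # approximately 0.972
```

For the triangle, the connected states are the full graph and the three two-edge paths, so the sum is $p^3+3p^2(1-p)$, which is $0.972$ at $p=0.9$. The loop visits all $2^m$ states, so this approach is practical only for graphs with a few dozen edges at most.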

51.4 Reliability Polynomial

The expression $R_G(p)$ is called the reliability polynomial of the graph.

It records the probability that the graph remains connected when each edge works independently with the same probability $p$. Because every term of the state sum is a polynomial in $p$, so is $R_G(p)$. The reliability polynomial is widely used as a graph invariant in network reliability.

For a graph with $n$ vertices and $m$ edges, a connected operational subgraph must contain at least

n-1

edges. Hence the reliability polynomial has the form

R_G(p)= \sum_{i=n-1}^{m} c_i p^i(1-p)^{m-i},

where $c_i$ is the number of connected spanning subgraphs with exactly $i$ edges.

The coefficients $c_i$ have a direct meaning:

Coefficient	Meaning
$c_{n-1}$	Number of spanning trees
$c_i$	Number of connected spanning subgraphs with $i$ edges
$c_m$	Equal to $1$ (the full graph itself)

This gives a bridge between reliability, spanning trees, and graph enumeration.
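The coefficients $c_i$ can be counted by brute force on small graphs. The sketch below assumes vertices labeled $0,\ldots,n-1$; the helper names `connected` and `reliability_coefficients` are hypothetical, and the complete graph $K_4$ is used only as an illustration.

```python
from itertools import combinations


def connected(n, edges):
    """Union-find test: is the graph on vertices 0..n-1 connected?"""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    components = n
    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            components -= 1
    return components == 1


def reliability_coefficients(n, edges):
    """Map i -> number of connected spanning subgraphs with exactly i edges."""
    m = len(edges)
    return {
        i: sum(1 for subset in combinations(edges, i) if connected(n, subset))
        for i in range(n - 1, m + 1)
    }


k4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
print(reliability_coefficients(4, k4))  # {3: 16, 4: 15, 5: 6, 6: 1}
```

The value $c_3=16$ agrees with Cayley's formula $n^{n-2}$ for the number of spanning trees of $K_n$. The other counts follow because no subgraph of $K_4$ with four or more edges can be disconnected.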

51.5 Example: A Path

Let $P_n$ be a path with $n$ vertices.

It has

n-1

edges. If any one edge fails, the path becomes disconnected. Therefore all edges must be operational.

Thus

R_{P_n}(p)=p^{n-1}.

A path is minimally connected. It has no redundant edges. Its reliability decreases quickly as the number of edges grows.

For example, if $n=5$, then

R_{P_5}(p)=p^4.

At $p=0.9$, this gives

0.9^4=0.6561.

Even though each edge is reliable individually, the whole path is less reliable because every edge is essential.

51.6 Example: A Cycle

Let $C_n$ be a cycle with $n$ vertices.

The cycle remains connected if at least

n-1

of its $n$ edges are operational. It becomes disconnected exactly when two or more edges fail.

Therefore

R_{C_n}(p)=p^n+n p^{n-1}(1-p).

The first term corresponds to the state in which all edges work. The second term corresponds to the $n$ states in which exactly one edge fails.

This may be simplified:

R_{C_n}(p)=p^{n-1}\bigl(n-(n-1)p\bigr).

Cycles are more reliable than paths with the same number of vertices because they contain one redundant route.
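As a quick numerical check of the simplified form, take $n=5$ and $p=0.9$, the same values used for the path example:

R_{C_5}(0.9)=0.9^{4}\bigl(5-4(0.9)\bigr)=0.6561\times 1.4=0.91854.

The path $P_5$ has reliability $0.6561$ at the same $p$, so the single extra edge raises the reliability by about $0.26$.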

51.7 Two-Terminal Reliability

All-terminal reliability requires the whole graph to remain connected.

Sometimes only two specified vertices matter.

Let $s,t\in V$. The two-terminal reliability between $s$ and $t$ is the probability that $s$ and $t$ remain connected in the operational subgraph.

It is written as

R_{s,t}(p).

Thus

R_{s,t}(p)=\Pr(s \text{ is connected to } t \text{ in } G_p).

This model is useful when the network must preserve communication between a source and a destination, but other vertices may be disconnected without causing failure.

There are also intermediate models, such as $K$-terminal reliability, where a specified subset $K\subseteq V$ must remain mutually connected.

51.8 Vertex Reliability

In the vertex failure model, edges remain available, but vertices fail independently.

Each vertex is operational with probability $p$. The operational subgraph is induced by the operational vertices.

One may then ask whether the remaining graph is connected, or whether specified terminals remain connected.

Vertex reliability is useful when vertices represent machines, routers, substations, people, processors, or facilities. Edge reliability is useful when edges represent physical or logical links.

Many real systems require both models. A communication network may fail because routers fail, links fail, or both.

51.9 Mixed Reliability

In mixed reliability, both vertices and edges may fail.

Each vertex $v$ has an operational probability $p_v$, and each edge $e$ has an operational probability $p_e$. The operational network contains only working vertices and working edges whose endpoints are also working.

The reliability question then depends on the chosen success condition.

Typical success conditions include:

Reliability type	Success condition
All-terminal	All operational or required vertices remain connected
Two-terminal	A specified pair $s,t$ remains connected
$K$-terminal	All vertices in a terminal set $K$ remain connected
$k$-connected reliability	The operational graph remains $k$-connected
Flow reliability	Enough capacity remains between terminals

Mixed models are closer to engineering systems but harder to compute.

51.10 Connectivity as a Lower Bound

Connectivity gives a deterministic lower bound on failure tolerance.

If a graph is $k$-edge-connected, then deleting fewer than $k$ edges cannot disconnect it.

Therefore, in an independent edge failure model, the graph can fail only if at least $k$ edges fail.

This does not determine the exact reliability, because not every set of $k$ failed edges disconnects the graph. But it gives a useful first estimate.

Similarly, if a graph is $k$-vertex-connected, then at least $k$ vertices must fail before disconnection is possible.

Thus higher connectivity generally improves reliability, but reliability also depends on the number and arrangement of minimum cuts.
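The edge version of this observation can be written as an explicit bound. Under the edge failure model of Section 51.2, with $m$ edges each working independently with probability $p$, a $k$-edge-connected graph is certainly connected whenever fewer than $k$ edges fail. Summing the probabilities of those states gives

R_G(p)\ \ge\ \sum_{j=0}^{k-1}\binom{m}{j}p^{m-j}(1-p)^{j}.

For the cycle $C_n$, where $k=2$ and $m=n$, the right-hand side is $p^n+np^{n-1}(1-p)$, which is exactly the formula of Section 51.6, since every pair of failed edges disconnects the cycle. In general the inequality can be strict, because some sets of $k$ or more failed edges leave the graph connected.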

51.11 Minimum Cuts and Failure Probability

Suppose $G$ has edge connectivity

\lambda(G)=k.

Then the most likely way for the graph to fail, when $p$ is close to $1$, is often the failure of all edges in a minimum cut of size $k$.

If there are many minimum cuts, the graph may be less reliable than another graph with the same value of $\lambda(G)$ but fewer minimum cuts.
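This can be made precise with a leading-term estimate. Write $N_k$ for the number of minimum edge cuts of $G$ (the symbol $N_k$ is introduced here only for illustration). Every disconnecting set of exactly $k$ edges is a minimum cut, and every other disconnecting set has more than $k$ edges, so under the uniform edge failure model

1-R_G(p)=N_k(1-p)^k+O\bigl((1-p)^{k+1}\bigr)\quad\text{as } p\to 1.

For the cycle $C_n$, every pair of edges is a minimum cut, so $k=2$ and $N_2=\binom{n}{2}$. Expanding the formula of Section 51.6 in powers of $1-p$ gives the same leading term, $\binom{n}{2}(1-p)^2$.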

Thus reliability depends on more than the minimum cut size.

It depends on:

Factor	Effect
Edge connectivity	Minimum number of edge failures needed
Number of minimum cuts	Number of most vulnerable failure modes
Larger cut structure	Failure probability beyond leading terms
Edge probabilities	Nonuniform risk across the network
Network topology	Redundancy and bottlenecks

Two graphs with the same connectivity can have different reliabilities because their weak cuts may be arranged differently.

51.12 Reliability and Menger’s Theorem

Menger’s theorem gives a path interpretation of reliability.

If there are $k$ edge-disjoint paths between two vertices $s$ and $t$, then no set of fewer than $k$ edge failures can separate them.

Thus edge-disjoint paths provide route redundancy.

Similarly, internally vertex-disjoint paths provide protection against vertex failures.

In network design, this gives a concrete rule:

To improve reliability between two terminals, provide many independent routes between them.

However, independence in graph theory and independence in probability are different ideas. Edge-disjoint paths do not share edges, but their failures may still be statistically dependent in a real system. For example, two cables may be edge-disjoint in a logical graph but run through the same physical conduit.

Graph reliability models are useful only when the graph abstraction matches the failure mechanisms.
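The number of edge-disjoint routes between two terminals can be computed as a unit-capacity maximum flow. The following is a sketch using breadth-first augmenting paths, assuming an undirected graph given as a list of vertex pairs; the function name `edge_disjoint_paths` is hypothetical. It counts structural redundancy only and says nothing about statistical independence of failures.

```python
from collections import defaultdict, deque


def edge_disjoint_paths(edges, s, t):
    """Maximum number of edge-disjoint s-t paths in an undirected graph."""
    if s == t:
        raise ValueError("s and t must be distinct")
    # Each undirected edge contributes one unit of capacity in each direction.
    cap = defaultdict(lambda: defaultdict(int))
    for u, v in edges:
        cap[u][v] += 1
        cap[v][u] += 1

    flow = 0
    while True:
        # Breadth-first search for an augmenting path in the residual graph.
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return flow  # no augmenting path remains
        # Push one unit of flow back along the path found.
        v = t
        while parent[v] is not None:
            u = parent[v]
            cap[u][v] -= 1
            cap[v][u] += 1
            v = u
        flow += 1


cycle = [(0, 1), (1, 2), (2, 3), (3, 0)]
print(edge_disjoint_paths(cycle, 0, 2))  # 2
```

By Menger's theorem the returned value also equals the minimum number of edges whose removal separates $s$ from $t$.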

51.13 Series and Parallel Structures

Simple reliability calculations often reduce to series and parallel structures.

In a series connection, all components are required. If any component fails, the system fails.

For independent components with operational probabilities

p_1,p_2,\ldots,p_k,

the series reliability is

p_1p_2\cdots p_k.

A path is a series system.

In a parallel connection, at least one component must work. If components fail independently, the parallel reliability is

1-(1-p_1)(1-p_2)\cdots(1-p_k).

Parallel edges or independent routes create redundancy.

Many reliability computations decompose a graph into combinations of series and parallel parts. General graphs, however, rarely decompose completely in this way.
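The two formulas compose directly. As an illustrative sketch (the function names `series` and `parallel` are chosen here, not taken from any library), consider the two-terminal reliability between opposite vertices of a 4-cycle: each of the two routes is a series of two edges, and the two routes are in parallel.

```python
from math import prod


def series(probs):
    """All components must work: multiply the operational probabilities."""
    return prod(probs)


def parallel(probs):
    """At least one component must work: one minus the product of failure probabilities."""
    return 1 - prod(1 - q for q in probs)


p = 0.9
route = series([p, p])             # one two-edge route between the terminals
both = parallel([route, route])    # two edge-disjoint routes
print(route, both)                 # approximately 0.81 and 0.9639
```

This matches the closed form $1-(1-p^2)^2$. The composition is valid only because the two routes share no edges and the edges are assumed to fail independently.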

51.14 Complexity

Exact network reliability is computationally difficult.

The number of possible edge states is exponential in the number of edges. More formally, computing the probability that a random subgraph is connected is called the network reliability problem, and it is #P-hard. The related two-terminal problem, which asks for the probability that two specified vertices remain connected, is also #P-hard.

This hardness explains why practical methods often use:

Method	Purpose
Deletion-contraction	Exact recursive computation
Factoring by cut vertices or bridges	Reduce graph size
Monte Carlo simulation	Estimate reliability
Bounds	Give certified upper and lower estimates
Minimal cut enumeration	Approximate high-reliability behavior
Binary decision diagrams	Compactly represent state spaces

Exact formulas are feasible for small or structured graphs. Large networks usually require approximation or special structure.

51.15 Deletion-Contraction

Deletion-contraction is a basic recurrence for reliability polynomials.

Choose an edge $e$. There are two cases:

1. $e$ fails.
2. $e$ works.

If $e$ fails, the graph becomes

G-e.

If $e$ works, its endpoints may be contracted for connectivity purposes, giving

G/e.

When every edge has operational probability $p$ , the recurrence has the form

R_G(p) = (1-p)R_{G-e}(p)+pR_{G/e}(p),

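with suitable base cases: a graph with a single vertex has reliability $1$, and a graph with more than one vertex and no edges has reliability $0$. Here $e$ must not be a loop. Contraction may create parallel edges, which are kept, and loops, which are discarded because they never affect connectivity.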

This recurrence is exact. Its difficulty is that repeated deletion and contraction create many subproblems. Without memoization or structure, the computation grows exponentially.
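The recurrence translates into a short recursive function. This is a minimal sketch without memoization, assuming a multigraph given as a nonempty vertex collection and a list of edge pairs; the name `reliability` and the choice to always branch on the first edge are illustrative.

```python
def reliability(vertices, edges, p):
    """All-terminal reliability by deletion-contraction (exponential time)."""
    vertices = frozenset(vertices)
    # Loops never affect connectivity, so discard them.
    edges = [(u, v) for u, v in edges if u != v]
    if len(vertices) == 1:
        return 1.0  # a single vertex is connected
    if not edges:
        return 0.0  # several vertices and no edges: disconnected
    (u, v), rest = edges[0], edges[1:]
    # Case 1: the chosen edge fails, leaving G - e.
    deleted = reliability(vertices, rest, p)
    # Case 2: the chosen edge works; merge v into u, giving G / e.
    merged = [(u if a == v else a, u if b == v else b) for a, b in rest]
    contracted = reliability(vertices - {v}, merged, p)
    return (1 - p) * deleted + p * contracted


cycle = [(0, 1), (1, 2), (2, 3), (3, 0)]
print(reliability(range(4), cycle, 0.9))  # approximately 0.9477
```

On the 4-cycle the result agrees with $p^4+4p^3(1-p)$. Parallel edges created by contraction are kept as separate list entries, which is what makes the recurrence correct on multigraphs.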

51.16 Monte Carlo Estimation

Monte Carlo simulation estimates reliability by sampling many random network states.

The method is simple:

1. Generate a random operational subgraph.
2. Test whether the required connectivity condition holds.
3. Repeat many times.
4. Estimate reliability by the fraction of successful trials.

If $N$ trials are run and $X$ are successful, then the estimator is

\widehat{R}=\frac{X}{N}.

This estimator is unbiased, and when the trials are independent its standard error is $\sqrt{R(1-R)/N}$, where $R$ is the true reliability.

Monte Carlo methods are flexible. They can handle large graphs, nonuniform probabilities, vertex failures, edge failures, and mixed models. Their limitation is statistical error, especially when failure events are rare.

For highly reliable systems, naive simulation may need many samples to observe enough failures. Rare-event simulation and importance sampling are used to improve efficiency.
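The sampling procedure described above can be sketched as follows for all-terminal reliability under the uniform edge failure model. The function names, the fixed default seed, and the plug-in standard error are choices made for this illustration; this is naive sampling, not a rare-event method.

```python
import random


def is_connected(n, edges):
    """Union-find connectivity test on vertices 0..n-1."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    components = n
    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            components -= 1
    return components == 1


def estimate_reliability(n, edges, p, trials, seed=0):
    """Return (estimate, estimated standard error) from independent trials."""
    rng = random.Random(seed)
    successes = 0
    for _ in range(trials):
        # Keep each edge independently with probability p.
        kept = [e for e in edges if rng.random() < p]
        if is_connected(n, kept):
            successes += 1
    r_hat = successes / trials
    std_err = (r_hat * (1 - r_hat) / trials) ** 0.5
    return r_hat, std_err


cycle = [(0, 1), (1, 2), (2, 3), (3, 0)]
print(estimate_reliability(4, cycle, 0.9, trials=20000))
```

For the 4-cycle the exact value is $0.9477$, and with $20000$ trials the standard error is about $0.0016$. Nonuniform probabilities or vertex failures would need only a different sampling step; the connectivity test and the estimator stay the same.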

51.17 Design Principles

Graph theory suggests several design principles for reliable networks.

First, avoid bridges. A bridge is an edge whose failure alone disconnects the graph.

Second, avoid articulation points. An articulation point is a vertex whose failure alone disconnects the graph.

Third, increase edge-disjoint and vertex-disjoint paths between important terminals.

Fourth, avoid small nontrivial cuts between large regions.

Fifth, consider the number of minimum cuts, not only their size.

Sixth, match the graph model to real failure modes. Two logical links may fail together if they share physical infrastructure.

These principles are structural. They do not replace engineering constraints such as cost, geography, latency, bandwidth, power, and maintenance.
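The first two principles can be checked mechanically. The sketch below finds bridges and articulation points by brute force, removing each edge or vertex in turn and retesting connectivity. It assumes the input graph is connected, and the function names are illustrative; linear-time depth-first-search algorithms exist and would be preferable for large graphs.

```python
def is_connected(vertices, edges):
    """Depth-first connectivity test on an explicit vertex collection."""
    vertices = list(vertices)
    if not vertices:
        return True
    adj = {v: [] for v in vertices}
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)
    seen = {vertices[0]}
    stack = [vertices[0]]
    while stack:
        u = stack.pop()
        for w in adj[u]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == len(vertices)


def bridges(vertices, edges):
    """Edges whose removal alone disconnects the graph."""
    vertices = list(vertices)
    edges = list(edges)
    return [
        e for i, e in enumerate(edges)
        if not is_connected(vertices, edges[:i] + edges[i + 1:])
    ]


def articulation_points(vertices, edges):
    """Vertices whose removal alone disconnects the remaining graph."""
    vertices = list(vertices)
    result = []
    for v in vertices:
        rest = [x for x in vertices if x != v]
        kept = [(a, b) for a, b in edges if v not in (a, b)]
        if not is_connected(rest, kept):
            result.append(v)
    return result


path = [(0, 1), (1, 2), (2, 3)]
print(bridges(range(4), path))              # [(0, 1), (1, 2), (2, 3)]
print(articulation_points(range(4), path))  # [1, 2]
```

Running the same checks on the cycle $1$-$2$-$3$-$4$-$1$ returns two empty lists, which is the structural reason a cycle survives any single failure.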

51.18 Example: Comparing Two Designs

Consider two networks on four vertices.

The first is a path:

1-2-3-4.

Its reliability is

R(p)=p^3.

The second is a cycle:

1-2-3-4-1.

Its reliability is

R(p)=p^4+4p^3(1-p).

At $p=0.9$, the path has reliability

0.9^3=0.729.

The cycle has reliability

0.9^4+4(0.9)^3(0.1)=0.9477.

Adding one edge greatly improves reliability because the network can survive any single edge failure.

This example illustrates the value of redundancy. A small increase in edge count can produce a large increase in survival probability.
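To see how the gap between the two designs depends on $p$, the closed forms from Sections 51.5 and 51.6 can be evaluated at several values. This is a small illustrative sketch; the function names and the sample values of $p$ are chosen here.

```python
def path_reliability(n, p):
    """All-terminal reliability of the path P_n: every edge must work."""
    return p ** (n - 1)


def cycle_reliability(n, p):
    """All-terminal reliability of the cycle C_n: at most one edge may fail."""
    return p ** n + n * p ** (n - 1) * (1 - p)


for p in (0.5, 0.9, 0.99):
    print(f"p={p}: path={path_reliability(4, p):.4f}, cycle={cycle_reliability(4, p):.4f}")
```

At $p=0.5$ the values are $0.125$ for the path and $0.3125$ for the cycle, so the cycle is more reliable at every value of $p$ strictly between $0$ and $1$, not only when edges are already highly reliable.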

51.19 Limitations of Graph Reliability

Graph reliability models are abstractions.

They usually assume independent failures, fixed probabilities, and a clear success condition. Real systems may violate all three assumptions.

Failures may be correlated. A storm, power outage, software bug, or shared conduit may disable many components together.

Probabilities may change over time. Components age, loads vary, and maintenance changes risk.

Success may be graded rather than binary. A network may remain connected but deliver poor latency, insufficient capacity, or degraded quality.

For these reasons, graph reliability is best used as one layer of analysis. It gives a precise mathematical model of structural survival, but it must be interpreted with the system context.

51.20 Summary

Network reliability studies the probability that a graph remains functional after random failures.

Concept	Meaning
Edge reliability	Edges work or fail probabilistically
Vertex reliability	Vertices work or fail probabilistically
All-terminal reliability	The whole graph remains connected
Two-terminal reliability	Two specified vertices remain connected
Reliability polynomial	Probability of connectivity as a function of $p$
Minimum cut	Smallest failure set that disconnects the network
Monte Carlo estimation	Simulation-based reliability approximation

Connectivity gives worst-case guarantees. Reliability gives probability estimates. Together they describe how a graph behaves under failure, both structurally and statistically.