Network reliability studies whether a network continues to function when some of its parts fail.
In graph theory, a network is modeled as a graph. Vertices may represent routers, stations, cities, servers, processors, or people. Edges may represent links, roads, cables, communication channels, dependencies, or relationships. Reliability asks whether the remaining graph still has the required connectivity after some vertices or edges are removed.
This chapter connects the deterministic language of connectivity with probabilistic models of failure.
51.1 Deterministic and Probabilistic Reliability
The previous chapters treated connectivity deterministically.
For example, if
then at least three edges must be removed before the graph can become disconnected.
This gives a worst-case guarantee. It assumes that failures may occur in the most damaging possible locations.
Network reliability asks a probabilistic question. Instead of assuming a chosen set of failures, we assign probabilities to failures and ask how likely the network is to remain operational. In standard graph reliability models, vertices or edges have probabilities of being operational, and the graph is considered functional if a specified connectivity condition remains true.
Thus there are two complementary viewpoints:
| Viewpoint | Question |
|---|---|
| Connectivity | How many failures can the graph always tolerate? |
| Reliability | What is the probability that the graph survives random failures? |
Both are useful. Connectivity gives structural guarantees. Reliability gives quantitative risk.
51.2 Edge Failure Model
The simplest reliability model assumes that vertices never fail and edges fail independently.
Let
be a connected graph. Each edge is operational with probability and failed with probability
The operational subgraph is obtained by keeping each edge independently with probability .
The all-terminal reliability of is the probability that the operational subgraph remains connected.
It is often written as
Thus
where denotes the random subgraph produced by retaining each edge with probability .
This definition turns connectivity into a probability.
51.3 State Enumeration
A state of the network is a choice of which edges are operational.
If has edges, then there are
possible edge states.
For a state , the edges in are operational, and the edges in have failed.
If all edges work independently with the same probability , then the probability of state is
The network is connected in state precisely when the spanning subgraph
is connected.
Therefore
This formula is exact, but it is usually not computationally efficient. The number of states doubles with every additional edge.
51.4 Reliability Polynomial
The expression is called the reliability polynomial of the graph.
It records the probability that the graph remains operational when edges fail independently with a fixed probability. The reliability polynomial is widely used as a graph invariant in network reliability.
For a graph with vertices and edges, a connected operational subgraph must contain at least
edges. Hence the reliability polynomial has the form
where is the number of connected spanning subgraphs with exactly edges.
The coefficients have a direct meaning:
| Coefficient | Meaning |
|---|---|
| Number of spanning trees | |
| Number of connected spanning subgraphs with edges | |
| Equal to , the full graph |
This gives a bridge between reliability, spanning trees, and graph enumeration.
51.5 Example: A Path
Let be a path with vertices.
It has
edges. If any one edge fails, the path becomes disconnected. Therefore all edges must be operational.
Thus
A path is minimally connected. It has no redundant edges. Its reliability decreases quickly as the number of edges grows.
For example, if , then
At , this gives
Even though each edge is reliable individually, the whole path is less reliable because every edge is essential.
51.6 Example: A Cycle
Let be a cycle with vertices.
The cycle remains connected if at least
of its edges are operational. It becomes disconnected only if at least two edges fail.
Therefore
The first term corresponds to the state in which all edges work. The second term corresponds to the states in which exactly one edge fails.
This may be simplified:
Cycles are more reliable than paths with the same number of vertices because they contain one redundant route.
51.7 Two-Terminal Reliability
All-terminal reliability requires the whole graph to remain connected.
Sometimes only two specified vertices matter.
Let . The two-terminal reliability between and is the probability that and remain connected in the operational subgraph.
It is written as
Thus
This model is useful when the network must preserve communication between a source and a destination, but other vertices may be disconnected without causing failure.
There are also intermediate models, such as -terminal reliability, where a specified subset must remain mutually connected.
51.8 Vertex Reliability
In the vertex failure model, edges remain available, but vertices fail independently.
Each vertex is operational with probability . The operational subgraph is induced by the operational vertices.
One may then ask whether the remaining graph is connected, or whether specified terminals remain connected.
Vertex reliability is useful when vertices represent machines, routers, substations, people, processors, or facilities. Edge reliability is useful when edges represent physical or logical links.
Many real systems require both models. A communication network may fail because routers fail, links fail, or both.
51.9 Mixed Reliability
In mixed reliability, both vertices and edges may fail.
Each vertex has an operational probability , and each edge has an operational probability . The operational network contains only working vertices and working edges whose endpoints are also working.
The reliability question then depends on the chosen success condition.
Typical success conditions include:
| Reliability type | Success condition |
|---|---|
| All-terminal | All operational or required vertices remain connected |
| Two-terminal | A specified pair remains connected |
| -terminal | All vertices in a terminal set remain connected |
| -connected reliability | The operational graph remains -connected |
| Flow reliability | Enough capacity remains between terminals |
Mixed models are closer to engineering systems but harder to compute.
51.10 Connectivity as a Lower Bound
Connectivity gives a deterministic lower bound on failure tolerance.
If a graph is -edge-connected, then deleting fewer than edges cannot disconnect it.
Therefore, in an independent edge failure model, the graph can fail only if at least edges fail.
This does not determine the exact reliability, because not every set of failed edges disconnects the graph. But it gives a useful first estimate.
Similarly, if a graph is -vertex-connected, then at least vertices must fail before disconnection is possible.
Thus higher connectivity generally improves reliability, but reliability also depends on the number and arrangement of minimum cuts.
51.11 Minimum Cuts and Failure Probability
Suppose has edge connectivity
Then the most likely way for the graph to fail, when is close to , is often the failure of all edges in a minimum cut of size .
If there are many minimum cuts, the graph may be less reliable than another graph with the same value of but fewer minimum cuts.
Thus reliability depends on more than the minimum cut size.
It depends on:
| Factor | Effect |
|---|---|
| Edge connectivity | Minimum number of edge failures needed |
| Number of minimum cuts | Number of most vulnerable failure modes |
| Larger cut structure | Failure probability beyond leading terms |
| Edge probabilities | Nonuniform risk across the network |
| Network topology | Redundancy and bottlenecks |
A graph with the same connectivity can have different reliability because its weak cuts may be arranged differently.
51.12 Reliability and Menger’s Theorem
Menger’s theorem gives a path interpretation of reliability.
If there are edge-disjoint paths between two vertices and , then no set of fewer than edge failures can separate them.
Thus edge-disjoint paths provide route redundancy.
Similarly, internally vertex-disjoint paths provide protection against vertex failures.
In network design, this gives a concrete rule:
To improve reliability between two terminals, provide many independent routes between them.
However, independence in graph theory and independence in probability are different ideas. Edge-disjoint paths do not share edges, but their failures may still be statistically dependent in a real system. For example, two cables may be edge-disjoint in a logical graph but run through the same physical conduit.
Graph reliability models are useful only when the graph abstraction matches the failure mechanisms.
51.13 Series and Parallel Structures
Simple reliability calculations often reduce to series and parallel structures.
In a series connection, all components are required. If any component fails, the system fails.
For independent components with operational probabilities
the series reliability is
A path is a series system.
In a parallel connection, at least one component must work. If components fail independently, the parallel reliability is
Parallel edges or independent routes create redundancy.
Many reliability computations decompose a graph into combinations of series and parallel parts. General graphs, however, rarely decompose completely in this way.
51.14 Complexity
Exact network reliability is computationally difficult.
The number of possible edge states is exponential in the number of edges. More formally, computing the probability that a random subgraph is connected is called the network reliability problem, and it is #P-hard. The related two-terminal problem asks whether two specified vertices remain connected and is also a standard reliability problem.
This hardness explains why practical methods often use:
| Method | Purpose |
|---|---|
| Deletion-contraction | Exact recursive computation |
| Factoring by cut vertices or bridges | Reduce graph size |
| Monte Carlo simulation | Estimate reliability |
| Bounds | Give certified upper and lower estimates |
| Minimal cut enumeration | Approximate high-reliability behavior |
| Binary decision diagrams | Compactly represent state spaces |
Exact formulas are feasible for small or structured graphs. Large networks usually require approximation or special structure.
51.15 Deletion-Contraction
Deletion-contraction is a basic recurrence for reliability polynomials.
Choose an edge . There are two cases:
- fails.
- works.
If fails, the graph becomes
If works, its endpoints may be contracted for connectivity purposes, giving
When every edge has operational probability , the recurrence has the form
with suitable base cases.
This recurrence is exact. Its difficulty is that repeated deletion and contraction create many subproblems. Without memoization or structure, the computation grows exponentially.
51.16 Monte Carlo Estimation
Monte Carlo simulation estimates reliability by sampling many random network states.
The method is simple:
- Generate a random operational subgraph.
- Test whether the required connectivity condition holds.
- Repeat many times.
- Estimate reliability by the fraction of successful trials.
If trials are run and are successful, then the estimator is
This estimator is unbiased under independent sampling.
Monte Carlo methods are flexible. They can handle large graphs, nonuniform probabilities, vertex failures, edge failures, and mixed models. Their limitation is statistical error, especially when failure events are rare.
For highly reliable systems, naive simulation may need many samples to observe enough failures. Rare-event simulation and importance sampling are used to improve efficiency.
51.17 Design Principles
Graph theory suggests several design principles for reliable networks.
First, avoid bridges. A bridge is a single edge failure that disconnects the graph.
Second, avoid articulation points. An articulation point is a single vertex failure that disconnects the graph.
Third, increase edge-disjoint and vertex-disjoint paths between important terminals.
Fourth, avoid small nontrivial cuts between large regions.
Fifth, consider the number of minimum cuts, not only their size.
Sixth, match the graph model to real failure modes. Two logical links may fail together if they share physical infrastructure.
These principles are structural. They do not replace engineering constraints such as cost, geography, latency, bandwidth, power, and maintenance.
51.18 Example: Comparing Two Designs
Consider two networks on four vertices.
The first is a path:
Its reliability is
The second is a cycle:
Its reliability is
At , the path has reliability
The cycle has reliability
Adding one edge greatly improves reliability because the network can survive any single edge failure.
This example illustrates the value of redundancy. A small increase in edge count can produce a large increase in survival probability.
51.19 Limitations of Graph Reliability
Graph reliability models are abstractions.
They usually assume independent failures, fixed probabilities, and a clear success condition. Real systems may violate all three assumptions.
Failures may be correlated. A storm, power outage, software bug, or shared conduit may disable many components together.
Probabilities may change over time. Components age, loads vary, and maintenance changes risk.
Success may be graded rather than binary. A network may remain connected but deliver poor latency, insufficient capacity, or degraded quality.
For these reasons, graph reliability is best used as one layer of analysis. It gives a precise mathematical model of structural survival, but it must be interpreted with the system context.
51.20 Summary
Network reliability studies the probability that a graph remains functional after random failures.
| Concept | Meaning |
|---|---|
| Edge reliability | Edges work or fail probabilistically |
| Vertex reliability | Vertices work or fail probabilistically |
| All-terminal reliability | The whole graph remains connected |
| Two-terminal reliability | Two specified vertices remain connected |
| Reliability polynomial | Probability of connectivity as a function of |
| Minimum cut | Small failure set that disconnects the network |
| Monte Carlo estimation | Simulation-based reliability approximation |
Connectivity gives worst-case guarantees. Reliability gives probability estimates. Together they describe how a graph behaves under failure, both structurally and statistically.