Probabilistic circuits are tractable probabilistic models built from simple computational graphs. They represent probability distributions using a network of sum, product, and leaf nodes. The main goal is to keep probabilistic inference efficient while still allowing expressive models.
They are also known under related names such as sum-product networks, arithmetic circuits, and tractable probabilistic models. The exact terminology depends on the architecture and the literature, but the central idea is the same: represent a probability distribution as a circuit whose structure makes inference computationally tractable.
Probabilistic circuits occupy a different point in the generative modeling landscape from VAEs, GANs, flows, and diffusion models. They usually trade some neural-network flexibility for exact or efficient probabilistic computation.
Motivation
Many probabilistic models define useful distributions but make inference expensive. For example, in a general latent-variable model, computing a marginal probability may require summing or integrating over many hidden configurations.
A probabilistic circuit is designed so that common operations are efficient:
| Operation | Meaning |
|---|---|
| Marginal inference | Compute probability while leaving some variables unobserved |
| Conditional inference | Compute probability after observing evidence |
| Sampling | Generate data from the model |
| Most probable explanation | Find a high-probability assignment |
| Likelihood evaluation | Compute $p(x)$ for a full observation exactly and efficiently |
The price is structural discipline. The circuit must obey certain constraints, such as decomposability and smoothness, so that these operations remain tractable.
Variables and Distributions
Let

$$X = (X_1, \dots, X_D)$$

be a collection of random variables. A probabilistic circuit defines a joint distribution

$$p(x) = p(x_1, \dots, x_D).$$

For one observed example $x = (x_1, \dots, x_D)$, the circuit computes a scalar value $p(x)$.
Unlike an energy-based model, this value is usually normalized by construction. Unlike a neural autoregressive model, it does not have to factor variables in a fixed left-to-right order.
Circuit Structure
A probabilistic circuit is a directed acyclic graph. It usually contains three kinds of nodes.
| Node type | Function |
|---|---|
| Leaf node | Represents a simple distribution over one or more variables |
| Product node | Multiplies child distributions |
| Sum node | Forms a weighted mixture of child distributions |
The output node computes the probability of the full observation.
A simple circuit may compute

$$p(x_1, x_2) = w_1\, p_1(x_1)\, p_2(x_2) + w_2\, q_1(x_1)\, q_2(x_2).$$

This expression describes a mixture of two factorized distributions. The first mixture component has weight $w_1$, and the second has weight $w_2$, with $w_1 + w_2 = 1$.
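To make this concrete, here is a minimal hand evaluation of such a two-component circuit for two binary variables; the weights and leaf parameters are made up for illustration:

```python
# Two-component mixture of factorized Bernoulli distributions:
# p(x1, x2) = w1 * p1(x1) * p2(x2) + w2 * q1(x1) * q2(x2)

def bernoulli(x: int, theta: float) -> float:
    """Probability of a binary value under a Bernoulli leaf."""
    return theta if x == 1 else 1.0 - theta

def circuit(x1: int, x2: int) -> float:
    w1, w2 = 0.3, 0.7                                 # sum-node weights
    comp1 = bernoulli(x1, 0.9) * bernoulli(x2, 0.2)   # first product node
    comp2 = bernoulli(x1, 0.1) * bernoulli(x2, 0.8)   # second product node
    return w1 * comp1 + w2 * comp2                    # sum node

# The four joint probabilities sum to one, as expected for a
# normalized circuit.
total = sum(circuit(a, b) for a in (0, 1) for b in (0, 1))
print(total)  # 1.0 (up to floating-point error)
```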
Leaf Nodes
Leaf nodes define simple local distributions. Examples include:
| Variable type | Possible leaf distribution |
|---|---|
| Binary | Bernoulli |
| Categorical | Categorical distribution |
| Continuous | Gaussian |
| Count-valued | Poisson |
| Bounded continuous | Beta distribution |
For a binary variable $x_i \in \{0, 1\}$, a Bernoulli leaf may compute

$$p(x_i) = \theta^{x_i} (1 - \theta)^{1 - x_i}.$$

For a continuous variable, a Gaussian leaf may compute

$$p(x_i) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left( -\frac{(x_i - \mu)^2}{2\sigma^2} \right).$$
Leaf nodes are intentionally simple. Expressiveness comes from composing many simple leaves through sums and products.
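As a sketch, the log-densities these two leaves compute are one-liners in plain Python (the function names are illustrative):

```python
import math

def bernoulli_log_prob(x: int, theta: float) -> float:
    # log p(x) = x log(theta) + (1 - x) log(1 - theta)
    return x * math.log(theta) + (1 - x) * math.log(1.0 - theta)

def gaussian_log_prob(x: float, mu: float, sigma: float) -> float:
    # log N(x | mu, sigma^2)
    return (-0.5 * math.log(2.0 * math.pi * sigma ** 2)
            - (x - mu) ** 2 / (2.0 * sigma ** 2))

print(bernoulli_log_prob(1, 0.7))        # log 0.7
print(gaussian_log_prob(0.0, 0.0, 1.0))  # log N(0 | 0, 1) ≈ -0.9189
```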
Product Nodes
A product node multiplies the values of its children:

$$p(x) = \prod_{c \in \mathrm{ch}(n)} p_c(x).$$
Product nodes represent factorization. If two child nodes depend on disjoint sets of variables, their product defines a joint distribution over the union of those variables.
For example, if one child models $p(x_1)$ and another child models $p(x_2)$, the product node computes

$$p(x_1, x_2) = p(x_1)\, p(x_2).$$
This is valid when the circuit treats those variables as independent within that substructure.
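A minimal sketch of a decomposable product node, assuming each node records its scope as a set of variable indices (the `Leaf` and `Product` classes here are hypothetical, not a library API):

```python
import math

class Leaf:
    """Bernoulli leaf over a single binary variable."""
    def __init__(self, index: int, theta: float):
        self.scope = {index}
        self.index, self.theta = index, theta

    def log_prob(self, x: dict) -> float:
        return math.log(self.theta if x[self.index] == 1 else 1.0 - self.theta)

class Product:
    """Decomposable product node: log-probabilities of children add."""
    def __init__(self, parts: list):
        scopes = [p.scope for p in parts]
        union = set().union(*scopes)
        # Decomposability: child scopes must not overlap.
        assert len(union) == sum(len(s) for s in scopes)
        self.scope, self.parts = union, parts

    def log_prob(self, x: dict) -> float:
        # log(prod_c p_c(x)) = sum_c log p_c(x)
        return sum(p.log_prob(x) for p in self.parts)

node = Product([Leaf(0, 0.9), Leaf(1, 0.2)])
print(math.exp(node.log_prob({0: 1, 1: 0})))  # 0.9 * 0.8 = 0.72
```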
Sum Nodes
A sum node computes a weighted sum of child values:

$$p(x) = \sum_{c \in \mathrm{ch}(n)} w_c\, p_c(x),$$

where

$$w_c \ge 0 \quad \text{and} \quad \sum_{c} w_c = 1.$$
Sum nodes represent mixtures. Each child corresponds to one alternative explanation or component.
A sum node can be interpreted as introducing a latent categorical variable. If the sum node has three children, the latent variable chooses one of three submodels.
Thus, sum nodes give probabilistic circuits latent-variable expressiveness without requiring expensive general inference.
Scope
The scope of a node is the set of variables that appear below that node.
For example:
- a leaf over $X_1$ has scope $\{X_1\}$,
- a product of leaves over $X_1$ and $X_2$ has scope $\{X_1, X_2\}$,
- a sum over two children that both model $X_1$ has scope $\{X_1\}$.
Scope is central because tractability depends on how scopes combine through the circuit.
Decomposability
A product node is decomposable if its children have disjoint scopes.
For a product node with children $c_1, \dots, c_k$, decomposability requires

$$\mathrm{scope}(c_i) \cap \mathrm{scope}(c_j) = \emptyset \quad \text{for all } i \neq j.$$
This prevents the same variable from being multiplied against itself through different children.
For example, this product is decomposable:

$$p(x_1)\, p(x_2, x_3).$$

This product is not decomposable, because $X_2$ appears in both children:

$$p(x_1, x_2)\, p(x_2, x_3).$$
Decomposability is one of the main reasons probabilistic circuits support efficient marginal inference.
Smoothness
A sum node is smooth if all of its children have the same scope.
For a sum node with children $c_1, \dots, c_k$, smoothness requires

$$\mathrm{scope}(c_1) = \mathrm{scope}(c_2) = \dots = \mathrm{scope}(c_k).$$
This ensures that the sum mixes distributions over the same variables.
For example, this sum is smooth:

$$w_1\, p(x_1, x_2) + w_2\, q(x_1, x_2).$$

This sum is not smooth:

$$w_1\, p(x_1) + w_2\, q(x_2).$$
Smoothness makes marginalization and likelihood evaluation well-defined throughout the circuit.
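Both conditions are easy to check mechanically once each node's scope is known. A sketch, assuming scopes are represented as Python sets of variable indices:

```python
def is_decomposable(child_scopes: list[set]) -> bool:
    """Product node: every pair of child scopes must be disjoint."""
    seen: set = set()
    for scope in child_scopes:
        if scope & seen:
            return False
        seen |= scope
    return True

def is_smooth(child_scopes: list[set]) -> bool:
    """Sum node: all child scopes must be identical."""
    return all(scope == child_scopes[0] for scope in child_scopes)

print(is_decomposable([{1}, {2, 3}]))     # True:  p(x1) * p(x2, x3)
print(is_decomposable([{1, 2}, {2, 3}]))  # False: x2 appears twice
print(is_smooth([{1, 2}, {1, 2}]))        # True:  mixture over one scope
print(is_smooth([{1}, {2}]))              # False: mixes different variables
```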
Normalization
If the leaves are normalized distributions, the sum weights are nonnegative and sum to one, product nodes are decomposable, and sum nodes are smooth, then the whole circuit defines a normalized probability distribution.
This is the key advantage of probabilistic circuits.
Many neural generative models define expressive functions but make exact normalization hard. Probabilistic circuits enforce structure so normalization can be maintained locally and composed globally.
Marginal Inference
Suppose we observe only some variables. Let

$$x = (x_O, x_M),$$

where $x_O$ collects the observed variables and $x_M$ the missing ones. We may want

$$p(x_O) = \sum_{x_M} p(x_O, x_M).$$
In a probabilistic circuit, this can often be computed by replacing missing-variable leaves with 1 and evaluating the circuit normally.
For example, if a Bernoulli leaf for $X_i$ is unobserved, then

$$\sum_{x_i \in \{0, 1\}} \theta^{x_i} (1 - \theta)^{1 - x_i} = \theta + (1 - \theta) = 1.$$
The rest of the circuit propagates this marginalization efficiently.
This gives probabilistic circuits a major advantage for missing-data problems.
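A sketch of this trick for a factorized Bernoulli leaf, mirroring the `BernoulliLeaf` in the PyTorch sketch later in this section; the 0/1 mask convention for observed variables is an assumption of this sketch:

```python
import torch

def bernoulli_marginal_log_prob(
    x: torch.Tensor,         # values, shape (batch, dim); entries for
                             # missing variables can be arbitrary
    observed: torch.Tensor,  # 1.0 where observed, 0.0 where missing
    logits: torch.Tensor,    # leaf parameters, shape (dim,)
) -> torch.Tensor:
    probs = torch.sigmoid(logits)
    log_p = x * torch.log(probs + 1e-8) + (1 - x) * torch.log(1 - probs + 1e-8)
    # Unobserved leaves contribute log 1 = 0, i.e. they are marginalized out.
    return (observed * log_p).sum(dim=-1)
```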
Conditional Inference
Conditional probabilities can be computed from joint and marginal probabilities:

$$p(x_Q \mid x_E) = \frac{p(x_Q, x_E)}{p(x_E)}.$$

Because probabilistic circuits can compute both numerator and denominator efficiently, conditional inference is usually tractable; a small sketch appears after the list below.
This is useful for:
- imputation,
- classification,
- anomaly detection,
- structured prediction,
- uncertainty-aware systems.
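A minimal sketch of conditional inference via two masked circuit evaluations, using a fully factorized Bernoulli model as a stand-in for a richer circuit (names, masks, and numbers are illustrative):

```python
import torch

def marginal_log_prob(x, observed, probs):
    """Fully factorized Bernoulli model; unobserved leaves give log 1 = 0."""
    log_p = x * torch.log(probs) + (1 - x) * torch.log(1 - probs)
    return (observed * log_p).sum(dim=-1)

def conditional_log_prob(x, query, evidence, probs):
    """log p(x_Q | x_E) = log p(x_Q, x_E) - log p(x_E).

    `query` and `evidence` are disjoint 0/1 masks over the variables.
    """
    joint = marginal_log_prob(x, query + evidence, probs)  # p(x_Q, x_E)
    marginal = marginal_log_prob(x, evidence, probs)       # p(x_E)
    return joint - marginal

probs = torch.tensor([0.9, 0.2, 0.5])
x = torch.tensor([1.0, 0.0, 1.0])
query = torch.tensor([1.0, 0.0, 0.0])     # ask about x1
evidence = torch.tensor([0.0, 1.0, 1.0])  # observe x2, x3
# Independent leaves, so this equals p(x1 = 1) = 0.9.
print(conditional_log_prob(x, query, evidence, probs).exp())
```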
Sampling
Sampling proceeds recursively.
At a leaf node, sample from the leaf distribution.
At a product node, sample independently from all children.
At a sum node, choose one child according to the mixture weights, then sample from that child.
This gives an efficient ancestral sampling procedure.
Unlike MCMC sampling in energy-based models, sampling from many probabilistic circuits does not require a long Markov chain.
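A recursive sampler over a toy circuit takes only a few lines; the dictionary-based node encoding below is illustrative, not a standard format:

```python
import random

def sample(node):
    kind = node["type"]
    if kind == "leaf":
        # Bernoulli leaf: returns {variable_index: value}.
        idx, theta = node["index"], node["theta"]
        return {idx: 1 if random.random() < theta else 0}
    if kind == "product":
        # Sample every child; disjoint scopes make the union well-defined.
        out = {}
        for child in node["children"]:
            out.update(sample(child))
        return out
    if kind == "sum":
        # Pick one child according to the mixture weights, then recurse.
        child = random.choices(node["children"], weights=node["weights"])[0]
        return sample(child)
    raise ValueError(f"unknown node type: {kind}")

# A two-component mixture over x1, x2 (weights and thetas are made up).
circuit = {
    "type": "sum",
    "weights": [0.3, 0.7],
    "children": [
        {"type": "product", "children": [
            {"type": "leaf", "index": 0, "theta": 0.9},
            {"type": "leaf", "index": 1, "theta": 0.2},
        ]},
        {"type": "product", "children": [
            {"type": "leaf", "index": 0, "theta": 0.1},
            {"type": "leaf", "index": 1, "theta": 0.8},
        ]},
    ],
}
print(sample(circuit))  # e.g. {0: 1, 1: 0}
```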
Learning Parameters
The parameters of a probabilistic circuit include:
- leaf distribution parameters,
- sum-node weights,
- sometimes structure parameters.
If the structure is fixed, parameter learning is often performed by maximum likelihood.
Given data

$$\mathcal{D} = \{x^{(1)}, \dots, x^{(N)}\},$$

we maximize

$$\sum_{n=1}^{N} \log p_\theta\!\left(x^{(n)}\right).$$
Gradient-based optimization can be used, and for some circuit classes expectation-maximization is also possible.
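A minimal gradient-based maximum-likelihood loop, assuming a circuit module that maps a batch of observations to per-example log-likelihoods (as the PyTorch sketch later in this section does):

```python
import torch
from torch import nn

def train_mle(circuit: nn.Module, data: torch.Tensor,
              epochs: int = 100, lr: float = 0.05) -> None:
    """Maximize sum_n log p(x_n) by gradient descent on the NLL."""
    opt = torch.optim.Adam(circuit.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        loss = -circuit(data).mean()  # negative log-likelihood
        loss.backward()
        opt.step()
```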
Learning Structure
Learning the structure of a probabilistic circuit is harder than learning parameters.
Structure learning decides:
- which variables are grouped together,
- where products are placed,
- where mixtures are placed,
- how many components are used,
- what leaf distributions appear.
Several approaches exist:
| Approach | Description |
|---|---|
| Top-down partitioning | Recursively split variables and data |
| Clustering-based methods | Use data clusters to define sum nodes |
| Random structures | Sample valid structures and train parameters |
| Neural parameterization | Use neural networks inside or around the circuit |
| Hybrid models | Combine circuits with deep feature extractors |
The structure determines the tradeoff between expressiveness and tractability.
Sum-Product Networks
Sum-product networks, or SPNs, are a widely studied form of probabilistic circuit.
An SPN is a directed acyclic graph with sum nodes, product nodes, and tractable leaves. Under the right structural conditions, SPNs support efficient exact inference.
SPNs became influential because they showed that deep probabilistic models could maintain tractable inference through architectural constraints.
They can be seen as deep mixture models with repeated factorization and recombination.
Arithmetic Circuits
Arithmetic circuits are closely related. They represent probability computations as arithmetic expressions.
For example, a Bayesian network can sometimes be compiled into an arithmetic circuit. Once compiled, repeated inference queries can be answered efficiently.
This gives a useful distinction:
| Model type | Main role |
|---|---|
| Bayesian network | Human-readable graphical model |
| Arithmetic circuit | Efficient compiled inference form |
| Probabilistic circuit | Learnable tractable distribution model |
The boundaries between these categories are not always strict.
Neural Probabilistic Circuits
Modern work sometimes combines probabilistic circuits with neural networks.
A neural network may produce:
- leaf parameters,
- mixture weights,
- embeddings used by the circuit,
- conditional distributions.
This hybrid approach tries to combine:
- representation power from neural networks,
- tractable inference from circuits.
For example, a CNN may encode an image into features, and a probabilistic circuit may model uncertainty over structured labels.
Comparison with Flow-Based Models
| Probabilistic circuits | Flow-based models |
|---|---|
| Tractability through graph constraints | Tractability through invertibility |
| Efficient marginal inference | Exact likelihood and inverse mapping |
| Handles missing variables naturally | Usually expects full observations |
| Often discrete or mixed-variable friendly | Strong for continuous high-dimensional data |
| Structure learning is important | Architecture design is important |
Both families aim for tractable probability computation, but they achieve it through different structural principles.
Comparison with Energy-Based Models
| Probabilistic circuits | Energy-based models |
|---|---|
| Usually normalized by construction | Often unnormalized |
| Efficient marginal inference | Marginal inference often hard |
| Sampling can be direct | Sampling often MCMC-based |
| Less flexible without large structures | Very flexible energy functions |
| Structural constraints are central | Energy design and sampling are central |
Probabilistic circuits prioritize tractability. Energy-based models prioritize flexibility.
Comparison with Autoregressive Models
Autoregressive models factor the joint distribution as

$$p(x_1, \dots, x_D) = \prod_{i=1}^{D} p(x_i \mid x_1, \dots, x_{i-1}).$$
They give exact likelihoods but impose an ordering over variables.
Probabilistic circuits do not require a single fixed sequential factorization. Their structure can encode mixtures and decompositions over variable groups.
| Probabilistic circuits | Autoregressive models |
|---|---|
| Structured factorization | Sequential factorization |
| Efficient marginal queries | Marginals often require summation or sampling |
| Good missing-data support | Missing-data handling can be awkward |
| Strong tractability guarantees | Strong sequence modeling scalability |
Autoregressive transformers scale better for language and many modern generative tasks. Probabilistic circuits remain attractive when exact marginal and conditional inference are important.
PyTorch Sketch
A simple probabilistic circuit can be implemented as a differentiable computation graph.
Below is a small sum-product circuit for binary variables.
```python
import torch
from torch import nn


class BernoulliLeaf(nn.Module):
    """Factorized Bernoulli leaf over `dim` binary variables."""

    def __init__(self, dim: int):
        super().__init__()
        self.logits = nn.Parameter(torch.zeros(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        probs = torch.sigmoid(self.logits)
        # log p(x) = sum_i x_i log p_i + (1 - x_i) log(1 - p_i)
        log_p = x * torch.log(probs + 1e-8)
        log_p = log_p + (1 - x) * torch.log(1 - probs + 1e-8)
        return log_p.sum(dim=-1)


class SumNode(nn.Module):
    """Smooth sum node: a weighted mixture of components over one scope."""

    def __init__(self, components: list[nn.Module]):
        super().__init__()
        # The attribute must not be named `children`, which would clash
        # with the built-in nn.Module.children() method.
        self.components = nn.ModuleList(components)
        self.logits = nn.Parameter(torch.zeros(len(components)))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        child_log_probs = torch.stack(
            [child(x) for child in self.components],
            dim=-1,
        )
        log_weights = torch.log_softmax(self.logits, dim=-1)
        # Mixture in log space: logsumexp over log w_c + log p_c(x).
        return torch.logsumexp(child_log_probs + log_weights, dim=-1)
```

This example uses log-space computation for numerical stability. In real probabilistic circuits, product nodes would combine disjoint scopes rather than always evaluating every leaf on the full vector.
A training loss is simply negative log-likelihood:
```python
def nll(model: nn.Module, x: torch.Tensor) -> torch.Tensor:
    """Negative mean log-likelihood of a batch under the circuit."""
    return -model(x).mean()
```

This illustrates a key advantage: if the circuit is normalized by construction, likelihood training is direct.
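A hypothetical end-to-end run with the pieces above: fit a two-component mixture of factorized Bernoulli leaves to toy binary data (the data and hyperparameters are made up):

```python
# Uses BernoulliLeaf, SumNode, and nll from the sketches above.
torch.manual_seed(0)
model = SumNode([BernoulliLeaf(4), BernoulliLeaf(4)])
x = torch.randint(0, 2, (256, 4)).float()  # toy binary data

opt = torch.optim.Adam(model.parameters(), lr=0.1)
for _ in range(200):
    opt.zero_grad()
    loss = nll(model, x)
    loss.backward()
    opt.step()

print(model(x).mean().item())  # average log-likelihood after training
```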
Advantages
Probabilistic circuits have several strengths.
They support exact or efficient likelihood computation. They handle missing values naturally. They provide tractable marginal and conditional inference. They can represent mixtures and latent structure. They work well for domains where uncertainty, missing data, and structured inference matter.
They are especially useful when the question is not only “generate a sample” but also “answer probability queries reliably.”
Limitations
Probabilistic circuits also have limitations.
Their structural constraints can limit expressiveness. Very large circuits may be needed for complex high-dimensional data. Structure learning can be difficult. Neural alternatives often scale better for images, language, and audio. Hybrid models are promising but can be harder to design and analyze.
For this reason, probabilistic circuits are less dominant than transformers or diffusion models in mainstream deep learning. Their value is strongest in tractable probabilistic reasoning and uncertainty-aware modeling.
Role in Deep Learning
Probabilistic circuits provide a useful counterpoint to modern black-box neural models.
They show that tractable inference can be designed into the model architecture rather than approximated after the fact. This matters for applications where we need calibrated probabilities, missing-data inference, or reliable conditional reasoning.
In a deep learning book, probabilistic circuits are useful because they clarify a basic tradeoff:
- unrestricted neural models are flexible but often intractable probabilistically,
- tractable probabilistic models are efficient but structurally constrained,
- hybrid systems try to combine the two.
Summary
Probabilistic circuits are tractable probabilistic models represented as computational graphs of leaves, sums, and products. Leaf nodes define simple distributions. Product nodes factor variables. Sum nodes form mixtures.
With structural conditions such as decomposability and smoothness, these circuits support efficient likelihood evaluation, marginal inference, conditional inference, and sampling.
They are less common than transformers, diffusion models, and VAEs in modern large-scale generative modeling, but they remain important for exact probabilistic reasoning, missing-data inference, and the design of neural-symbolic or uncertainty-aware systems.