# Hybrid Symbolic-Numeric Systems

A hybrid symbolic-numeric system combines discrete symbolic reasoning with continuous numerical computation. In the context of automatic differentiation, it means a pipeline where some components operate over symbols, rules, logic, programs, graphs, or queries, while other components operate over tensors, vectors, probabilities, or differentiable functions.

The basic form is:

$$
x \rightarrow S(x) \rightarrow N(S(x)) \rightarrow L
$$

where $S$ is a symbolic transformation, $N$ is a numerical or differentiable computation, and $L$ is an objective.

The challenge is that symbolic systems usually contain hard decisions. Automatic differentiation works best over smooth numerical maps. Hybrid systems need a boundary where symbolic structure and gradient-based optimization can cooperate.
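The pipeline above can be sketched in a few lines. This is a minimal, illustrative example (no particular library is assumed): $S$ is a discrete tokenization step, $N$ is a toy parameterized score that is smooth in its parameter, and $L$ is a squared-error objective.

```python
# Minimal sketch of the x -> S(x) -> N(S(x)) -> L pipeline.
# S is symbolic (discrete, exact); N is numeric (smooth in theta);
# L is the objective. All names here are illustrative.

def S(x: str) -> list[str]:
    # Symbolic step: tokenization. Non-differentiable by nature.
    return x.split()

def N(tokens: list[str], theta: float) -> float:
    # Numeric step: smooth in theta, so d(output)/d(theta) exists.
    return theta * len(tokens)

def L(y: float, target: float) -> float:
    # Objective: squared error.
    return (y - target) ** 2

loss = L(N(S("select doc from table"), theta=0.5), target=2.0)
```

Gradients can flow from `L` back to `theta`, but not through `S`: the token count changes in discrete jumps.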

### Why Hybrid Systems Matter

Purely numerical systems are flexible but often weak at structure. Purely symbolic systems are precise but brittle under uncertainty.

A hybrid system can combine:

| Symbolic Strength | Numeric Strength |
|---|---|
| Exact rules | Learning from data |
| Type constraints | Smooth optimization |
| Program structure | Statistical generalization |
| Logical validity | Robust approximation |
| Database-style relations | Vector similarity |
| Search and planning | Gradient descent |

This is useful in systems that need both correctness and adaptation.

Examples include:

| Domain | Hybrid Structure |
|---|---|
| Program synthesis | Symbolic grammar plus learned scoring |
| Databases | SQL operators plus vector ranking |
| Robotics | Task planner plus differentiable controller |
| Theorem proving | Formal logic plus neural guidance |
| Scientific computing | PDE structure plus learned closure models |
| Retrieval systems | Inverted index plus dense embeddings |
| Compilers | Rewrite rules plus learned cost models |

### Symbolic Components

Symbolic components operate on structured objects.

Examples:

```text
SQL query
abstract syntax tree
proof term
program IR
knowledge graph
constraint system
logical formula
rewrite rule
```

Their operations are often exact:

```text
parse -> type check -> rewrite -> execute
```

These transformations usually produce discontinuities. A small input change can produce a different parse tree, plan, branch, proof step, or query result.

### Numeric Components

Numeric components operate on continuous representations.

Examples:

```text
embedding vector
tensor state
probability distribution
score function
neural network output
differentiable simulator state
```

Their operations support gradients:

$$
y = f_\theta(x)
$$

The gradient

$$
\frac{\partial L}{\partial \theta}
$$

allows optimization against a task loss.

### The Boundary Problem

The central design problem is the symbolic-numeric boundary.

A symbolic operation may look like:

$$
z = \operatorname{select}(x)
$$

where the output changes abruptly when the selected symbol changes.

AD cannot propagate useful ordinary derivatives through this hard selection.

Hybrid systems therefore need one of several boundary designs:

| Boundary Design | Method |
|---|---|
| Embed symbols | Map symbols into vectors |
| Relax choices | Replace hard decisions with probabilities |
| Score candidates | Keep symbolic candidates, learn ranking |
| Differentiate inside leaves | Symbolic structure controls differentiable modules |
| Use custom gradients | Define surrogate backward behavior |
| Use policy gradients | Treat symbolic choices as actions |
| Use implicit differentiation | Differentiate solution conditions |

The right design depends on where learning should occur.
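As a sketch of the "custom gradients" row above, here is a straight-through surrogate written as explicit forward and backward functions, with no AD framework assumed. The forward pass makes the hard decision; the backward pass substitutes an identity gradient for the step function's true (zero) derivative:

```python
# Straight-through surrogate for a hard selection (illustrative).
# Forward: discrete threshold. Backward: pass the gradient through
# unchanged, ignoring the step function's zero true derivative.

def hard_select_forward(score: float) -> float:
    return 1.0 if score > 0.0 else 0.0

def hard_select_backward(upstream_grad: float) -> float:
    # Straight-through estimator: identity surrogate gradient.
    return upstream_grad
```

The surrogate gradient is deliberately "wrong" as calculus but often useful as an optimization signal; this is the "invalid gradients" risk listed later under failure modes.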

### Embedding Symbols

The most common bridge maps symbols into vectors:

$$
s \mapsto e_s \in \mathbb{R}^d
$$

This allows symbolic objects to enter numerical models.

For a sequence of symbols:

$$
(s_1, s_2, \ldots, s_n)
$$

the system produces:

$$
(e_{s_1}, e_{s_2}, \ldots, e_{s_n})
$$

Embeddings allow similarity, clustering, and gradient updates.

The cost is loss of exact identity. Two distinct symbols may become close in vector space even when symbolic rules require them to remain separate.
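A minimal sketch of an embedding table, with hypothetical symbols and 2-dimensional vectors. In a real system the vectors would be trainable parameters; here they are fixed to illustrate both the similarity structure and the loss of exact identity:

```python
import math

# Hypothetical embedding table: symbol -> vector in R^2.
# In training, gradients would flow into these vectors.
embeddings = {
    "SELECT": [0.9, 0.1],
    "WHERE":  [0.8, 0.2],
    "LIMIT":  [0.1, 0.9],
}

def embed(symbol: str) -> list[float]:
    return embeddings[symbol]

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# SELECT and WHERE are distinct symbols but nearly identical vectors:
# useful for similarity, dangerous when rules require exact identity.
```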

### Relaxing Symbolic Choices

A hard symbolic choice:

```text
choose one rule from {r1, r2, r3}
```

can be relaxed into a distribution:

$$
p_i = P(r_i \mid x)
$$

The output becomes a weighted combination:

$$
y = \sum_i p_i f_i(x)
$$

This creates a differentiable mixture of symbolic alternatives.

During inference, the system may choose:

$$
\arg\max_i p_i
$$

The training system is smooth. The deployed system may remain discrete.
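The relaxation can be sketched with a softmax over rule logits. The rules and logits below are illustrative; in practice the logits would come from a learned model:

```python
import math

# Relax a hard choice among {r1, r2, r3}: logits -> softmax weights
# -> weighted mixture of rule outputs. At inference, argmax recovers
# a discrete choice. Rules and logits are illustrative stand-ins.

def softmax(logits: list[float]) -> list[float]:
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]

rules = [lambda x: x + 1.0, lambda x: 2.0 * x, lambda x: x * x]
logits = [2.0, 0.5, -1.0]  # would be produced by a learned model

p = softmax(logits)
x = 3.0
y_train = sum(pi * f(x) for pi, f in zip(p, rules))  # smooth mixture
y_infer = rules[p.index(max(p))](x)                  # hard argmax
```

Note the gap between `y_train` and `y_infer`: the same train/inference mismatch listed later as the "boundary mismatch" failure mode.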

### Candidate Generation and Learned Scoring

Many practical systems keep symbolic generation exact and make scoring differentiable.

Example:

```text
symbolic engine generates candidates
neural model scores candidates
best candidate is selected
```

For program synthesis:

```text
grammar -> candidate programs -> learned ranker -> execution
```

For theorem proving:

```text
proof state -> legal tactics -> learned tactic score -> next proof state
```

This design preserves symbolic validity because candidates are generated by rules. The learned system only prioritizes them.
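A toy version of this split, with a stubbed scorer standing in for the neural model (candidate names and weights are illustrative):

```python
# Symbolic engine generates only rule-legal candidates; a learned
# scorer (stubbed here) ranks them; the best is selected. Only the
# ranking is learned, so symbolic validity is preserved.

def generate_candidates(spec: str) -> list[tuple[str, str]]:
    # Symbolic, exact: every candidate is legal by construction.
    return [("add", spec), ("mul", spec), ("neg", spec)]

def learned_score(candidate: tuple[str, str]) -> float:
    # Stand-in for a neural scorer; weights are illustrative.
    weights = {"add": 0.2, "mul": 0.7, "neg": 0.1}
    return weights[candidate[0]]

candidates = generate_candidates("x")
best = max(candidates, key=learned_score)
```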

### Differentiable Leaves in Symbolic Trees

A symbolic expression can contain differentiable leaves.

Example:

$$
E = \operatorname{if}(c)\ f_\theta(x)\ \operatorname{else}\ g_\phi(x)
$$

The branch condition is symbolic or discrete. Inside each branch, computation is differentiable.

AD can compute gradients for the executed branch. It usually cannot explain how to change the branch condition unless the condition is relaxed or given a custom gradient.

This pattern appears in:

- decision trees with neural leaves
- rule systems with learned scoring functions
- query engines with differentiable operators
- planners with learned controllers
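The branch-with-differentiable-leaves pattern can be sketched directly, with a finite-difference check standing in for AD (all functions are illustrative):

```python
# E = if(c) f_theta(x) else g_phi(x): discrete branch, smooth leaves.
# Gradients reach the parameters of the executed branch only; the
# branch condition itself stays discrete.

def f(x: float, theta: float) -> float:  # differentiable leaf 1
    return theta * x

def g(x: float, phi: float) -> float:    # differentiable leaf 2
    return phi + x

def E(x: float, c: bool, theta: float, phi: float) -> float:
    return f(x, theta) if c else g(x, phi)

# Finite-difference check of dE/dtheta along the executed branch:
eps = 1e-6
x, theta, phi = 2.0, 1.5, 0.3
grad_theta = (E(x, True, theta + eps, phi) - E(x, True, theta, phi)) / eps
# grad_theta is approximately x = 2.0; phi gets no gradient when c is True.
```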

### Typed Hybrid Systems

Types are especially important in hybrid systems.

A type system can state which values are symbolic and which are differentiable:

```text
Expr        symbolic expression
Tensor[T]   differentiable tensor
Prob[A]     distribution over symbolic values
Param[T]    trainable parameter
```

This prevents invalid operations such as taking gradients of a raw syntax tree.

A typed boundary might require explicit conversion:

```text
embed : Symbol -> Tensor[float]
sample : Prob[A] -> A
relax : A -> Prob[A]
```

The type system documents where differentiability is valid.
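One way to approximate this boundary in a host language is with distinct wrapper types and explicit conversion functions. This is a rough sketch using Python's `typing.NewType`, not a full type system; the table contents are illustrative:

```python
from typing import NewType

# Distinct types for symbolic and differentiable values, with an
# explicit conversion mirroring `embed : Symbol -> Tensor[float]`.
# A type checker would reject numeric ops applied to a raw Symbol.

Symbol = NewType("Symbol", str)
Tensor = NewType("Tensor", tuple)

def embed(s: Symbol) -> Tensor:
    table = {"foo": (1.0, 0.0), "bar": (0.0, 1.0)}  # illustrative
    return Tensor(table[s])

vec = embed(Symbol("foo"))
```

Static checkers such as mypy would flag code that passes a `Symbol` where a `Tensor` is expected, documenting the boundary in the source itself.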

### Differentiable Logic

Logical predicates are discrete:

$$
P(x) \in \{0,1\}
$$

A differentiable relaxation maps predicates to soft truth values:

$$
P(x) \in [0,1]
$$

Logical connectives can then be approximated continuously.

For example:

| Logic | Soft Approximation |
|---|---|
| $A \land B$ | $A \cdot B$ or $\min(A,B)$ |
| $A \lor B$ | $A + B - AB$ or $\max(A,B)$ |
| $\neg A$ | $1 - A$ |
| $A \Rightarrow B$ | $1 - A + AB$ |

This enables gradients through logical constraints.

The tradeoff is semantic drift. Soft logic can violate properties expected from classical logic.
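The product-style connectives from the table translate directly into code. On $\{0,1\}$ inputs they recover the classical truth tables; on interior values they drift from classical identities:

```python
# Product-style soft logic on truth values in [0, 1].
# Differentiable everywhere, but only an approximation of
# classical logic (semantic drift on interior values).

def soft_and(a: float, b: float) -> float:
    return a * b

def soft_or(a: float, b: float) -> float:
    return a + b - a * b

def soft_not(a: float) -> float:
    return 1.0 - a

def soft_implies(a: float, b: float) -> float:
    return 1.0 - a + a * b

# Drift example: classically A AND A = A, but soft_and(0.5, 0.5) = 0.25.
```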

### Differentiable Constraints

Many hybrid systems express constraints:

$$
g(x) = 0
$$

or:

$$
h(x) \le 0
$$

These constraints may come from symbolic rules but apply to numerical variables.

A common objective is:

$$
L = L_{\text{task}} + \lambda L_{\text{constraint}}
$$

where $L_{\text{constraint}}$ penalizes violations.

This allows symbolic knowledge to shape gradient-based learning.

Examples:

| Constraint | Penalty |
|---|---|
| Conservation law | Mass or energy error |
| Type rule | Invalid assignment penalty |
| Geometric rule | Distance violation |
| Logical implication | Soft truth violation |
| Database integrity | Key or relation penalty |
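The penalized objective can be sketched with a conservation-style constraint turned into a squared penalty (functions and values are illustrative):

```python
# L = L_task + lambda * L_constraint, with the symbolic rule
# "components must sum to 1" (g(x) = 0) applied as a squared penalty.

def L_task(x: tuple[float, float], target: float) -> float:
    return (x[0] - target) ** 2

def g(x: tuple[float, float]) -> float:
    # Symbolic rule as a numeric residual: x0 + x1 - 1 should be 0.
    return x[0] + x[1] - 1.0

def L_total(x: tuple[float, float], target: float, lam: float) -> float:
    return L_task(x, target) + lam * g(x) ** 2
```

Because the penalty is soft, gradients can trade constraint satisfaction against task loss; a hard constraint would instead be enforced symbolically or by projection.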

### Program Synthesis

Program synthesis is naturally hybrid.

The program space is symbolic and discrete:

```text
grammar -> syntax tree -> executable program
```

The scoring may be numerical:

$$
s_\theta(P, x)
$$

The objective may combine correctness, cost, and probability:

$$
L(P) = L_{\text{examples}}(P) + \lambda \cdot \text{cost}(P) - \log p_\theta(P)
$$

A neural model can guide search while a symbolic executor checks correctness.

This avoids relying on a neural model to invent valid syntax from scratch.
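A toy version of this loop, with a fixed candidate set standing in for grammar enumeration and a stubbed log-prior standing in for the neural model:

```python
# Toy hybrid synthesis: candidates are enumerated symbolically,
# a (stubbed) learned prior orders the search, and an exact executor
# accepts only programs that match all examples.

programs = {
    "x + 1": lambda x: x + 1,
    "x * 2": lambda x: x * 2,
    "x * x": lambda x: x * x,
}
log_prior = {"x + 1": -0.5, "x * 2": -1.0, "x * x": -2.0}  # stub model

examples = [(2, 4), (3, 6)]

def correct(f) -> bool:
    # Symbolic/exact check: every input-output example must match.
    return all(f(i) == o for i, o in examples)

# Search in prior order; the learned model only prioritizes, never
# overrides the exact correctness check.
ranked = sorted(programs, key=lambda p: -log_prior[p])
best = next(p for p in ranked if correct(programs[p]))
```

Here the prior prefers `x + 1`, but the executor rejects it against the examples and the search moves on.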

### Query Systems

Modern query systems increasingly combine symbolic and numeric execution.

Example:

```sql
SELECT doc_id
FROM documents
WHERE language = 'en'
ORDER BY vector_similarity(embedding, :query)
LIMIT 10;
```

The language filter is symbolic and exact. The vector ranking is numeric and differentiable.

This hybrid structure is often preferable to making the whole query engine continuous.

Exact filters preserve constraints. Dense scoring improves semantic matching.

### Theorem Proving

Formal proof systems require exact validity.

A proof step is either accepted or rejected by the kernel.

A hybrid theorem prover can use a neural model to guide tactic selection:

```text
proof state
  -> legal tactics
  -> learned ranking
  -> symbolic verifier
```

The verifier remains symbolic. The learned model improves search efficiency.

This separation protects correctness while still using data-driven guidance.

### Robotics and Planning

Robotics often separates high-level planning from low-level control.

```text
symbolic planner
  -> task sequence
  -> differentiable controller
  -> physical trajectory
```

The planner may choose symbolic actions such as:

```text
pick cup
move to table
place cup
```

The controller optimizes continuous parameters:

- joint trajectories
- forces
- grasp pose
- timing

The symbolic layer handles task structure. The numeric layer handles motion and physics.

### Compilers

A compiler can use symbolic rules for correctness and learned models for cost.

```text
IR graph
  -> legal rewrites
  -> learned cost model
  -> selected optimized program
```

The rewrite system preserves semantics. The learned cost model predicts performance.

This is a robust hybrid design because correctness is separated from optimization quality.
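The separation can be made concrete with a toy rewrite set and a stubbed cost model (expressions and costs are illustrative):

```python
# Legal rewrites preserve semantics; a learned cost model (stubbed
# here) only decides which legal form to keep.

def legal_rewrites(expr: str) -> list[str]:
    # e.g. x * 2 -> x + x or x << 1; all semantics-preserving.
    if expr == "x * 2":
        return ["x * 2", "x + x", "x << 1"]
    return [expr]

def predicted_cost(expr: str) -> float:
    # Stand-in for a learned cost model; values are illustrative.
    costs = {"x * 2": 3.0, "x + x": 2.0, "x << 1": 1.0}
    return costs.get(expr, 10.0)

best = min(legal_rewrites("x * 2"), key=predicted_cost)
```

A bad cost model can only pick a slower legal program, never an incorrect one.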

### Failure Modes

Hybrid symbolic-numeric systems fail in distinct ways.

| Failure | Cause |
|---|---|
| Semantic drift | Soft relaxation no longer matches symbolic meaning |
| Invalid gradients | Surrogate gradient optimizes the wrong objective |
| Search explosion | Symbolic candidate space grows too large |
| Weak credit assignment | Final loss poorly guides early symbolic choices |
| Overconfident scoring | Learned ranker suppresses valid candidates |
| Boundary mismatch | Training uses soft choices, inference uses hard choices |
| Type confusion | Numeric code treats symbolic identity as continuous |
| Constraint conflict | Soft penalties compete with task loss |

These systems require explicit tests at the boundary.

### Design Principles

A good hybrid system should keep exactness where exactness matters.

Use symbolic machinery for:

- validity
- syntax
- safety constraints
- type checking
- transactional guarantees
- proof checking
- legal execution paths

Use numeric machinery for:

- ranking
- approximation
- perception
- scoring
- control
- uncertain matching
- optimization

Do not make a component differentiable merely because it is possible. Make it differentiable when gradients provide useful credit assignment.

### Relation to Automatic Differentiation

Automatic differentiation applies to the numerical subgraph. The symbolic part defines structure, choices, and constraints around that subgraph.

A hybrid AD system must know:

| Question | Reason |
|---|---|
| Which values are differentiable? | Avoid invalid gradients |
| Which operations are symbolic? | Preserve exact semantics |
| Which choices are relaxed? | Define trainable boundaries |
| Which constraints are hard? | Preserve correctness |
| Which constraints are soft? | Allow optimization |

The AD runtime should not silently pretend that all computation is smooth.

### Core Idea

Hybrid symbolic-numeric systems combine exact structure with trainable continuous computation. Symbolic components provide rules, types, programs, queries, and constraints. Numeric components provide gradients, approximation, ranking, and adaptation.

The main design task is choosing the boundary. A well-designed boundary lets automatic differentiation improve the parts of the system that benefit from optimization while preserving the symbolic invariants that make the system correct.

