
Robotics and Control


Robotics and control systems interact with the physical world through sensing, estimation, planning, and actuation. Automatic differentiation (AD) is important here because modern control pipelines increasingly depend on optimization, simulation, system identification, and differentiable models.

A robotic system is often represented as a dynamical system:

x_{t+1} = f(x_t, u_t, \theta),

where:

| Symbol | Meaning |
| --- | --- |
| x_t | system state |
| u_t | control input |
| \theta | physical or model parameters |

A controller chooses actions

u_t = \pi(x_t),

to optimize some objective.

The objective may involve trajectory tracking, energy consumption, stability, collision avoidance, or task completion. AD computes derivatives needed for optimization, planning, estimation, and learning.

Dynamics as Differentiable Programs

A robot simulator is a computational graph:

state
    -> dynamics
    -> integration
    -> contact resolution
    -> sensor model
    -> cost function

Differentiating this graph gives sensitivities of trajectories and objectives with respect to controls, parameters, or initial conditions.

This enables:

  • trajectory optimization,
  • model predictive control,
  • differentiable simulation,
  • policy learning,
  • system identification,
  • calibration,
  • inverse dynamics.
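To make the sequential structure concrete, here is a minimal sketch of a reverse sweep through a rollout. The dynamics x_{t+1} = x_t + dt*(u_t - k*x_t) and the quadratic cost are illustrative choices for this example, not a model from any particular system:

```python
import numpy as np

def rollout(x0, u, k=0.5, dt=0.1):
    """Simulate the toy dynamics x_{t+1} = x_t + dt*(u_t - k*x_t)."""
    xs = [x0]
    for ut in u:
        xs.append(xs[-1] + dt * (ut - k * xs[-1]))
    return np.array(xs)

def cost(xs):
    """Quadratic state cost summed over the trajectory."""
    return float(np.sum(xs ** 2))

def grad_cost_u(x0, u, k=0.5, dt=0.1):
    """Reverse sweep: propagate dL/dx backward to obtain dL/du_t."""
    xs = rollout(x0, u, k, dt)
    T = len(u)
    g = np.zeros(T)
    lam = 2.0 * xs[T]                             # dL/dx_T
    for t in reversed(range(T)):
        g[t] = lam * dt                           # dx_{t+1}/du_t = dt
        lam = lam * (1.0 - dt * k) + 2.0 * xs[t]  # chain rule plus stage cost
    return g
```

A real system replaces the hand-written Jacobian factors with AD; the backward loop over time steps is exactly the structure reverse mode produces.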

Equations of Motion

Rigid-body dynamics are commonly written as

M(q)\ddot q + C(q,\dot q)\dot q + g(q) = \tau,

where:

| Symbol | Meaning |
| --- | --- |
| q | generalized coordinates |
| M(q) | mass matrix |
| C(q,\dot q) | Coriolis and centrifugal terms |
| g(q) | gravity |
| \tau | generalized forces |

The state is typically

x = (q, \dot q).

A simulator numerically integrates the resulting ODE or DAE.

AD computes derivatives of trajectories or costs with respect to:

  • torques,
  • masses,
  • inertias,
  • geometry,
  • friction coefficients,
  • controller parameters.
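As a sketch of such a parameter sensitivity, the following propagates a forward-mode tangent for the mass through a single-pendulum special case of the manipulator equation (parameter values and the semi-implicit Euler integrator are illustrative assumptions):

```python
import math

def pendulum_step(q, qd, dq, dqd, tau, m, l=1.0, b=0.1, grav=9.81, dt=0.01):
    """One semi-implicit Euler step of m*l^2*qdd + b*qd + m*grav*l*sin(q) = tau,
    together with the forward-mode tangents (dq, dqd) = d(q, qd)/dm."""
    qdd = (tau - b * qd) / (m * l * l) - grav * math.sin(q) / l
    # Tangent of qdd: direct dependence on m plus dependence through q and qd.
    dqdd = (-grav * math.cos(q) / l) * dq \
           - (b / (m * l * l)) * dqd \
           - (tau - b * qd) / (m * m * l * l)
    qd, dqd = qd + dt * qdd, dqd + dt * dqdd
    q, dq = q + dt * qd, dq + dt * dqd
    return q, qd, dq, dqd

def final_angle(m, steps=200, tau=0.5):
    """Final angle of a fixed-torque rollout and its sensitivity d(angle)/d(mass)."""
    q, qd, dq, dqd = 0.3, 0.0, 0.0, 0.0
    for _ in range(steps):
        q, qd, dq, dqd = pendulum_step(q, qd, dq, dqd, tau, m)
    return q, dq
```

The same tangent-propagation pattern extends to masses, friction coefficients, or geometry; AD automates the derivative bookkeeping done by hand here.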

Trajectory Optimization

Trajectory optimization searches for a control sequence minimizing a cost:

\min_{u_0,\ldots,u_{T-1}} \sum_{t=0}^{T-1} \ell(x_t,u_t) + \ell_T(x_T),

subject to dynamics constraints

x_{t+1} = f(x_t, u_t).

The gradient of the total objective depends on derivatives propagated through the dynamics.

Reverse-mode AD naturally computes these gradients because the trajectory is a sequential computation graph.
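A minimal single-shooting sketch: a toy integrator, a terminal-target cost, and plain gradient descent using the adjoint gradient (the target, penalty weight, and horizon are made-up values for this example):

```python
import numpy as np

DT, TARGET, W = 0.1, 1.0, 100.0   # step size, terminal target, penalty weight

def shoot(u, x0=0.0):
    """Roll the integrator x_{t+1} = x_t + DT*u_t forward to x_T."""
    x = x0
    for ut in u:
        x = x + DT * ut
    return x

def objective_and_grad(u):
    """J = 0.5*sum u_t^2 + 0.5*W*(x_T - TARGET)^2 with its adjoint gradient."""
    xT = shoot(u)
    J = 0.5 * float(np.sum(u ** 2)) + 0.5 * W * (xT - TARGET) ** 2
    lam = W * (xT - TARGET)   # dJ/dx_T; dx_{t+1}/dx_t = 1 keeps lam constant
    g = u + lam * DT          # dx_{t+1}/du_t = DT
    return J, g

u = np.zeros(10)
for _ in range(500):
    J, g = objective_and_grad(u)
    u = u - 0.05 * g
```

For this quadratic problem the optimum is analytic (all controls equal, x_T = 10/11), which makes it a convenient correctness check for the adjoint sweep.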

Optimal Control and Adjoint Equations

Continuous-time optimal control uses dynamics

\dot x = f(x, u),

and cost

J = \int_0^T \ell(x(t),u(t))\,dt + \phi(x(T)).

Pontryagin’s principle introduces the adjoint state

\lambda(t).

The Hamiltonian is

H(x,u,\lambda) = \ell(x,u) + \lambda^\top f(x,u).

The adjoint equation is

\dot\lambda = -\frac{\partial H}{\partial x}.

This is mathematically equivalent to reverse-mode differentiation through the trajectory.
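The equivalence can be seen in one step in discrete time. Define the costate as the sensitivity of the objective to the state; the chain rule then yields exactly the backward recursion that reverse mode executes:

```latex
% Discrete dynamics x_{t+1} = f(x_t, u_t) with cost J = \sum_t \ell(x_t, u_t).
% Define the costate \lambda_t := \partial J / \partial x_t. The chain rule gives
\lambda_t
  = \frac{\partial \ell}{\partial x}(x_t, u_t)
  + \left(\frac{\partial f}{\partial x}(x_t, u_t)\right)^{\top} \lambda_{t+1},
% which is the reverse-mode accumulation step, and the discrete analogue of the
% continuous costate equation \dot\lambda = -\partial H / \partial x.
```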

The connection between control theory and reverse-mode AD is deep:

| Control theory | AD interpretation |
| --- | --- |
| Adjoint state | Reverse gradient |
| Costate equation | Backward sensitivity propagation |
| Hamiltonian derivatives | Local Jacobian actions |
| Shooting method | Gradient-based trajectory optimization |

Model Predictive Control

Model predictive control repeatedly solves a finite-horizon optimization problem:

  1. estimate current state,
  2. optimize future controls,
  3. apply first action,
  4. repeat.

Each optimization uses system derivatives.

AD is useful because modern MPC systems may contain:

  • nonlinear dynamics,
  • learned components,
  • differentiable constraints,
  • neural cost terms,
  • differentiable collision models.

Instead of deriving gradients manually, the controller differentiates the simulation and objective directly.
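The receding-horizon loop can be sketched on a toy scalar plant; the plant, cost weights, horizon, and the gradient-descent inner solver are all illustrative choices (a production MPC would use a structured QP or SQP solver):

```python
import numpy as np

DT, A, R, H = 0.1, 1.0, 0.01, 10   # step, plant pole, control weight, horizon

def plant_step(x, u):
    """Toy stable plant x_{t+1} = x_t + DT*(u - A*x)."""
    return x + DT * (u - A * x)

def plan_grad(x0, u):
    """Adjoint gradient of the horizon cost sum_t (x_t^2 + R*u_t^2)."""
    xs = [x0]
    for ut in u:
        xs.append(plant_step(xs[-1], ut))
    g = np.zeros(H)
    lam = 2.0 * xs[H]
    for t in reversed(range(H)):
        g[t] = lam * DT + 2.0 * R * u[t]
        lam = lam * (1.0 - DT * A) + 2.0 * xs[t]
    return g

def mpc_step(x, u_warm):
    """Solve the horizon problem by gradient descent, return the first action."""
    u = u_warm.copy()
    for _ in range(50):
        u -= 0.1 * plan_grad(x, u)
    return u[0], np.append(u[1:], 0.0)   # shift the plan as a warm start

x, u_warm = 2.0, np.zeros(H)
for _ in range(80):
    u0, u_warm = mpc_step(x, u_warm)
    x = plant_step(x, u0)
```

Warm-starting the shifted plan is what makes the repeated optimization cheap in practice.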

Differentiable Simulation

A differentiable simulator exposes derivatives of simulated outcomes with respect to simulation inputs.

Suppose simulation evolves:

x_{t+1} = \Phi(x_t, u_t, \theta).

A differentiable simulator computes:

\frac{\partial x_T}{\partial u_t}, \qquad \frac{\partial x_T}{\partial \theta}, \qquad \frac{\partial L}{\partial x_t}.

Applications include:

| Application | Purpose |
| --- | --- |
| Robot design | Optimize morphology |
| Policy learning | Differentiate reward |
| System identification | Fit physical parameters |
| Sim-to-real transfer | Adapt simulator parameters |
| Grasp optimization | Optimize contact behavior |
| Motion planning | Optimize trajectories |

Contact Dynamics

Contact is one of the hardest parts of differentiable robotics.

Rigid contact often introduces discontinuities:

v^+ = R(v^-),

where collisions instantaneously change velocity.

Unilateral contact introduces complementarity conditions between the normal force \lambda_n and the gap function \phi(q):

0 \le \lambda_n \perp \phi(q) \ge 0.

These systems are piecewise smooth or non-smooth.

Naive AD through contact solvers may produce:

  • undefined gradients,
  • unstable sensitivities,
  • zero gradients,
  • discontinuous optimization behavior.

Common strategies include:

| Strategy | Idea |
| --- | --- |
| Soft contact | Replace hard contact with a smooth penalty |
| Implicit differentiation | Differentiate the converged contact solve |
| Relaxed complementarity | Smooth the inequality conditions |
| Hybrid methods | Analytical contact derivatives |

Contact differentiation remains an active research area.
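The soft-contact strategy can be illustrated with a one-dimensional normal-force model (the stiffness k and smoothing width eps are made-up values; the softplus smoothing is one common choice among several):

```python
import math

def hard_normal_force(phi, k=1000.0):
    """Hard penalty: force only under penetration (phi < 0).
    Zero force AND zero gradient whenever the bodies are separated."""
    return k * max(0.0, -phi)

def soft_normal_force(phi, k=1000.0, eps=0.01):
    """Smoothed penalty k*eps*softplus(-phi/eps), differentiable everywhere
    (numerically valid for moderate phi/eps)."""
    return k * eps * math.log1p(math.exp(-phi / eps))

def d_soft_normal_force(phi, k=1000.0, eps=0.01):
    """Analytic derivative of the soft model with respect to the gap phi."""
    return -k / (1.0 + math.exp(phi / eps))
```

The payoff for optimization: slightly outside contact the hard model returns a zero gradient, while the soft model still reports that closing the gap increases force, so gradient-based planners can reason about approaching contact.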

Inverse Kinematics

Inverse kinematics solves for joint angles producing a desired end-effector pose.

If

x = f(q),

then inverse kinematics solves

f(q) = x^*.

Optimization form:

L(q) = \frac{1}{2}\|f(q) - x^*\|^2.

The Jacobian

J(q) = \frac{\partial f}{\partial q}

maps joint velocities to task-space velocities:

\dot x = J(q)\,\dot q.

AD computes these Jacobians automatically, which is especially valuable for complex articulated systems.
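A planar 2-link arm makes the optimization form concrete. Link lengths and the damping constant are example values; the Jacobian is written out by hand here to show exactly what an AD system would generate, and the solve starts from a reasonable initial guess:

```python
import numpy as np

LINK1, LINK2 = 1.0, 0.8   # link lengths (example values)

def fk(q):
    """End-effector position of a planar 2-link arm."""
    return np.array([LINK1 * np.cos(q[0]) + LINK2 * np.cos(q[0] + q[1]),
                     LINK1 * np.sin(q[0]) + LINK2 * np.sin(q[0] + q[1])])

def jacobian(q):
    """Analytic Jacobian dfk/dq; an AD system derives this automatically."""
    s1, c1 = np.sin(q[0]), np.cos(q[0])
    s12, c12 = np.sin(q[0] + q[1]), np.cos(q[0] + q[1])
    return np.array([[-LINK1 * s1 - LINK2 * s12, -LINK2 * s12],
                     [ LINK1 * c1 + LINK2 * c12,  LINK2 * c12]])

def ik(target, q0, iters=100, damping=1e-6):
    """Damped Gauss-Newton on L(q) = 0.5*||fk(q) - target||^2."""
    q = np.array(q0, dtype=float)
    for _ in range(iters):
        r = fk(q) - target
        J = jacobian(q)
        q -= np.linalg.solve(J.T @ J + damping * np.eye(2), J.T @ r)
    return q
```

The small damping term keeps the step well-defined near kinematic singularities, where J^T J loses rank.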

State Estimation

Robotic systems estimate hidden states from sensor measurements.

A state estimator may combine:

  • inertial sensors,
  • cameras,
  • lidar,
  • wheel encoders,
  • GPS,
  • force sensors.

An optimization-based estimator minimizes residuals:

L(x) = \sum_i \|r_i(x)\|^2.

AD computes Jacobians needed for Gauss-Newton or Levenberg-Marquardt optimization.

This is especially useful in SLAM and visual-inertial odometry, where residual structures are large and sparse.
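A small residual-based estimator in the same spirit: localizing a 2D position from range measurements to known landmarks via Gauss-Newton. The landmark layout and measurements are fabricated, noise-free data for this sketch:

```python
import numpy as np

landmarks = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 3.0]])
x_true = np.array([1.0, 1.0])
ranges = np.linalg.norm(landmarks - x_true, axis=1)   # noise-free "measurements"

def residuals(x):
    """Range residuals r_i(x) = ||x - l_i|| - d_i."""
    return np.linalg.norm(x - landmarks, axis=1) - ranges

def residual_jacobian(x):
    """dr_i/dx = (x - l_i)^T / ||x - l_i||: unit bearing vectors as rows."""
    d = x - landmarks
    return d / np.linalg.norm(d, axis=1, keepdims=True)

def gauss_newton(x0, iters=20):
    x = np.array(x0, dtype=float)
    for _ in range(iters):
        J, r = residual_jacobian(x), residuals(x)
        x = x - np.linalg.solve(J.T @ J, J.T @ r)
    return x
```

In SLAM-scale problems the same normal equations become large and sparse, and the Jacobian blocks come from AD rather than hand derivation.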

Differentiable Perception

Modern robotic pipelines often integrate learned perception systems.

Example:

camera image
    -> neural perception model
    -> object pose estimate
    -> planner
    -> controller
    -> robot action

If the pipeline is differentiable end-to-end, gradients can flow from task objectives back into perception modules.

This enables:

  • task-aware perception,
  • differentiable sensor calibration,
  • policy gradients through perception,
  • learned observation models.

System Identification

System identification estimates physical parameters from observed trajectories.

Suppose a simulator predicts

x_t(\theta).

Observed trajectories are

\hat x_t.

The objective is

L(\theta) = \sum_t \|x_t(\theta) - \hat x_t\|^2.

AD computes

\nabla_\theta L.

This allows fitting:

  • masses,
  • friction coefficients,
  • motor constants,
  • damping,
  • actuator delays,
  • aerodynamic parameters.

Differentiable simulation has become a major tool for simulator calibration.
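A minimal identification sketch: recover a damping coefficient from a trajectory generated by the "true" parameter, using a forward sensitivity and scalar Gauss-Newton steps (the linear decay model and the true value 0.7 are invented for this example):

```python
import numpy as np

DT, T, X0 = 0.1, 50, 1.0

def simulate(b):
    """Trajectory of x_{t+1} = (1 - DT*b)*x_t plus the sensitivity s_t = dx_t/db."""
    xs, ss = [X0], [0.0]
    for _ in range(T):
        x, s = xs[-1], ss[-1]
        xs.append((1.0 - DT * b) * x)
        ss.append((1.0 - DT * b) * s - DT * x)   # chain rule through the step
    return np.array(xs), np.array(ss)

obs, _ = simulate(0.7)   # "measured" data generated by the true damping

b = 0.2                  # initial guess
for _ in range(50):
    xs, ss = simulate(b)
    r = xs - obs
    b -= np.sum(r * ss) / np.sum(ss * ss)   # scalar Gauss-Newton step
```

Replacing the hand-written sensitivity update with AD lets the same recipe scale to masses, inertias, and actuator models simultaneously.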

Reinforcement Learning and Control

Structurally, many reinforcement learning systems are control systems with learned policies.

A policy

u_t = \pi_\theta(x_t)

interacts with dynamics:

x_{t+1} = f(x_t, u_t).

The objective is expected return:

J(\theta) = \mathbb{E}\left[\sum_t r(x_t, u_t)\right].

If the environment is differentiable, gradients can propagate directly through the dynamics. This often gives lower-variance updates than score-function estimators.

However, real environments contain discontinuities, stochasticity, and unmodeled effects. Pure differentiable control is therefore usually combined with robust or stochastic methods.
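A deterministic toy version of differentiating through the dynamics: a linear policy u = -theta*x acting on an integrator, with the gradient of the accumulated cost carried by a forward-mode tangent through both policy and dynamics. The plant, cost weights, and hand-tuned step size are illustrative; this is a sketch of analytic pathwise gradients, not a full RL algorithm:

```python
DT, T = 0.1, 100

def return_and_grad(theta, x0=1.0, r=0.1):
    """Cost of u_t = -theta*x_t on x_{t+1} = x_t + DT*u_t, with d(cost)/d(theta).
    s carries dx/dtheta forward through policy and dynamics."""
    x, s, J, g = x0, 0.0, 0.0, 0.0
    for _ in range(T):
        u = -theta * x
        du = -x - theta * s          # du/dtheta through the policy
        J += x * x + r * u * u
        g += 2.0 * x * s + 2.0 * r * u * du
        x, s = x + DT * u, s + DT * du
    return J, g

theta = 0.0
for _ in range(200):
    J, g = return_and_grad(theta)
    theta -= 1e-3 * g                # pathwise gradient descent on the policy
```

Because the gradient flows through the dynamics rather than through sampled returns, each update is deterministic, which is the variance advantage mentioned above.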

Sparse Structure

Robotic systems have structured Jacobians.

Examples:

| Structure | Consequence |
| --- | --- |
| Kinematic trees | Block-sparse derivatives |
| Local contacts | Sparse coupling |
| Sequential dynamics | Banded time structure |
| Factor graphs | Sparse estimation systems |

Efficient robotics AD systems exploit this sparsity rather than forming dense Jacobians.

Real-Time Constraints

Control systems often run under strict timing constraints.

| Application | Typical timing |
| --- | --- |
| Motor control | microseconds to milliseconds |
| MPC | milliseconds |
| Flight control | sub-millisecond stability loops |
| SLAM updates | real-time sensor rates |

Generic AD frameworks may be too slow or memory-heavy.

Production systems often use:

  • custom derivative kernels,
  • ahead-of-time code generation,
  • symbolic simplification,
  • sparse linear algebra,
  • static computational graphs.

The derivative system must fit real-time constraints.

Numerical Stability

Robotics gradients can become unstable because of:

| Cause | Effect |
| --- | --- |
| Chaotic contact sequences | Sensitive trajectories |
| Long horizons | Exploding or vanishing gradients |
| Poor scaling | Ill-conditioned optimization |
| Hard constraints | Non-smooth derivatives |
| Integrator error | Gradient mismatch |
| Solver tolerances | Noisy adjoints |

Good differentiable robotics systems carefully define solver semantics and smoothing behavior.

Differentiable Robot Design

Robot morphology itself can become an optimization variable.

Parameters may include:

  • link lengths,
  • masses,
  • actuator placement,
  • joint limits,
  • sensor locations.

A differentiable simulator computes how morphology affects task performance.

Optimization becomes:

\min_\theta L(\theta),

where \theta now defines robot structure rather than controller parameters.

This creates co-design systems where body and controller are optimized jointly.

Practical Architecture

A robust differentiable robotics stack typically separates:

| Layer | Responsibility |
| --- | --- |
| Geometry | Kinematics and transforms |
| Dynamics | Equations of motion |
| Contact | Collision and friction |
| Integration | Time stepping |
| Estimation | Sensor fusion and optimization |
| Planning | Trajectory optimization |
| Control | Policy or feedback law |
| Learning | Gradient-based adaptation |

Each layer should expose well-defined derivative rules.

Failure Modes

Differentiable robotics systems fail in characteristic ways.

| Failure mode | Cause |
| --- | --- |
| Exploding trajectory gradients | Long unstable horizons |
| Zero contact gradients | Hard collision thresholds |
| Simulator mismatch | Real-world physics differs |
| Memory explosion | Reverse mode through long trajectories |
| Unstable optimization | Ill-conditioned dynamics |
| Nonphysical learned behavior | Weak constraints |
| Timing failure | AD overhead violates real-time limits |

Many practical systems intentionally smooth dynamics or truncate gradients to maintain optimization stability.

Summary

Robotics and control systems are naturally differentiable because they evolve through structured dynamical equations. Automatic differentiation provides gradients for trajectory optimization, control, estimation, simulation, and learning.

The main challenges are contact discontinuities, long-horizon stability, sparse structure, real-time execution, and differentiating through numerical solvers. Effective systems combine AD with optimal control theory, sparse numerical methods, custom solver derivatives, and carefully designed simulation semantics.