# Chapter 77. Cholesky Decomposition

Cholesky decomposition is a triangular factorization for symmetric positive definite matrices. It is a specialized form of matrix decomposition, more restrictive than LU decomposition, but faster and simpler when its hypotheses are satisfied.

For a real symmetric positive definite matrix \(A\), the Cholesky decomposition has the form

$$
A = LL^T,
$$

where \(L\) is lower triangular with positive diagonal entries. For a complex Hermitian positive definite matrix, the corresponding form is

$$
A = LL^*,
$$

where \(L^*\) denotes the conjugate transpose of \(L\). Every Hermitian positive definite matrix has such a decomposition, and the factor is unique when the diagonal entries of \(L\) are required to be positive.

## 77.1 Symmetric Positive Definite Matrices

A real square matrix \(A\) is symmetric if

$$
A^T = A.
$$

It is positive definite if

$$
x^T A x > 0
$$

for every nonzero vector \(x\).

Positive definiteness is a strong condition. It says that the quadratic form associated with \(A\) is always positive away from the origin. This condition appears naturally in geometry, optimization, statistics, least squares, energy methods, covariance matrices, and numerical partial differential equations.

For Cholesky decomposition, symmetry alone is insufficient: positive definiteness is also required. A symmetric matrix whose quadratic form is negative in some direction cannot be written as \(LL^T\) with real \(L\) at all, and a singular positive semidefinite matrix lacks the standard factorization with positive diagonal entries.

## 77.2 The Factorization

The Cholesky factorization of a real symmetric positive definite matrix is

$$
A = LL^T.
$$

Here

$$
L =
\begin{bmatrix}
\ell_{11} & 0 & 0 & \cdots & 0 \\
\ell_{21} & \ell_{22} & 0 & \cdots & 0 \\
\ell_{31} & \ell_{32} & \ell_{33} & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\ell_{n1} & \ell_{n2} & \ell_{n3} & \cdots & \ell_{nn}
\end{bmatrix},
$$

with

$$
\ell_{ii} > 0
$$

for every \(i\).

The transpose \(L^T\) is upper triangular. Thus Cholesky writes \(A\) as the product of a lower triangular matrix and its transpose.

This is different from ordinary LU decomposition. In LU decomposition, the two triangular factors are generally unrelated:

$$
A = LU.
$$

In Cholesky decomposition, the upper factor is exactly the transpose of the lower factor:

$$
U = L^T.
$$

This symmetry is the source of Cholesky's efficiency.

## 77.3 Relation to LU Decomposition

Cholesky decomposition may be viewed as LU decomposition adapted to symmetric positive definite matrices.

If

$$
A = LL^T,
$$

then \(A\) is automatically symmetric:

$$
A^T = (LL^T)^T = LL^T = A.
$$

It is also positive definite when \(L\) is invertible. Indeed, for any nonzero vector \(x\),

$$
x^T A x =
x^T LL^T x =
(L^T x)^T(L^T x) =
\|L^T x\|^2.
$$

Since \(L\) is invertible, \(L^T x \ne 0\) whenever \(x \ne 0\). Hence

$$
x^T A x > 0.
$$

Thus every matrix of the form \(LL^T\), with \(L\) nonsingular, is symmetric positive definite.

The converse is the Cholesky theorem: every real symmetric positive definite matrix can be written in this way.

## 77.4 A Two by Two Example

Let

$$
A =
\begin{bmatrix}
4 & 6 \\
6 & 13
\end{bmatrix}.
$$

We seek

$$
A = LL^T,
$$

where

$$
L =
\begin{bmatrix}
a & 0 \\
b & c
\end{bmatrix},
\qquad
a > 0,
\qquad
c > 0.
$$

Then

$$
LL^T =
\begin{bmatrix}
a & 0 \\
b & c
\end{bmatrix}
\begin{bmatrix}
a & b \\
0 & c
\end{bmatrix} =
\begin{bmatrix}
a^2 & ab \\
ab & b^2 + c^2
\end{bmatrix}.
$$

Equating entries with \(A\), we get

$$
a^2 = 4,
$$

so

$$
a = 2.
$$

Next,

$$
ab = 6,
$$

so

$$
2b = 6,
$$

and therefore

$$
b = 3.
$$

Finally,

$$
b^2 + c^2 = 13.
$$

Thus

$$
9 + c^2 = 13,
$$

so

$$
c = 2.
$$

Therefore

$$
L =
\begin{bmatrix}
2 & 0 \\
3 & 2
\end{bmatrix}.
$$

Indeed,

$$
\begin{bmatrix}
2 & 0 \\
3 & 2
\end{bmatrix}
\begin{bmatrix}
2 & 3 \\
0 & 2
\end{bmatrix} =
\begin{bmatrix}
4 & 6 \\
6 & 13
\end{bmatrix}.
$$
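
The factor can also be checked numerically. The sketch below assumes NumPy is available; `np.linalg.cholesky` returns the lower triangular factor directly.

```python
import numpy as np

A = np.array([[4.0, 6.0],
              [6.0, 13.0]])

# np.linalg.cholesky returns the lower triangular L with A = L @ L.T.
L = np.linalg.cholesky(A)
print(L)                        # [[2. 0.]
                                #  [3. 2.]]
print(np.allclose(L @ L.T, A))  # True
```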

## 77.5 Entry Formulas

Let \(A\) be an \(n \times n\) real symmetric positive definite matrix. Suppose

$$
A = LL^T.
$$

The entries of \(L\) can be computed column by column.

For a diagonal entry,

$$
\ell_{jj} =
\sqrt{
a_{jj} -
\sum_{k=1}^{j-1} \ell_{jk}^2
}.
$$

For an entry below the diagonal, where \(i > j\),

$$
\ell_{ij} =
\frac{
a_{ij} -
\sum_{k=1}^{j-1} \ell_{ik}\ell_{jk}
}{
\ell_{jj}
}.
$$

These formulas depend only on entries of \(L\) that have already been computed. The square root is well-defined because positive definiteness guarantees that the quantity under the radical is positive in exact arithmetic.

## 77.6 A Three by Three Example

Let

$$
A =
\begin{bmatrix}
4 & 12 & -16 \\
12 & 37 & -43 \\
-16 & -43 & 98
\end{bmatrix}.
$$

This standard example has Cholesky factor

$$
L =
\begin{bmatrix}
2 & 0 & 0 \\
6 & 1 & 0 \\
-8 & 5 & 3
\end{bmatrix}.
$$

Check the product:

$$
LL^T =
\begin{bmatrix}
2 & 0 & 0 \\
6 & 1 & 0 \\
-8 & 5 & 3
\end{bmatrix}
\begin{bmatrix}
2 & 6 & -8 \\
0 & 1 & 5 \\
0 & 0 & 3
\end{bmatrix}.
$$

The entries are

$$
(LL^T)_{11} = 2^2 = 4,
$$

$$
(LL^T)_{12} = 2 \cdot 6 = 12,
$$

$$
(LL^T)_{13} = 2 \cdot (-8) = -16,
$$

$$
(LL^T)_{22} = 6^2 + 1^2 = 37,
$$

$$
(LL^T)_{23} = 6(-8) + 1(5) = -48 + 5 = -43,
$$

$$
(LL^T)_{33} = (-8)^2 + 5^2 + 3^2 = 64 + 25 + 9 = 98.
$$

Thus

$$
LL^T = A.
$$

## 77.7 Solving Linear Systems

Suppose

$$
Ax = b
$$

and

$$
A = LL^T.
$$

Then

$$
LL^T x = b.
$$

Introduce

$$
y = L^T x.
$$

Then solve the two triangular systems

$$
Ly = b
$$

and

$$
L^T x = y.
$$

The first system is solved by forward substitution. The second system is solved by backward substitution.

The solution procedure is therefore:

| Step | Operation |
|---|---|
| 1 | Factor \(A = LL^T\) |
| 2 | Solve \(Ly = b\) |
| 3 | Solve \(L^T x = y\) |

This is analogous to solving by LU decomposition, but the two factors are transposes of one another.
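
The three steps translate directly into code. The sketch below assumes NumPy and SciPy are available; the helper name `cholesky_solve` is illustrative.

```python
import numpy as np
from scipy.linalg import solve_triangular

def cholesky_solve(A, b):
    """Solve Ax = b for symmetric positive definite A via A = L L^T."""
    L = np.linalg.cholesky(A)                     # step 1: factor
    y = solve_triangular(L, b, lower=True)        # step 2: forward substitution
    return solve_triangular(L.T, y, lower=False)  # step 3: backward substitution
```

SciPy also bundles these steps as `cho_factor` and `cho_solve`.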

## 77.8 Example: Solving a System

Let

$$
A =
\begin{bmatrix}
4 & 6 \\
6 & 13
\end{bmatrix},
\qquad
b =
\begin{bmatrix}
10 \\
19
\end{bmatrix}.
$$

From the previous example,

$$
A = LL^T,
$$

where

$$
L =
\begin{bmatrix}
2 & 0 \\
3 & 2
\end{bmatrix}.
$$

First solve

$$
Ly = b.
$$

That is,

$$
\begin{bmatrix}
2 & 0 \\
3 & 2
\end{bmatrix}
\begin{bmatrix}
y_1 \\
y_2
\end{bmatrix} =
\begin{bmatrix}
10 \\
19
\end{bmatrix}.
$$

The first equation gives

$$
2y_1 = 10,
$$

so

$$
y_1 = 5.
$$

The second equation gives

$$
3y_1 + 2y_2 = 19.
$$

Substitute \(y_1=5\):

$$
15 + 2y_2 = 19,
$$

so

$$
y_2 = 2.
$$

Now solve

$$
L^T x = y.
$$

That is,

$$
\begin{bmatrix}
2 & 3 \\
0 & 2
\end{bmatrix}
\begin{bmatrix}
x_1 \\
x_2
\end{bmatrix} =
\begin{bmatrix}
5 \\
2
\end{bmatrix}.
$$

The second equation gives

$$
2x_2 = 2,
$$

so

$$
x_2 = 1.
$$

The first equation gives

$$
2x_1 + 3x_2 = 5.
$$

Therefore

$$
2x_1 + 3 = 5,
$$

so

$$
x_1 = 1.
$$

Hence

$$
x =
\begin{bmatrix}
1 \\
1
\end{bmatrix}.
$$

## 77.9 Determinants from Cholesky

Cholesky decomposition gives a simple determinant formula.

If

$$
A = LL^T,
$$

then

$$
\det(A) =
\det(L)\det(L^T).
$$

Since

$$
\det(L^T)=\det(L),
$$

we have

$$
\det(A) = \det(L)^2.
$$

Because \(L\) is triangular,

$$
\det(L)=\prod_{i=1}^{n}\ell_{ii}.
$$

Therefore

$$
\det(A) =
\left(\prod_{i=1}^{n}\ell_{ii}\right)^2.
$$

Equivalently,

$$
\det(A) =
\prod_{i=1}^{n}\ell_{ii}^2.
$$

This formula is useful in statistics, optimization, Gaussian models, and numerical methods, especially when the logarithm of the determinant is needed:

$$
\log \det(A) =
2 \sum_{i=1}^{n} \log \ell_{ii}.
$$
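
A minimal sketch of this computation, assuming NumPy; the name `logdet_spd` is illustrative. Working with the diagonal of \(L\) in log space avoids the overflow and underflow that a direct determinant can suffer for large matrices.

```python
import numpy as np

def logdet_spd(A):
    """log det(A) for a symmetric positive definite matrix A."""
    L = np.linalg.cholesky(A)
    # det(A) = (prod l_ii)^2, so log det(A) = 2 * sum(log l_ii).
    return 2.0 * np.sum(np.log(np.diag(L)))
```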

## 77.10 Positive Definiteness Test

Cholesky decomposition can be used as a test for positive definiteness.

When the algorithm attempts to compute

$$
\ell_{jj} =
\sqrt{
a_{jj} -
\sum_{k=1}^{j-1} \ell_{jk}^2
},
$$

the expression under the square root must be positive.

If, in exact arithmetic, this expression becomes zero or negative before the factorization is complete, then \(A\) is not positive definite.

Thus Cholesky factorization is both a decomposition method and a diagnostic. For a symmetric matrix, successful Cholesky factorization with positive diagonal entries certifies positive definiteness.

In floating point arithmetic, care is needed. A nearly positive definite or ill-conditioned matrix may produce a small negative pivot because of rounding error. In such cases, the failure reflects numerical sensitivity rather than genuine indefiniteness.
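
A common idiom uses the factorization attempt itself as the test. The sketch below assumes NumPy, whose `cholesky` raises `LinAlgError` on a nonpositive pivot:

```python
import numpy as np

def is_positive_definite(A):
    """Test positive definiteness by attempting a Cholesky factorization.
    Symmetry must be checked separately: cholesky reads only one triangle."""
    if not np.allclose(A, A.T):
        return False
    try:
        np.linalg.cholesky(A)
        return True
    except np.linalg.LinAlgError:
        return False
```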

## 77.11 Operation Count

For a dense \(n \times n\) matrix, Cholesky decomposition requires about

$$
\frac{1}{3}n^3
$$

arithmetic operations.

Dense LU decomposition requires about

$$
\frac{2}{3}n^3
$$

operations.

Thus Cholesky is roughly twice as efficient as LU when it applies. This efficiency comes from exploiting symmetry and using only one triangular factor.

Solving the two triangular systems after factorization costs on the order of

$$
2n^2
$$

operations for one right-hand side.

## 77.12 Cholesky and Least Squares

Cholesky decomposition often appears in least squares problems through the normal equations.

Given an overdetermined system

$$
Ax \approx b,
$$

the least squares solution satisfies

$$
A^T A x = A^T b.
$$

If \(A\) has full column rank, then \(A^T A\) is symmetric positive definite. Therefore one may compute

$$
A^T A = LL^T
$$

and solve

$$
LL^T x = A^T b.
$$

This method is simple, but it can be numerically less stable than QR decomposition because forming \(A^T A\) squares the condition number. Cholesky is therefore efficient, but not always the preferred method for least squares when numerical accuracy is critical.
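
A sketch of the normal-equations approach, assuming SciPy is available; the helper name `normal_equations_lstsq` is illustrative.

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def normal_equations_lstsq(A, b):
    """Least squares via A^T A x = A^T b, assuming A has full column rank.
    Forming the Gram matrix squares the condition number of A, so prefer
    a QR-based solver (e.g. np.linalg.lstsq) when accuracy is critical."""
    G = A.T @ A                # Gram matrix: symmetric positive definite
    c, low = cho_factor(G)     # Cholesky factorization of G
    return cho_solve((c, low), A.T @ b)
```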

## 77.13 Cholesky and Covariance Matrices

Covariance matrices are symmetric positive semidefinite, and positive definite when the variables have no exact linear dependence.

If

$$
\Sigma = LL^T
$$

is the Cholesky factorization of a covariance matrix, then \(L\) can be used to transform uncorrelated standard variables into correlated variables.

Let

$$
z \sim N(0,I).
$$

Define

$$
x = Lz.
$$

Then

$$
\operatorname{Cov}(x) =
\operatorname{Cov}(Lz) =
L \operatorname{Cov}(z) L^T =
LIL^T =
LL^T =
\Sigma.
$$

This is one reason Cholesky decomposition is widely used in statistics and simulation. It gives a direct way to generate random vectors with a prescribed covariance matrix.
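
A minimal simulation sketch, assuming NumPy. The sample covariance of the generated vectors approaches \(\Sigma\):

```python
import numpy as np

rng = np.random.default_rng(0)

Sigma = np.array([[4.0, 6.0],      # target covariance (SPD)
                  [6.0, 13.0]])
L = np.linalg.cholesky(Sigma)

# z has uncorrelated standard normal rows; x = L z has covariance Sigma.
z = rng.standard_normal((2, 100_000))
x = L @ z
print(np.cov(x))   # approximately Sigma
```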

## 77.14 Complex Cholesky Decomposition

For complex matrices, the proper analogue of symmetry is Hermitian symmetry.

A complex matrix \(A\) is Hermitian if

$$
A^* = A.
$$

It is positive definite if

$$
x^* A x > 0
$$

for every nonzero complex vector \(x\).

The Cholesky factorization is

$$
A = LL^*.
$$

The diagonal entries of \(L\) are real and positive. The factor \(L^*\) is the conjugate transpose of \(L\), not merely the transpose.

For real matrices, conjugation has no effect, so this reduces to

$$
A = LL^T.
$$
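
The same NumPy routine handles the Hermitian case. A small sketch, with an example matrix chosen by hand for illustration:

```python
import numpy as np

# A Hermitian positive definite matrix.
A = np.array([[4.0 + 0j, 2.0 - 1j],
              [2.0 + 1j, 3.0 + 0j]])

L = np.linalg.cholesky(A)               # lower triangular, real positive diagonal
print(np.allclose(L @ L.conj().T, A))   # True: A = L L*
```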

## 77.15 LDL Transpose Decomposition

A related factorization is

$$
A = LDL^T,
$$

where \(L\) is unit lower triangular and \(D\) is diagonal.

This form avoids square roots. It separates the diagonal scaling into the matrix \(D\). For a positive definite matrix, the diagonal entries of \(D\) are positive.

If

$$
D =
\operatorname{diag}(d_1,\ldots,d_n)
$$

with \(d_i>0\), then

$$
D = D^{1/2}D^{1/2}.
$$

Hence

$$
A = LDL^T =
(LD^{1/2})(LD^{1/2})^T.
$$

This recovers an ordinary Cholesky factor with

$$
C = LD^{1/2}.
$$

The \(LDL^T\) form is often useful in theoretical work and in numerical algorithms where square roots are undesirable.
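
A short sketch of the square-root-free algorithm in NumPy; the helper name `ldlt` is illustrative, and SciPy offers a pivoted variant as `scipy.linalg.ldl`.

```python
import numpy as np

def ldlt(A):
    """LDL^T factorization of a symmetric positive definite matrix:
    returns unit lower triangular L and the diagonal of D. No square
    roots are taken during the factorization itself."""
    n = A.shape[0]
    L = np.eye(n)
    d = np.zeros(n)
    for j in range(n):
        d[j] = A[j, j] - np.sum(L[j, :j] ** 2 * d[:j])
        for i in range(j + 1, n):
            L[i, j] = (A[i, j] - np.sum(L[i, :j] * L[j, :j] * d[:j])) / d[j]
    return L, d

# An ordinary Cholesky factor is recovered as C = L D^(1/2),
# i.e. C = L * np.sqrt(d) with NumPy broadcasting.
```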

## 77.16 Algorithm

The following routine computes the lower triangular Cholesky factor of a real symmetric positive definite matrix. It is a small, self-contained Python implementation of the entry formulas from Section 77.5.

```python
import math

def cholesky_decomposition(A):
    """Return the lower triangular Cholesky factor L (as a list of lists)
    of a symmetric positive definite matrix A, so that A = L L^T."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]

    for j in range(n):
        # Diagonal entry: l_jj = sqrt(a_jj - sum of squares in row j).
        s = sum(L[j][k] ** 2 for k in range(j))
        d = A[j][j] - s
        if d <= 0.0:
            raise ValueError("matrix is not positive definite")
        L[j][j] = math.sqrt(d)

        # Entries below the diagonal in column j.
        for i in range(j + 1, n):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = (A[i][j] - s) / L[j][j]

    return L
```

The algorithm computes one column of \(L\) at a time. Since \(A\) is symmetric, only the lower triangular part of \(A\) is needed.
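
Running it on the matrix from Section 77.6 reproduces the factor computed there:

```python
A = [[4.0, 12.0, -16.0],
     [12.0, 37.0, -43.0],
     [-16.0, -43.0, 98.0]]

for row in cholesky_decomposition(A):
    print(row)
# [2.0, 0.0, 0.0]
# [6.0, 1.0, 0.0]
# [-8.0, 5.0, 3.0]
```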

## 77.17 Storage in Practice

A dense Cholesky implementation usually stores the result in the same array that originally stored \(A\).

Since \(A\) is symmetric, only one triangular half must be stored. The lower triangular half may be overwritten by \(L\). The upper triangular half is either unused or treated as implicit.

| Stored object | Practical representation |
|---|---|
| \(A\) | Lower or upper triangular half |
| \(L\) | Same triangular half after factorization |
| \(L^T\) | Not stored separately |
| Symmetric entries | Implied by symmetry |

This storage model cuts memory use and improves cache behavior.
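
SciPy's interface reflects this storage model: `cho_factor` returns a single array holding the factor in one triangle, and its documentation notes that the other triangle contains unreferenced data that should not be read. A small sketch, assuming SciPy is available:

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

A = np.array([[4.0, 6.0],
              [6.0, 13.0]])

# One array holds the factor; the unused triangle is not cleaned.
c, lower = cho_factor(A, lower=True)
x = cho_solve((c, lower), np.array([10.0, 19.0]))
print(x)   # [1. 1.]
```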

## 77.18 Failure Modes

Cholesky decomposition can fail for several reasons.

| Cause | Meaning |
|---|---|
| Matrix is not symmetric | The factorization \(A=LL^T\) cannot represent it |
| Matrix is indefinite | Some \(x^T A x\) is negative |
| Matrix is singular positive semidefinite | A zero pivot appears |
| Matrix is ill-conditioned | Floating point roundoff may create a small negative pivot |
| Matrix has modeling error | Intended covariance or Hessian is not positive definite |

A common numerical repair is to add a small diagonal term:

$$
A_\lambda = A + \lambda I,
\qquad
\lambda > 0.
$$

This operation is called diagonal regularization or jitter in statistical computing. It can make a nearly positive definite matrix safely positive definite, at the cost of changing the original problem.
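
A common repair loop, sketched with NumPy; the helper name `jittered_cholesky` and the starting value are illustrative choices.

```python
import numpy as np

def jittered_cholesky(A, jitter=1e-10, max_tries=10):
    """Attempt Cholesky, escalating diagonal jitter on failure.
    Note: the factor returned is for A + lam * I, not for A itself."""
    lam = 0.0
    for _ in range(max_tries):
        try:
            return np.linalg.cholesky(A + lam * np.eye(A.shape[0]))
        except np.linalg.LinAlgError:
            lam = jitter if lam == 0.0 else lam * 10.0
    raise np.linalg.LinAlgError("jitter failed to reach positive definiteness")
```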

## 77.19 Comparison with LU and QR

| Decomposition | Main form | Applies to | Typical use |
|---|---|---|---|
| LU | \(PA = LU\) | General square matrices | Linear systems |
| Cholesky | \(A = LL^T\) | Symmetric positive definite matrices | Fast SPD solves |
| QR | \(A = QR\) | Rectangular or square matrices | Least squares, orthogonality |
| SVD | \(A = U\Sigma V^T\) | General matrices | Rank, conditioning, approximation |

Cholesky is the preferred factorization when the matrix is known to be symmetric positive definite. LU is more general. QR is more stable for many least squares problems. SVD is the most informative but usually more expensive.

## 77.20 Summary

Cholesky decomposition factors a real symmetric positive definite matrix as

$$
A = LL^T.
$$

For complex Hermitian positive definite matrices, the form is

$$
A = LL^*.
$$

The factor \(L\) is lower triangular with positive diagonal entries. The factorization is unique under that convention. It is used to solve linear systems, compute determinants, test positive definiteness, generate correlated random variables, and exploit the structure of covariance and Hessian matrices.

Cholesky is faster and more memory-efficient than general LU decomposition when its assumptions are satisfied. Its limitation is also its strength: it applies only to matrices with the symmetry and positivity needed to make the triangular square-root structure valid.
