
Chapter 138. Tensor Decompositions

138.1 Introduction

Tensor decompositions generalize matrix decompositions to multidimensional arrays.

A matrix has two modes: rows and columns. A tensor may have three, four, or more modes. For example, an image dataset may have modes for height, width, color channel, and image index. A recommender system may have modes for user, item, and time. A knowledge graph may have modes for subject, relation, and object.

Tensor decompositions express a large tensor using smaller factors. They are used for compression, denoising, latent factor modeling, feature extraction, and multidimensional data analysis. The two classical models are CP decomposition and Tucker decomposition, often viewed as higher-order analogues of matrix factorization methods such as SVD and PCA.

138.2 Tensors

A tensor is a multidimensional array.

A vector is a first-order tensor:

x_i.

A matrix is a second-order tensor:

A_{ij}.

A third-order tensor has entries

\mathcal{X}_{ijk}.

More generally, an N-th order tensor has entries

\mathcal{X}_{i_1 i_2 \cdots i_N}.

The number of indices is called the order of the tensor. Each index direction is called a mode.

For example, if

\mathcal{X}\in \mathbb{R}^{I\times J\times K},

then \mathcal{X} is a third-order tensor with three modes of sizes I, J, and K.

138.3 Why Tensors Are Different from Matrices

Matrices already support many powerful decompositions:

| Matrix decomposition | Purpose |
| --- | --- |
| LU | Solving linear systems |
| QR | Orthogonalization and least squares |
| Eigendecomposition | Spectral analysis |
| SVD | Low-rank approximation |

Tensors do not inherit all matrix properties directly.

For matrices, rank is well behaved. Every matrix has a best rank-r approximation under the Frobenius norm, given by truncated SVD.

For tensors, rank is more complicated. A best low-rank approximation may fail to exist in some models. Rank can be hard to compute. Decompositions may be unique in some cases and ill-conditioned in others.

Thus tensor decompositions are not merely matrix decompositions with more indices. They form a distinct theory.

138.4 Fibers and Slices

A fiber is obtained by fixing all tensor indices except one.

For a third-order tensor

\mathcal{X}_{ijk},

examples include:

| Fiber type | Fixed indices | Varying index |
| --- | --- | --- |
| Mode-1 fiber | j, k fixed | i |
| Mode-2 fiber | i, k fixed | j |
| Mode-3 fiber | i, j fixed | k |

A slice is obtained by fixing one index.

For a third-order tensor:

| Slice type | Fixed index | Result |
| --- | --- | --- |
| Frontal slice | k fixed | Matrix in i, j |
| Lateral slice | j fixed | Matrix in i, k |
| Horizontal slice | i fixed | Matrix in j, k |

Fibers generalize rows and columns. Slices generalize matrix subarrays.
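As a concrete illustration, the following NumPy sketch extracts fibers and slices by fixing indices; the tensor shape (4, 5, 6) and the fixed index values are arbitrary choices made for this example.

```python
import numpy as np

# Illustrative third-order tensor with modes of size I=4, J=5, K=6 (arbitrary sizes).
X = np.random.rand(4, 5, 6)

mode1_fiber = X[:, 2, 3]       # fix j=2, k=3, vary i  -> length-4 vector
mode2_fiber = X[1, :, 3]       # fix i=1, k=3, vary j  -> length-5 vector
mode3_fiber = X[1, 2, :]       # fix i=1, j=2, vary k  -> length-6 vector

frontal_slice    = X[:, :, 0]  # fix k -> matrix in (i, j), shape (4, 5)
lateral_slice    = X[:, 0, :]  # fix j -> matrix in (i, k), shape (4, 6)
horizontal_slice = X[0, :, :]  # fix i -> matrix in (j, k), shape (5, 6)
```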

138.5 Tensor Unfolding

Tensor unfolding, also called matricization or flattening, converts a tensor into a matrix.

For a third-order tensor

\mathcal{X}\in\mathbb{R}^{I\times J\times K},

the mode-1 unfolding is a matrix

X_{(1)}\in\mathbb{R}^{I\times JK}.

It arranges all mode-1 fibers as columns.

Similarly:

X_{(2)}\in\mathbb{R}^{J\times IK},

and

X_{(3)}\in\mathbb{R}^{K\times IJ}.

Unfolding is important because many tensor algorithms reduce parts of the problem to matrix operations. It allows SVD, least squares, and rank computations to be applied mode by mode.
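A minimal NumPy sketch of mode-n unfolding is shown below. The column ordering follows one common convention (move the unfolded mode to the front, then flatten the remaining modes in C order); other texts and libraries may order the columns differently. The helper name `unfold` is an assumption of this sketch, not a library function.

```python
import numpy as np

def unfold(X, mode):
    """Mode-`mode` unfolding: arrange all mode-`mode` fibers as columns of a matrix."""
    return np.reshape(np.moveaxis(X, mode, 0), (X.shape[mode], -1))

X = np.random.rand(3, 4, 5)
print(unfold(X, 0).shape)  # (3, 20)  -> X_(1) in R^{I x JK}
print(unfold(X, 1).shape)  # (4, 15)  -> X_(2) in R^{J x IK}
print(unfold(X, 2).shape)  # (5, 12)  -> X_(3) in R^{K x IJ}
```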

138.6 Mode Products

The mode-n product multiplies a tensor by a matrix along one mode.

Let

\mathcal{X}\in\mathbb{R}^{I_1\times\cdots\times I_N}

and let

U\in\mathbb{R}^{J\times I_n}.

The mode-n product is written

\mathcal{Y} = \mathcal{X}\times_n U.

The resulting tensor has size

I_1\times\cdots\times I_{n-1}\times J\times I_{n+1}\times\cdots\times I_N.

In coordinates,

\mathcal{Y}_{i_1\cdots i_{n-1}j i_{n+1}\cdots i_N} = \sum_{i_n=1}^{I_n} \mathcal{X}_{i_1\cdots i_n\cdots i_N}U_{j i_n}.

Mode products are the basic algebraic operation behind Tucker decompositions and multilinear transformations.
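One way to realize the mode-n product is through unfolding: compute U X_(n) and fold the result back into a tensor. The sketch below follows that route; the helpers `unfold`, `fold`, and `mode_product` are illustrative names for this sketch, not a fixed API.

```python
import numpy as np

def unfold(X, mode):
    return np.reshape(np.moveaxis(X, mode, 0), (X.shape[mode], -1))

def fold(M, mode, shape):
    """Inverse of `unfold` for a tensor of the given target shape."""
    full = [shape[mode]] + [s for i, s in enumerate(shape) if i != mode]
    return np.moveaxis(np.reshape(M, full), 0, mode)

def mode_product(X, U, mode):
    """Mode-`mode` product: multiply tensor X by matrix U along one mode."""
    new_shape = list(X.shape)
    new_shape[mode] = U.shape[0]
    return fold(U @ unfold(X, mode), mode, new_shape)

X = np.random.rand(3, 4, 5)
U = np.random.rand(7, 4)       # U maps the second mode (size 4) to size 7
Y = mode_product(X, U, 1)
print(Y.shape)                 # (3, 7, 5)
```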

138.7 Rank-One Tensors

A rank-one tensor is an outer product of vectors.

For vectors

a\in\mathbb{R}^I,\qquad b\in\mathbb{R}^J,\qquad c\in\mathbb{R}^K,

their outer product is the tensor

a\circ b\circ c

with entries

(a\circ b\circ c)_{ijk}=a_i b_j c_k.

This generalizes the rank-one matrix

ab^T.

Rank-one tensors are the building blocks of CP decomposition.
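A quick numeric check with np.einsum confirms the entrywise formula; the vector lengths here are arbitrary.

```python
import numpy as np

a, b, c = np.random.rand(3), np.random.rand(4), np.random.rand(5)

# Outer product a ∘ b ∘ c built with einsum.
T = np.einsum('i,j,k->ijk', a, b, c)

print(T.shape)                                      # (3, 4, 5)
print(np.isclose(T[1, 2, 3], a[1] * b[2] * c[3]))   # True: entries are a_i b_j c_k
```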

138.8 CP Decomposition

The CP decomposition writes a tensor as a sum of rank-one tensors.

For a third-order tensor,

\mathcal{X} \approx \sum_{r=1}^{R} \lambda_r a_r\circ b_r\circ c_r.

Here:

| Symbol | Meaning |
| --- | --- |
| R | Number of components |
| \lambda_r | Component weight |
| a_r, b_r, c_r | Factor vectors |

The CP model is also called CANDECOMP/PARAFAC.

In coordinates,

\mathcal{X}_{ijk} \approx \sum_{r=1}^{R} \lambda_r a_{ir}b_{jr}c_{kr}.

The factor vectors are often collected into factor matrices:

A=[a_1\ \cdots\ a_R], \quad B=[b_1\ \cdots\ b_R], \quad C=[c_1\ \cdots\ c_R].

CP decomposition is useful when one wants a small number of latent components that jointly explain all modes.
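The coordinate formula translates directly into a single einsum. The sketch below assembles a CP model from factor matrices and weights; the sizes and variable names are illustrative choices.

```python
import numpy as np

I, J, K, R = 4, 5, 6, 3
A, B, C = np.random.rand(I, R), np.random.rand(J, R), np.random.rand(K, R)
lam = np.random.rand(R)        # component weights λ_r

# X_ijk = Σ_r λ_r A_ir B_jr C_kr
X = np.einsum('r,ir,jr,kr->ijk', lam, A, B, C)
print(X.shape)                 # (4, 5, 6)
```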

138.9 Tensor Rank

The CP rank of a tensor is the smallest number R such that

\mathcal{X} = \sum_{r=1}^{R} a_r\circ b_r\circ c_r.

This generalizes matrix rank.

However, tensor rank is much more difficult than matrix rank.

For matrices, rank can be computed by Gaussian elimination or SVD. For tensors, determining rank is generally hard. Tensor rank also depends more subtly on the scalar field. A tensor may have different rank over \mathbb{R} and over \mathbb{C}.

The complexity of tensor rank is one reason tensor decomposition theory is technically difficult.

138.10 CP Uniqueness

CP decomposition can be unique under conditions where matrix factorization would not be.

This is one of the attractive features of CP models. A matrix factorization

X\approx AB^T

has a rotational ambiguity:

AB^T=(AQ)(BQ^{-T})^T

for any invertible matrix Q.

CP decompositions can avoid this ambiguity because three or more modes constrain the components more strongly.

Uniqueness is usually meant up to:

  1. Permutation of components,
  2. Rescaling of factors within each component.

For example,

a_r\circ b_r\circ c_r = (\alpha a_r)\circ(\beta b_r)\circ(\gamma c_r)

whenever

\alpha\beta\gamma=1.
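A small numeric check, with arbitrary α and β, shows that such a rescaling leaves the rank-one term unchanged.

```python
import numpy as np

a, b, c = np.random.rand(3), np.random.rand(4), np.random.rand(5)
alpha, beta = 2.0, 0.5
gamma = 1.0 / (alpha * beta)   # enforce αβγ = 1

T1 = np.einsum('i,j,k->ijk', a, b, c)
T2 = np.einsum('i,j,k->ijk', alpha * a, beta * b, gamma * c)
print(np.allclose(T1, T2))     # True
```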

138.11 Tucker Decomposition

The Tucker decomposition writes a tensor as a smaller core tensor multiplied by factor matrices along each mode.

For a third-order tensor,

\mathcal{X} \approx \mathcal{G} \times_1 A \times_2 B \times_3 C.

Here:

| Symbol | Meaning |
| --- | --- |
| \mathcal{G} | Core tensor |
| A, B, C | Factor matrices |
| \times_n | Mode-n product |

In coordinates,

\mathcal{X}_{ijk} \approx \sum_{p=1}^{P} \sum_{q=1}^{Q} \sum_{r=1}^{R} \mathcal{G}_{pqr} A_{ip}B_{jq}C_{kr}.

The Tucker model separates mode-specific bases from the interactions among components. The core tensor records how components from different modes interact. Tucker decomposition is commonly described as a multilinear generalization of PCA, with a core tensor and factor matrices for the modes.
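As with CP, the coordinate formula maps onto a single einsum. The core and factor sizes below are arbitrary illustrative choices.

```python
import numpy as np

I, J, K = 6, 7, 8              # tensor sizes
P, Q, R = 2, 3, 4              # core sizes
G = np.random.rand(P, Q, R)
A, B, C = np.random.rand(I, P), np.random.rand(J, Q), np.random.rand(K, R)

# X_ijk = Σ_pqr G_pqr A_ip B_jq C_kr
X = np.einsum('pqr,ip,jq,kr->ijk', G, A, B, C)
print(X.shape)                 # (6, 7, 8)
```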

138.12 Tucker Rank

The Tucker rank of a tensor is the tuple of ranks of its unfoldings.

For a third-order tensor,

\operatorname{rank}_{\mathrm{Tucker}}(\mathcal{X}) = (r_1,r_2,r_3),

where

r_n=\operatorname{rank}(X_{(n)}).

Unlike CP rank, Tucker rank is a tuple rather than a single number.

This reflects the fact that each mode may have a different intrinsic dimension.

For example, an image dataset may require many spatial components but fewer lighting or viewpoint components.
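A short sketch computes the Tucker rank as ordinary matrix ranks of the unfoldings; the `unfold` helper and the tensor size are assumptions of this example.

```python
import numpy as np

def unfold(X, mode):
    return np.reshape(np.moveaxis(X, mode, 0), (X.shape[mode], -1))

X = np.random.rand(5, 6, 7)
tucker_rank = tuple(np.linalg.matrix_rank(unfold(X, n)) for n in range(X.ndim))
print(tucker_rank)             # typically (5, 6, 7) for a generic random tensor
```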

138.13 CP versus Tucker

CP and Tucker decompositions serve different purposes.

| Feature | CP decomposition | Tucker decomposition |
| --- | --- | --- |
| Basic form | Sum of rank-one tensors | Core tensor times factor matrices |
| Rank parameter | Single integer R | Tuple (r_1,\ldots,r_N) |
| Core tensor | Diagonal or implicit | Full smaller tensor |
| Interpretability | Components often directly interpretable | Factors plus interactions |
| Flexibility | More constrained | More flexible |
| Compression | Strong when CP rank is small | Strong when multilinear rank is small |

CP is often preferred when one wants identifiable latent components. Tucker is often preferred when one wants compression, dimensional reduction, or mode-wise bases.

138.14 Higher-Order SVD

The higher-order singular value decomposition, or HOSVD, is a Tucker-type decomposition computed from SVDs of tensor unfoldings.

For each mode n, compute an SVD of

X_{(n)}.

Use the leading left singular vectors as the columns of the factor matrix

U^{(n)}.

Then define the core by projecting the tensor onto these mode bases:

\mathcal{G} = \mathcal{X} \times_1 (U^{(1)})^T \times_2 (U^{(2)})^T \cdots \times_N (U^{(N)})^T.

The approximation is then

\mathcal{X} \approx \mathcal{G} \times_1 U^{(1)} \times_2 U^{(2)} \cdots \times_N U^{(N)}.

HOSVD is a direct and stable way to compute Tucker approximations.
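The following sketch implements a truncated HOSVD using only NumPy. The helper names and the prescribed multilinear rank are assumptions of this sketch, not part of any particular library.

```python
import numpy as np

def unfold(X, mode):
    return np.reshape(np.moveaxis(X, mode, 0), (X.shape[mode], -1))

def mode_product(X, U, mode):
    Y = np.tensordot(U, X, axes=(1, mode))    # contract U with the chosen mode
    return np.moveaxis(Y, 0, mode)            # move the new axis back into place

def hosvd(X, ranks):
    """Truncated HOSVD: core G and one factor matrix per mode."""
    factors = []
    for n, r in enumerate(ranks):
        Un, _, _ = np.linalg.svd(unfold(X, n), full_matrices=False)
        factors.append(Un[:, :r])             # leading left singular vectors
    G = X
    for n, Un in enumerate(factors):
        G = mode_product(G, Un.T, n)          # project onto each mode basis
    return G, factors

X = np.random.rand(8, 9, 10)
G, Us = hosvd(X, ranks=(3, 4, 5))
X_hat = G
for n, Un in enumerate(Us):                   # reconstruct the approximation
    X_hat = mode_product(X_hat, Un, n)
print(G.shape, np.linalg.norm(X - X_hat) / np.linalg.norm(X))
```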

138.15 Higher-Order Orthogonal Iteration

Higher-order orthogonal iteration, or HOOI, improves a Tucker approximation iteratively.

The method alternates over modes. For each mode, it fixes all other factor matrices and updates the current factor matrix using an SVD of a projected unfolding.

This is analogous to alternating least squares.

HOOI usually starts from HOSVD and then refines the factor matrices.

It is often more accurate than plain HOSVD for a fixed multilinear rank, but it is iterative and may converge to a local optimum.
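A compact HOOI sketch is shown below. It assumes the `unfold`, `mode_product`, and `hosvd` helpers from the HOSVD sketch above are in scope, and uses a fixed iteration count rather than a convergence test.

```python
def hooi(X, ranks, n_iter=10):
    """Refine a Tucker approximation by alternating SVDs (HOOI)."""
    _, factors = hosvd(X, ranks)                  # HOSVD initialization
    for _ in range(n_iter):
        for n, r in enumerate(ranks):
            # Project X onto all other factor bases, then update factor n.
            Y = X
            for m, Um in enumerate(factors):
                if m != n:
                    Y = mode_product(Y, Um.T, m)
            Un, _, _ = np.linalg.svd(unfold(Y, n), full_matrices=False)
            factors[n] = Un[:, :r]
    G = X
    for n, Un in enumerate(factors):
        G = mode_product(G, Un.T, n)              # final core
    return G, factors
```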

138.16 Alternating Least Squares

Alternating least squares, or ALS, is a standard method for computing tensor decompositions.

For CP decomposition, ALS fixes all factor matrices except one, then solves a least squares problem for the remaining factor matrix. It cycles through the modes until convergence.

For a third-order CP model:

  1. Fix B and C, solve for A,
  2. Fix A and C, solve for B,
  3. Fix A and B, solve for C,
  4. Repeat.

ALS is simple and widely used, but it can be slow, sensitive to initialization, and affected by degeneracy.
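The following is a bare-bones CP-ALS sketch in NumPy, with a fixed iteration count, no weights λ, and no normalization or convergence checks. The `khatri_rao` helper and the column ordering are tied to the `unfold` convention used earlier and are assumptions of this sketch.

```python
import numpy as np

def unfold(X, mode):
    return np.reshape(np.moveaxis(X, mode, 0), (X.shape[mode], -1))

def khatri_rao(U, V):
    """Column-wise Kronecker product: (J×R) and (K×R) -> (JK×R)."""
    return np.einsum('jr,kr->jkr', U, V).reshape(-1, U.shape[1])

def cp_als(X, R, n_iter=50):
    I, J, K = X.shape
    A, B, C = np.random.rand(I, R), np.random.rand(J, R), np.random.rand(K, R)
    for _ in range(n_iter):
        # Each step is a linear least-squares problem with the other factors fixed.
        A = np.linalg.lstsq(khatri_rao(B, C), unfold(X, 0).T, rcond=None)[0].T
        B = np.linalg.lstsq(khatri_rao(A, C), unfold(X, 1).T, rcond=None)[0].T
        C = np.linalg.lstsq(khatri_rao(A, B), unfold(X, 2).T, rcond=None)[0].T
    return A, B, C

# Recover a synthetic rank-2 tensor; the relative error is typically small.
A0, B0, C0 = np.random.rand(4, 2), np.random.rand(5, 2), np.random.rand(6, 2)
X = np.einsum('ir,jr,kr->ijk', A0, B0, C0)
A, B, C = cp_als(X, R=2)
X_hat = np.einsum('ir,jr,kr->ijk', A, B, C)
print(np.linalg.norm(X - X_hat) / np.linalg.norm(X))
```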

Recent algorithms also use randomized sketching to reduce the cost of least squares subproblems for large tensors.

138.17 Tensor Train Decomposition

Tensor train decomposition represents a high-order tensor as a chain of smaller core tensors.

For an N-th order tensor,

\mathcal{X}_{i_1\cdots i_N} = G^{(1)}_{i_1} G^{(2)}_{i_2} \cdots G^{(N)}_{i_N},

where the product is matrix multiplication over hidden rank indices.

More explicitly, each core has size

r_{n-1}\times I_n\times r_n,

with boundary ranks

r_0=r_N=1.

Tensor trains are useful for very high-order tensors, where CP and Tucker may become too large.

Tensor train methods are implemented in modern tensor libraries, including SVD-based and cross-approximation variants.
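Below is a plain-NumPy sketch of the TT-SVD construction (sequential truncated SVDs). It is not the API of any particular library, and the single `max_rank` truncation rule is a simplification.

```python
import numpy as np

def tt_svd(X, max_rank):
    """Split X into TT cores of shape (r_{n-1}, I_n, r_n), with r_0 = r_N = 1."""
    shape = X.shape
    cores, r_prev = [], 1
    M = X.reshape(r_prev * shape[0], -1)
    for n in range(len(shape) - 1):
        U, S, Vt = np.linalg.svd(M, full_matrices=False)
        r = min(max_rank, len(S))
        cores.append(U[:, :r].reshape(r_prev, shape[n], r))
        M = (S[:r, None] * Vt[:r]).reshape(r * shape[n + 1], -1)
        r_prev = r
    cores.append(M.reshape(r_prev, shape[-1], 1))
    return cores

def tt_entry(cores, idx):
    """Evaluate X[idx] as a product of matrix slices G^(n)[:, i_n, :]."""
    v = np.ones((1, 1))
    for G, i in zip(cores, idx):
        v = v @ G[:, i, :]
    return v[0, 0]

X = np.random.rand(3, 4, 5, 6)
cores = tt_svd(X, max_rank=20)                    # exact here, since no truncation bites
print([G.shape for G in cores])
print(np.isclose(tt_entry(cores, (1, 2, 3, 4)), X[1, 2, 3, 4]))   # True
```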

138.18 Hierarchical Tucker Decomposition

Hierarchical Tucker decomposition organizes modes in a tree.

Instead of using one large core tensor as in Tucker decomposition, it recursively groups modes and represents interactions through smaller transfer tensors.

This avoids the exponential growth of the Tucker core in high order.

Hierarchical tensor formats are important in scientific computing, numerical PDEs, quantum many-body simulation, and high-dimensional approximation.

They trade a single global core for a hierarchy of smaller cores.

138.19 Nonnegative Tensor Factorization

Nonnegative tensor factorization imposes nonnegativity constraints on the factors.

If

\mathcal{X}_{ijk}\ge0,

one may seek a decomposition with

A\ge0,\qquad B\ge0,\qquad C\ge0.

This is useful when the data represents counts, intensities, probabilities, or other nonnegative quantities.

Examples include:

| Data type | Interpretation |
| --- | --- |
| Documents by words by time | Topics over time |
| Users by items by contexts | Recommender patterns |
| Images by pixels by conditions | Parts-based features |
| Brain signals | Nonnegative activations |

Nonnegativity often improves interpretability.
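One common way to impose the constraints is with multiplicative updates in the style of Lee and Seung for nonnegative matrix factorization. The sketch below extends that idea to a third-order CP model; it assumes the `unfold` and `khatri_rao` helpers from the ALS sketch above are in scope, and uses a fixed iteration count.

```python
import numpy as np

def mult_update(X_unf, F, Z, eps=1e-12):
    """One multiplicative update of F for X_unf ≈ F @ Z.T; keeps F nonnegative."""
    return F * (X_unf @ Z) / (F @ (Z.T @ Z) + eps)

def nonneg_cp(X, R, n_iter=200):
    A, B, C = (np.random.rand(s, R) for s in X.shape)
    for _ in range(n_iter):
        A = mult_update(unfold(X, 0), A, khatri_rao(B, C))
        B = mult_update(unfold(X, 1), B, khatri_rao(A, C))
        C = mult_update(unfold(X, 2), C, khatri_rao(A, B))
    return A, B, C
```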

138.20 Symmetric Tensor Decomposition

A symmetric tensor is invariant under permutations of its indices.

For example,

\mathcal{X}_{ijk} = \mathcal{X}_{jik} = \mathcal{X}_{kij}

for all index permutations.

A symmetric CP decomposition has the form

\mathcal{X} \approx \sum_{r=1}^R \lambda_r a_r\circ a_r\circ a_r.

Symmetric tensors arise from homogeneous polynomials, moments, cumulants, and higher-order derivatives.

For example, a cubic form

p(x)=\sum_{ijk}\mathcal{X}_{ijk}x_i x_j x_k

corresponds to a symmetric third-order tensor.
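A short numeric check connects the symmetric CP form with the cubic form it induces; the sizes here are arbitrary.

```python
import numpy as np

n, R = 4, 2
a = np.random.rand(n, R)       # columns a_r
lam = np.random.rand(R)

# X_ijk = Σ_r λ_r a_ir a_jr a_kr is symmetric under index permutations.
X = np.einsum('r,ir,jr,kr->ijk', lam, a, a, a)
print(np.allclose(X, X.transpose(1, 0, 2)), np.allclose(X, X.transpose(2, 1, 0)))

# The induced cubic form equals Σ_r λ_r (a_r · x)^3.
x = np.random.rand(n)
p = np.einsum('ijk,i,j,k->', X, x, x, x)
print(np.isclose(p, np.sum(lam * (a.T @ x) ** 3)))   # True
```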

138.21 Tensor Completion

Tensor completion estimates missing entries of a tensor.

Suppose only some entries of

\mathcal{X}

are observed. The goal is to recover the full tensor by assuming low-rank structure.

This generalizes matrix completion.

Applications include:

| Application | Missing entries |
| --- | --- |
| Recommender systems | User-item-time ratings |
| Sensor networks | Missing measurements |
| Image and video | Damaged pixels or frames |
| Medical data | Incomplete patient measurements |
| Knowledge graphs | Missing relations |

Tensor completion uses the idea that multidimensional structure can constrain missing values more strongly than matrix structure alone.
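As one simple, heuristic illustration, the sketch below alternates between filling missing entries and truncating to low multilinear rank with HOSVD. It assumes the `hosvd` and `mode_product` helpers from the HOSVD sketch above are in scope; the mean initialization and the fixed iteration count are arbitrary choices, not a prescribed algorithm.

```python
import numpy as np

def complete(X_obs, mask, ranks, n_iter=100):
    """Impute-and-truncate completion: observed entries are kept fixed each pass."""
    X = np.where(mask, X_obs, X_obs[mask].mean())   # fill missing entries with the observed mean
    for _ in range(n_iter):
        G, Us = hosvd(X, ranks)
        X_low = G
        for n, Un in enumerate(Us):
            X_low = mode_product(X_low, Un, n)      # current low-multilinear-rank estimate
        X = np.where(mask, X_obs, X_low)            # overwrite only the missing entries
    return X
```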

138.22 Applications

Tensor decompositions appear in many areas.

| Area | Tensor modes |
| --- | --- |
| Chemometrics | Sample, wavelength, time |
| Recommender systems | User, item, context |
| Computer vision | Image, illumination, pose, identity |
| Neuroscience | Channel, time, trial |
| Natural language processing | Word, context, document |
| Knowledge graphs | Subject, relation, object |
| Scientific computing | Spatial and parameter dimensions |

The key advantage is that tensors preserve multiway structure. Flattening a tensor into a matrix can destroy relationships among modes.

138.23 Numerical Issues

Tensor decomposition algorithms face several numerical issues.

| Issue | Description |
| --- | --- |
| Initialization | Different starts can give different solutions |
| Local minima | Objective functions are nonconvex |
| Degeneracy | Components may diverge while approximation improves |
| Scaling ambiguity | Factors can be rescaled without changing the tensor |
| Permutation ambiguity | Components can be reordered |
| Rank selection | Correct rank may be unknown |
| Missing data | Observed entries may be sparse or biased |

These issues make diagnostics important. One usually checks reconstruction error, stability across runs, factor interpretability, and sensitivity to rank.

138.24 Relation to Linear Algebra

Tensor decompositions extend matrix decompositions but do not simply copy them.

| Matrix concept | Tensor analogue |
| --- | --- |
| Matrix rank | CP rank, Tucker rank |
| SVD | HOSVD, Tucker decomposition |
| Rank-one matrix | Rank-one tensor |
| Low-rank approximation | CP, Tucker, tensor train approximation |
| Matrix completion | Tensor completion |
| PCA | Multilinear PCA |
| Matrix factorization | Multiway factorization |

The central new feature is interaction among more than two modes. This creates richer models and more difficult mathematics.

138.25 Summary

Tensor decompositions express multidimensional arrays using smaller structured factors.

The main concepts are:

| Concept | Meaning |
| --- | --- |
| Tensor | Multidimensional array |
| Mode | Index direction |
| Fiber | One-dimensional slice |
| Slice | Lower-dimensional section |
| Unfolding | Matrix view of a tensor |
| Mode product | Multiplication along one mode |
| Rank-one tensor | Outer product of vectors |
| CP decomposition | Sum of rank-one tensors |
| Tucker decomposition | Core tensor times factor matrices |
| Tucker rank | Rank tuple of tensor unfoldings |
| HOSVD | SVD-based Tucker decomposition |
| ALS | Alternating least squares fitting |
| Tensor train | Chain of small tensor cores |
| Tensor completion | Recovery from missing entries |

Tensor decompositions provide the multilinear extension of matrix factorization. They preserve multiway structure and give tools for compression, latent modeling, denoising, completion, and high-dimensional numerical computation.