Deep learning systems manipulate tensors with millions or billions of numerical entries. Understanding the shape, dimensional structure, and memory organization of tensors is essential for building efficient neural networks in PyTorch.
Many deep learning errors arise from incorrect tensor shapes rather than incorrect mathematics. Likewise, many performance problems arise from inefficient memory layouts or unnecessary tensor copies. A strong understanding of tensor structure therefore affects both correctness and computational efficiency.
Tensor Shape
The shape of a tensor describes the size of each axis.
If a tensor has shape (3, 4), then it has two axes. The first axis has size 3 and the second axis has size 4.
For example:
```python
import torch

X = torch.randn(3, 4)
print(X.shape)
```

Output:

```
torch.Size([3, 4])
```

This tensor contains 3 × 4 = 12 entries.
A tensor with shape (2, 3, 4) contains 2 × 3 × 4 = 24 entries.

In general, if a tensor has shape (d₁, d₂, …, dₙ), then the total number of elements is the product d₁ × d₂ × ⋯ × dₙ.
PyTorch provides the total number of elements through numel():
```python
X = torch.randn(2, 3, 4)
print(X.numel())
```

Output:

```
24
```

Tensor Dimensions
The number of axes in a tensor is called its dimension, rank, or order.
| Tensor | Shape example | Number of dimensions |
|---|---|---|
| Scalar | [] | 0 |
| Vector | [5] | 1 |
| Matrix | [3, 4] | 2 |
| 3D tensor | [2, 3, 4] | 3 |
| 4D tensor | [32, 3, 224, 224] | 4 |
In PyTorch:
```python
X = torch.randn(32, 3, 224, 224)
print(X.ndim)
```

Output:

```
4
```

The term “4D tensor” means that the tensor has four axes, not that it represents physical four-dimensional space.
Semantic Meaning of Axes
The axes of a tensor usually carry semantic meaning.
A batch of RGB images commonly uses shape (N, C, H, W), where:

| Symbol | Meaning |
|---|---|
| N | Batch size |
| C | Number of channels |
| H | Image height |
| W | Image width |
For example:
```python
images = torch.randn(32, 3, 224, 224)
```

This tensor represents:
- 32 images
- 3 color channels
- height 224
- width 224
Similarly, transformer models often use shape (B, L, D), where:

| Symbol | Meaning |
|---|---|
| B | Batch size |
| L | Sequence length |
| D | Embedding dimension |
Example:
```python
tokens = torch.randn(16, 128, 768)
```

This may represent:
- 16 sequences
- 128 tokens per sequence
- 768-dimensional embeddings
Tensor programming requires tracking both the numerical shape and the meaning of each axis.
Reshaping Tensors
Reshaping changes the view of tensor data without changing the underlying entries.
Suppose a tensor has shape (2, 3, 4). Since it contains 24 entries, it can be reshaped into any compatible shape whose dimensions multiply to 24.
Example:
```python
X = torch.randn(2, 3, 4)
Y = X.reshape(6, 4)
print(Y.shape)
```

Output:

```
torch.Size([6, 4])
```

The entries remain the same. Only the interpretation changes.
PyTorch allows automatic dimension inference with -1:
```python
X = torch.randn(2, 3, 4)
Y = X.reshape(2, -1)
print(Y.shape)
```

Output:

```
torch.Size([2, 12])
```

PyTorch inferred the missing dimension automatically.
Flattening
Flattening converts multiple axes into one axis.
This operation is common before fully connected layers.
Example:
```python
X = torch.randn(32, 3, 224, 224)
Y = X.flatten(start_dim=1)
print(Y.shape)
```

Output:

```
torch.Size([32, 150528])
```

The batch axis is preserved while the remaining dimensions are collapsed.

Mathematically, 3 × 224 × 224 = 150528.
Adding and Removing Dimensions
PyTorch provides operations for inserting or removing singleton dimensions.
A singleton dimension has size 1.
Unsqueeze
unsqueeze() inserts a new axis.
```python
x = torch.randn(64)
print(x.shape)
x = x.unsqueeze(0)
print(x.shape)
```

Output:

```
torch.Size([64])
torch.Size([1, 64])
```

This operation is common when converting a single example into a batch.
Squeeze
squeeze() removes axes of size 1.
```python
x = torch.randn(1, 64, 1)
print(x.shape)
x = x.squeeze()
print(x.shape)
```

Output:

```
torch.Size([1, 64, 1])
torch.Size([64])
```

Permuting Axes
Sometimes tensor axes must be reordered.
PyTorch uses permute() for this purpose.
Example:
```python
X = torch.randn(32, 224, 224, 3)
Y = X.permute(0, 3, 1, 2)
print(Y.shape)
```

Output:

```
torch.Size([32, 3, 224, 224])
```

The tensor originally used NHWC layout: (N, H, W, C). After permutation it uses NCHW layout: (N, C, H, W).
PyTorch convolution layers typically expect channel-first tensors.
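As a quick check, a channel-first tensor passes straight into a convolution layer. This is a minimal sketch; the layer sizes (8 output channels, kernel size 3, padding 1) are illustrative assumptions.

```python
import torch
import torch.nn as nn

# nn.Conv2d expects input of shape [N, C_in, H, W].
conv = nn.Conv2d(in_channels=3, out_channels=8, kernel_size=3, padding=1)

X = torch.randn(32, 224, 224, 3)  # NHWC: would be misinterpreted by Conv2d
Y = X.permute(0, 3, 1, 2)         # NCHW: the layout Conv2d expects
out = conv(Y)
print(out.shape)                  # torch.Size([32, 8, 224, 224])
```

With kernel size 3 and padding 1, the spatial dimensions are preserved; only the channel axis changes.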
Tensor Memory Layout
Tensor shape describes logical structure. Memory layout describes physical storage in memory.
A tensor may appear multidimensional while its entries are stored linearly in memory.
Consider:
```python
X = torch.tensor([
    [1, 2, 3],
    [4, 5, 6]
])
```

Logically, this is a 2 × 3 matrix. Physically, memory stores:

```
[1, 2, 3, 4, 5, 6]
```

Most PyTorch tensors use row-major layout, meaning rows are stored contiguously.
Memory layout affects performance because modern hardware reads contiguous memory more efficiently.
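Row-major storage can be observed directly. In this small sketch, flattening a contiguous tensor returns its entries in the order they sit in memory, row by row.

```python
import torch

X = torch.tensor([[1, 2, 3],
                  [4, 5, 6]])

# For a contiguous (row-major) tensor, flatten() walks memory in order.
print(X.flatten())  # tensor([1, 2, 3, 4, 5, 6])
```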
Strides
A stride tells PyTorch how many memory positions must be skipped to move along each axis.
Example:
```python
X = torch.randn(3, 4)
print(X.stride())
```

Possible output:

```
(4, 1)
```

Interpretation:
- moving along axis 0 skips 4 entries
- moving along axis 1 skips 1 entry
Strides allow PyTorch to create tensor views without copying memory.
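This claim can be checked with `data_ptr()`, which reports where a tensor's storage begins: a transpose swaps the strides but reuses the same memory.

```python
import torch

X = torch.randn(3, 4)
Y = X.t()  # a view with swapped strides, no copy

print(X.stride())                    # (4, 1)
print(Y.stride())                    # (1, 4)
print(X.data_ptr() == Y.data_ptr())  # True: same underlying memory
```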
Contiguous and Noncontiguous Tensors
Some operations produce noncontiguous tensors.
Example:
```python
X = torch.randn(2, 3)
Y = X.t()
print(Y.is_contiguous())
```

Output:

```
False
```

The transpose changes the tensor's interpretation without rearranging memory.
Some PyTorch operations require contiguous tensors. In such cases:
```python
Y = Y.contiguous()
```

This creates a contiguous copy in memory.
Understanding contiguity becomes important in high-performance systems and custom CUDA kernels.
Views Versus Copies
Many tensor operations create views rather than copies.
A view shares the same underlying storage.
Example:
```python
X = torch.arange(12)
Y = X.reshape(3, 4)
Y[0, 0] = -1
print(X)
```

Output:

```
tensor([-1,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11])
```

Changing Y also changed X because both tensors share storage.
This behavior improves efficiency but can produce subtle bugs.
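When aliasing is unwanted, `clone()` produces an independent copy. A minimal sketch:

```python
import torch

X = torch.arange(12)
Y = X.reshape(3, 4).clone()  # clone() copies storage, breaking the alias

Y[0, 0] = -1
print(X[0])  # tensor(0): X is unaffected
```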
Broadcasting and Shape Expansion
Broadcasting allows operations between tensors of compatible shapes.
Example:
```python
X = torch.randn(32, 64)
b = torch.randn(64)
Y = X + b
```

PyTorch conceptually expands b from shape [64] to [32, 64].
This expansion usually occurs without allocating new memory.
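This can be seen with `expand()`, which creates a broadcast view: the expanded tensor has stride 0 along the new axis, so all 32 "rows" read the same memory.

```python
import torch

b = torch.randn(64)
e = b.expand(32, 64)  # broadcast view, no copy

print(e.shape)    # torch.Size([32, 64])
print(e.stride()) # (0, 1): stride 0 repeats the same row
```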
Broadcasting rules compare dimensions from right to left.
Dimensions are compatible if:
- they are equal, or
- one of them is 1
Examples:
| Shape A | Shape B | Result |
|---|---|---|
| [32, 64] | [64] | [32, 64] |
| [32, 10, 64] | [64] | [32, 10, 64] |
| [32, 10, 64] | [1, 64] | [32, 10, 64] |
| [32, 10, 64] | [32] | Invalid |
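The rows of this table can be verified with `torch.broadcast_shapes`, a sketch of which follows.

```python
import torch

print(torch.broadcast_shapes((32, 64), (64,)))        # torch.Size([32, 64])
print(torch.broadcast_shapes((32, 10, 64), (64,)))    # torch.Size([32, 10, 64])
print(torch.broadcast_shapes((32, 10, 64), (1, 64)))  # torch.Size([32, 10, 64])

# Comparing right to left, 64 vs 32 is neither equal nor 1: invalid.
try:
    torch.broadcast_shapes((32, 10, 64), (32,))
except RuntimeError as err:
    print("incompatible:", err)
```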
Broadcasting is one of the most important tensor operations in deep learning.
Tensor Layouts in Deep Learning
Different applications use different tensor layouts.
Images

Computer vision commonly uses:

[N, C, H, W]

Text

Transformers often use:

[B, L, D]

Audio

Waveforms may use:

[B, T]

Spectrograms may use:

[B, F, T]

Video

Video models often use:

[B, C, T, H, W]

Graphs

Node features often use:

[N_nodes, F]

Edge indices may use:

[2, N_edges]
Understanding these conventions is necessary for reading modern research papers and implementing architectures correctly.
Shape Transformations in Neural Networks
Neural networks continuously transform tensor shapes.
A convolutional network might perform:
| Layer | Input shape | Output shape |
|---|---|---|
| Input | [32, 3, 224, 224] | [32, 3, 224, 224] |
| Conv2D | [32, 3, 224, 224] | [32, 64, 112, 112] |
| Pooling | [32, 64, 112, 112] | [32, 64, 56, 56] |
| Flatten | [32, 64, 56, 56] | [32, 200704] |
| Linear | [32, 200704] | [32, 1000] |
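The convolutional rows of this table can be reproduced with standard layers. This is a sketch; the kernel size, stride, and padding below are illustrative assumptions chosen to yield the tabulated shapes, and the final Linear layer is noted in a comment rather than run, since its 200704 × 1000 weight matrix is large.

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(3, 64, kernel_size=3, stride=2, padding=1)  # halves H and W
pool = nn.MaxPool2d(2)                                       # halves H and W again

X = torch.randn(32, 3, 224, 224)
X = conv(X)                  # [32, 64, 112, 112]
X = pool(X)                  # [32, 64, 56, 56]
X = X.flatten(start_dim=1)   # [32, 200704]
print(X.shape)
# A final nn.Linear(200704, 1000) would map this to [32, 1000].
```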
A transformer may perform:
| Layer | Input shape | Output shape |
|---|---|---|
| Token IDs | [16, 128] | [16, 128] |
| Embedding | [16, 128] | [16, 128, 768] |
| Attention | [16, 128, 768] | [16, 128, 768] |
| Output logits | [16, 128, 768] | [16, 128, 50000] |
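These rows can also be reproduced with standard modules. This is a sketch under stated assumptions: a vocabulary of 50000, embedding dimension 768, and 12 attention heads (any divisor of 768 would do), with the output projection implemented as a plain Linear layer.

```python
import torch
import torch.nn as nn

embed = nn.Embedding(50000, 768)
attn = nn.MultiheadAttention(embed_dim=768, num_heads=12, batch_first=True)
to_logits = nn.Linear(768, 50000)

with torch.no_grad():  # shapes only; skip building the autograd graph
    ids = torch.randint(0, 50000, (16, 128))  # [16, 128] token IDs
    x = embed(ids)                            # [16, 128, 768]
    x, _ = attn(x, x, x)                      # [16, 128, 768] self-attention
    logits = to_logits(x)                     # [16, 128, 50000]

print(logits.shape)
```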
Much of practical deep learning consists of reasoning about these transformations correctly.
Shape Errors and Debugging
Shape mismatches are among the most common PyTorch errors.
Example:
```python
A = torch.randn(4, 3)
B = torch.randn(5, 2)
C = A @ B
```

This produces an error because matrix multiplication requires matching inner dimensions: A has shape (4, 3), B has shape (5, 2), and 3 ≠ 5.

Valid multiplication requires shapes of the form (n, k) and (k, m). Then the product has shape (n, m).
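A shape-compatible version can be checked directly; a minimal sketch:

```python
import torch

A = torch.randn(4, 3)
B = torch.randn(3, 2)  # inner dimensions now match: (4, 3) @ (3, 2)
C = A @ B
print(C.shape)         # torch.Size([4, 2])
```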
A good debugging practice is to print tensor shapes at each stage:
```python
print(X.shape)
```

Experienced PyTorch programmers mentally track tensor shapes throughout the model.
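Shape assertions are a lightweight complement to print statements: they fail loudly at the exact stage where an unexpected shape appears. A sketch, with a hypothetical `forward` function standing in for a model:

```python
import torch

def forward(X):
    # Document and enforce the expected shape at each stage.
    assert X.ndim == 2, f"expected a 2-D batch, got shape {tuple(X.shape)}"
    batch = X.shape[0]
    Y = X @ torch.randn(X.shape[1], 10)  # project features down to 10
    assert Y.shape == (batch, 10)
    return Y

print(forward(torch.randn(32, 64)).shape)  # torch.Size([32, 10])
```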
Summary
Tensor shape describes the size of each axis. Tensor dimension describes the number of axes. Tensor layout describes how entries are stored in memory.
PyTorch provides operations for reshaping, flattening, permuting, broadcasting, and slicing tensors. These operations often create views rather than copies, allowing efficient computation.
Modern deep learning systems are fundamentally tensor transformation systems. Understanding how shapes propagate through a model is therefore one of the core skills in neural network engineering.