# Indexing, Slicing, and Tensor Views

Indexing and slicing select parts of a tensor. These operations are used constantly in PyTorch: selecting batches, cropping images, extracting token positions, applying masks, gathering logits, and rearranging model outputs.

A tensor operation may either create a view or a copy. A view shares storage with the original tensor. A copy owns separate storage. This distinction matters for memory use, performance, and mutation.

### Basic Indexing

A tensor entry is selected by giving one index per axis.

```python id="eo5lq5"
import torch

X = torch.tensor([
    [10, 11, 12],
    [20, 21, 22],
    [30, 31, 32],
])

print(X[0, 0])  # tensor(10)
print(X[1, 2])  # tensor(22)
```

PyTorch uses zero-based indexing. The first row has index 0. The second row has index 1.

For a matrix \(X\), the entry at row \(i\), column \(j\) is

$$
X_{ij}.
$$

In PyTorch this is written as:

```python id="yosnd0"
X[i, j]
```

### Indexing Rows and Columns

Selecting one row removes the row axis.

```python id="hi8ado"
X = torch.tensor([
    [10, 11, 12],
    [20, 21, 22],
    [30, 31, 32],
])

row = X[1]

print(row)
print(row.shape)
```

Output:

```python id="bh39gp"
tensor([20, 21, 22])
torch.Size([3])
```

Selecting one column uses `:` to keep all rows:

```python id="ih0o9s"
col = X[:, 1]

print(col)
print(col.shape)
```

Output:

```python id="d5k4tm"
tensor([11, 21, 31])
torch.Size([3])
```

The colon means “select everything along this axis.”

### Slicing Ranges

A slice selects a range of indices.

```python id="zdk6cy"
X = torch.arange(10)

print(X[2:7])
```

Output:

```python id="0rwp6h"
tensor([2, 3, 4, 5, 6])
```

The start index is included. The stop index is excluded.

General form:

```python id="nbnuk4"
start:stop:step
```

Examples:

```python id="767s7c"
x = torch.arange(10)

print(x[:5])     # first five entries
print(x[5:])     # entries from index 5 onward
print(x[::2])    # every second entry
# x[::-1] raises an error: PyTorch does not support negative slice steps
```

For reverse order, use `torch.flip`:

```python id="k7cwzl"
x = torch.arange(10)
rev = torch.flip(x, dims=[0])

print(rev)
```

### Slicing Higher-Rank Tensors

For a 4D image batch:

$$
X\in\mathbb{R}^{B\times C\times H\times W}.
$$

The PyTorch shape is:

```python id="mc9xvl"
[B, C, H, W]
```

Example:

```python id="y9v7hz"
X = torch.randn(32, 3, 224, 224)
```

Select the first image:

```python id="nwi17v"
img = X[0]
print(img.shape)  # torch.Size([3, 224, 224])
```

Select all images, first channel:

```python id="f48yco"
red = X[:, 0, :, :]
print(red.shape)  # torch.Size([32, 224, 224])
```

Crop the center region:

```python id="6cyvka"
crop = X[:, :, 56:168, 56:168]
print(crop.shape)  # torch.Size([32, 3, 112, 112])
```

Slicing is the natural way to express spatial crops, token windows, and feature subsets.
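The same pattern applies to sequence data. A minimal sketch of a token window, assuming a hypothetical batch of token ids with shape `[B, T]` and a window starting at position `start`:

```python id="tw3k9p"
tokens = torch.randint(0, 1000, (4, 128))  # hypothetical token ids, shape [B, T]

start, width = 16, 32
window = tokens[:, start:start + width]    # all batches, positions 16..47

print(window.shape)  # torch.Size([4, 32])
```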

### Keeping Dimensions

Integer indexing removes an axis. Slicing with a range preserves it.

```python id="mw6s3b"
X = torch.randn(32, 3, 224, 224)

a = X[0]
b = X[0:1]

print(a.shape)  # torch.Size([3, 224, 224])
print(b.shape)  # torch.Size([1, 3, 224, 224])
```

`X[0]` selects one image and removes the batch axis. `X[0:1]` selects a batch containing one image and keeps the batch axis.

This distinction matters because neural network layers usually expect a batch axis.
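If the batch axis has already been dropped, `unsqueeze(0)` restores it. A small sketch using the same batch `X`, showing that `X[0].unsqueeze(0)` and `X[0:1]` agree:

```python id="ba7q2m"
X = torch.randn(32, 3, 224, 224)

a = X[0].unsqueeze(0)  # re-insert the batch axis after integer indexing
b = X[0:1]             # slicing keeps the batch axis directly

print(a.shape)            # torch.Size([1, 3, 224, 224])
print(torch.equal(a, b))  # True: same values, same shape
```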

### Ellipsis Indexing

The ellipsis `...` means “all omitted axes.”

```python id="slc2qr"
X = torch.randn(32, 3, 224, 224)

last_col = X[..., -1]
print(last_col.shape)
```

Output:

```python id="i27ao6"
torch.Size([32, 3, 224])
```

This is equivalent to:

```python id="nxfse1"
X[:, :, :, -1]
```

Ellipsis is useful when the number of leading axes may vary.

Example:

```python id="tya7jw"
def last_feature(x):
    return x[..., -1]
```

This function works for `[B, D]`, `[B, T, D]`, or `[B, H, W, D]`.
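A quick check of that claim, calling the function on a few hypothetical shapes:

```python id="el5d8n"
# uses last_feature as defined above
for shape in [(4, 16), (4, 8, 16), (4, 32, 32, 16)]:
    x = torch.randn(*shape)
    print(last_feature(x).shape)
# torch.Size([4]), torch.Size([4, 8]), torch.Size([4, 32, 32])
```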

### Boolean Masks

Comparison operations produce Boolean tensors.

```python id="dgkkh4"
x = torch.tensor([-2.0, 0.5, 3.0, -1.0])

mask = x > 0
print(mask)
```

Output:

```python id="si48c6"
tensor([False,  True,  True, False])
```

A Boolean mask can select matching entries:

```python id="3kzr17"
positive = x[mask]
print(positive)
```

Output:

```python id="bk78rl"
tensor([0.5000, 3.0000])
```

Boolean indexing applied this way returns a one-dimensional tensor containing only the selected entries, even when the input has higher rank, and the result is a copy rather than a view.
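A minimal sketch of the flattening behavior on a matrix:

```python id="bm2x6r"
X = torch.tensor([
    [1.0, -2.0],
    [-3.0, 4.0],
])

selected = X[X > 0]

print(selected)        # tensor([1., 4.])
print(selected.shape)  # torch.Size([2]) -- 1D even though X is 2D
```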

Masks are common in deep learning. For example, in sequence models, a padding mask identifies which tokens should be ignored.

```python id="fj27m9"
tokens = torch.tensor([
    [101, 2054, 2003, 102, 0, 0],
    [101, 2129, 2024, 2017, 102, 0],
])

pad_id = 0
padding_mask = tokens == pad_id

print(padding_mask)
```

### Masked Assignment

Masks can also modify selected entries.

```python id="12jrac"
x = torch.tensor([-2.0, 0.5, 3.0, -1.0])

x[x < 0] = 0.0

print(x)
```

Output:

```python id="trhhjg"
tensor([0.0000, 0.5000, 3.0000, 0.0000])
```

This operation behaves like an in-place ReLU.

In model code that needs gradients, prefer functional operations such as `torch.where`, since in-place masked assignment can interfere with autograd; use mutation only when it is clearly intended:

```python id="u2o6bj"
x = torch.tensor([-2.0, 0.5, 3.0, -1.0])

y = torch.where(x < 0, torch.zeros_like(x), x)

print(y)
```

### Advanced Integer Indexing

A tensor of indices can select multiple positions.

```python id="m617s4"
x = torch.tensor([10, 20, 30, 40, 50])

idx = torch.tensor([0, 2, 4])

print(x[idx])
```

Output:

```python id="jxpmfn"
tensor([10, 30, 50])
```

For a matrix:

```python id="oues7o"
X = torch.tensor([
    [10, 11, 12],
    [20, 21, 22],
    [30, 31, 32],
])

rows = torch.tensor([0, 2])
cols = torch.tensor([1, 2])

print(X[rows, cols])
```

Output:

```python id="lcgvk4"
tensor([11, 32])
```

This selects pairs:

$$
X_{0,1},\quad X_{2,2}.
$$

It does not select the rectangular submatrix formed by rows `[0, 2]` and columns `[1, 2]`.

To select a rectangular submatrix, broadcast a column of row indices against the column indices (the NumPy `ix_` pattern; PyTorch has no `torch.ix_`):

```python id="52h41z"
rows = torch.tensor([0, 2])
cols = torch.tensor([1, 2])

sub = X[rows[:, None], cols]

print(sub)
```

Output:

```python id="pjq43t"
tensor([[11, 12],
        [31, 32]])
```

### Gathering Values

`torch.gather` selects values along a specified axis using an index tensor.

A common use is selecting the logit for the correct class.

```python id="r3ga12"
logits = torch.tensor([
    [2.0, 0.1, -1.0],
    [0.3, 1.5, 0.2],
])

labels = torch.tensor([0, 1])
```

The correct class scores are:

```python id="khsle6"
selected = logits[torch.arange(logits.shape[0]), labels]

print(selected)
```

Output:

```python id="xbjqs3"
tensor([2.0000, 1.5000])
```

The same idea with `gather`:

```python id="7q81pu"
selected = logits.gather(dim=1, index=labels[:, None])

print(selected)
```

Output:

```python id="bs0wt9"
tensor([[2.0000],
        [1.5000]])
```

`gather` is useful when writing vectorized code for classification, sequence losses, and beam search.
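For example, a cross-entropy loss can be written by gathering the log-probability of each label. A minimal sketch, which should match `F.cross_entropy` with the default mean reduction:

```python id="ce9t4w"
import torch.nn.functional as F

logits = torch.tensor([
    [2.0, 0.1, -1.0],
    [0.3, 1.5, 0.2],
])
labels = torch.tensor([0, 1])

# log-probabilities over classes, then gather the entry for each label
log_probs = F.log_softmax(logits, dim=1)
nll = -log_probs.gather(dim=1, index=labels[:, None]).squeeze(1)

loss = nll.mean()
print(loss)  # same value as F.cross_entropy(logits, labels)
```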

### Scatter Operations

Scatter operations write values into a tensor at indexed positions.

```python id="0qlorz"
out = torch.zeros(3, 5)

index = torch.tensor([
    [0],
    [2],
    [4],
])

src = torch.tensor([
    [1.0],
    [1.0],
    [1.0],
])

out.scatter_(dim=1, index=index, src=src)

print(out)
```

Output:

```python id="wu1jn6"
tensor([[1., 0., 0., 0., 0.],
        [0., 0., 1., 0., 0.],
        [0., 0., 0., 0., 1.]])
```

This pattern creates one-hot encodings. PyTorch also provides `torch.nn.functional.one_hot`.

```python id="sf46uc"
import torch.nn.functional as F

labels = torch.tensor([0, 2, 4])
one_hot = F.one_hot(labels, num_classes=5)

print(one_hot)
```

### Views and Shared Storage

Basic slicing usually creates a view. A view shares memory with the original tensor.

```python id="1hn00g"
x = torch.arange(10)
y = x[2:7]

y[0] = -1

print(x)
```

Output:

```python id="hbbuji"
tensor([ 0,  1, -1,  3,  4,  5,  6,  7,  8,  9])
```

Changing `y` changed `x`.

This behavior avoids unnecessary memory allocation. It also means that mutation through a view can affect the original tensor.
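When a slice should not alias the original, copying it with `clone()` (covered below) gives independent storage. A small sketch:

```python id="cl1v7s"
x = torch.arange(10)
y = x[2:7].clone()  # copy the slice instead of viewing it

y[0] = -1

print(x)  # unchanged: tensor([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
print(y)  # tensor([-1,  3,  4,  5,  6])
```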

### Copies from Advanced Indexing

Advanced indexing usually creates a copy.

```python id="6rqi1y"
x = torch.arange(10)
idx = torch.tensor([2, 3, 4])

y = x[idx]
y[0] = -1

print(x)
print(y)
```

Output:

```python id="84p8xu"
tensor([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
tensor([-1,  3,  4])
```

Changing `y` did not change `x` because `y` owns separate storage.

A practical rule:

| Operation type | Usually returns |
|---|---|
| Basic slicing | View |
| `transpose`, `permute` | View with changed strides |
| Boolean indexing | Copy |
| Integer array indexing | Copy |
| `clone()` | Copy |
| `contiguous()` | Copy if needed |
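
One way to check whether two tensors share storage is to compare their storage pointers. A sketch, assuming a recent PyTorch version that provides `untyped_storage()`:

```python id="sp6h3d"
x = torch.arange(10)

view_slice = x[2:7]                       # basic slicing: view
copy_index = x[torch.tensor([2, 3, 4])]   # integer indexing: copy

def shares_storage(a, b):
    # compare the underlying storage pointers
    return a.untyped_storage().data_ptr() == b.untyped_storage().data_ptr()

print(shares_storage(x, view_slice))  # True: shared storage
print(shares_storage(x, copy_index))  # False: separate storage
```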

### `view`, `reshape`, and `clone`

`view()` reshapes a tensor only when the existing memory layout allows it.

```python id="yqxik3"
x = torch.arange(12)
y = x.view(3, 4)

print(y)
```

`reshape()` is more flexible. It returns a view when possible and a copy when necessary.

```python id="nbdtlp"
x = torch.arange(12)
y = x.reshape(3, 4)
```
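A case where the two differ is a transposed matrix, which is a non-contiguous view; here `reshape()` copies, while `view()` would fail:

```python id="rv4n8f"
x = torch.arange(12).reshape(3, 4)
t = x.t()  # transpose: a view with swapped strides, not contiguous

print(t.reshape(12))  # works: reshape copies because a view is impossible
# t.view(12) would raise a RuntimeError here
```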

`clone()` explicitly creates a copy.

```python id="2sq5f6"
x = torch.arange(5)
y = x.clone()

y[0] = -1

print(x)
print(y)
```

Use `clone()` when you need independent storage.

### Contiguity After Permutation

A permuted tensor often becomes noncontiguous.

```python id="et0ghu"
X = torch.randn(2, 3, 4)

Y = X.permute(0, 2, 1)

print(Y.shape)
print(Y.is_contiguous())
```

Output:

```python id="oalaz9"
torch.Size([2, 4, 3])
False
```

The tensor `Y` has a valid shape, but its memory order differs from a standard contiguous layout.

Some operations require contiguous memory. Use:

```python id="j4sydl"
Yc = Y.contiguous()
```

This creates a contiguous copy.

A common pattern:

```python id="llwdv9"
Y = X.permute(0, 2, 1).contiguous()
Y = Y.view(2, 12)
```

Without `contiguous()`, `view()` may fail.
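A sketch of the failure mode, catching the error that `view()` raises on the permuted tensor and then applying the fix:

```python id="vc2j9t"
X = torch.randn(2, 3, 4)
Y = X.permute(0, 2, 1)  # non-contiguous view

try:
    Y.view(2, 12)
except RuntimeError as e:
    print("view failed:", e)

print(Y.contiguous().view(2, 12).shape)  # torch.Size([2, 12])
```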

### Indexing in Sequence Models

Suppose a language model returns logits with shape:

$$
[B, T, V],
$$

where \(B\) is batch size, \(T\) is sequence length, and \(V\) is vocabulary size.

```python id="9ciufj"
B, T, V = 4, 8, 10000
logits = torch.randn(B, T, V)
targets = torch.randint(0, V, (B, T))
```

To select the logit assigned to each target token:

```python id="bhf3zj"
target_logits = logits.gather(dim=2, index=targets.unsqueeze(-1))

print(target_logits.shape)  # torch.Size([4, 8, 1])
```

After removing the final singleton dimension:

```python id="mmrt8i"
target_logits = target_logits.squeeze(-1)

print(target_logits.shape)  # torch.Size([4, 8])
```

This is the same indexing principle used in classification, applied at every token position.
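To turn these target logits into a masked per-token loss, a padding mask can drop ignored positions. A minimal sketch, continuing from the tensors above and using a hypothetical mask that treats the last two positions as padding:

```python id="ml7p5x"
import torch.nn.functional as F

# per-token negative log-likelihood via the same gather pattern
log_probs = F.log_softmax(logits, dim=-1)
nll = -log_probs.gather(dim=2, index=targets.unsqueeze(-1)).squeeze(-1)  # [B, T]

# hypothetical padding mask: True where a position should be ignored
pad_mask = torch.zeros(B, T, dtype=torch.bool)
pad_mask[:, -2:] = True

loss = nll[~pad_mask].mean()  # boolean indexing keeps only real tokens
print(loss)
```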

### Indexing in Attention Masks

Attention masks often have shape:

$$
[B, T].
$$

A transformer attention layer may need shape:

$$
[B, 1, 1, T]
$$

so the mask can broadcast across attention heads and query positions.

```python id="zv1cgl"
B, T = 4, 8

padding_mask = torch.randint(0, 2, (B, T)).bool()

attention_mask = padding_mask[:, None, None, :]

print(attention_mask.shape)  # torch.Size([4, 1, 1, 8])
```

The `None` entries insert singleton axes. This is equivalent to `unsqueeze`.

```python id="0d10hp"
attention_mask = padding_mask.unsqueeze(1).unsqueeze(2)
```

Indexing is therefore part of shape engineering in transformer implementations.

### Common Mistakes

The most common indexing mistakes are:

| Mistake | Example | Problem |
|---|---|---|
| Removing a needed batch axis | `x[0]` | Layer may expect `[B, D]`, but receives `[D]` |
| Using paired indexing accidentally | `X[rows, cols]` | Selects pairs, not submatrix |
| Confusing view and copy | `y = x[idx]` | Mutation may not affect original |
| Forgetting `.contiguous()` | `x.permute(...).view(...)` | Memory layout may be invalid |
| Mask shape mismatch | `[B, T]` mask for `[B, H, T, T]` scores | Needs singleton axes for broadcasting |

Good tensor code makes shape changes explicit.

### Summary

Indexing selects individual entries. Slicing selects ranges. Boolean masks select entries based on conditions. Integer index tensors allow advanced selection. `gather` and `scatter` provide vectorized indexed reading and writing.

Basic slicing usually returns a view that shares storage with the original tensor. Advanced indexing usually returns a copy. `reshape`, `view`, `permute`, `contiguous`, and `clone` control how tensor data is interpreted or copied.

Correct indexing is central to PyTorch programming because neural networks often need precise selection across batch, channel, sequence, feature, and vocabulary axes.

