# Random Tensor Generation

Random tensors are used throughout deep learning. They initialize parameters, shuffle examples, sample noise, apply dropout, augment data, and generate outputs from probabilistic models. PyTorch provides direct tools for drawing samples from common probability distributions.

A random tensor has the same structural properties as any other tensor: shape, dtype, device, layout, and gradient behavior. The difference is that its values are produced by a pseudorandom number generator.

### Uniform Random Tensors

`torch.rand` creates a tensor whose values are sampled uniformly from the interval \([0, 1)\).

```python
import torch

x = torch.rand(3, 4)

print(x)
print(x.shape)
```

Each entry is sampled independently:

$$
x_{ij} \sim U(0, 1).
$$

Uniform random values are useful for simulation, randomized tests, dropout masks, and some parameter initializers.

To sample from a different interval \([a, b)\), scale and shift the tensor:

```python
a = -1.0
b = 1.0

x = a + (b - a) * torch.rand(3, 4)

print(x)
```

This samples from \(U(-1, 1)\).
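Equivalently, the in-place `uniform_` method fills a tensor with samples from \([a, b)\) directly, without the manual scale-and-shift; a small sketch:

```python
import torch

# Fill an uninitialized tensor with samples from U(-1, 1) in place.
x = torch.empty(3, 4).uniform_(-1.0, 1.0)

print(x)
```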

### Normal Random Tensors

`torch.randn` creates a tensor whose values are sampled from the standard normal distribution.

```python
x = torch.randn(3, 4)

print(x)
```

Each entry is sampled independently:

$$
x_{ij} \sim \mathcal{N}(0, 1).
$$

To sample from a normal distribution with mean \(\mu\) and standard deviation \(\sigma\):

```python
mu = 10.0
sigma = 2.0

x = mu + sigma * torch.randn(3, 4)

print(x)
```

This gives:

$$
x_{ij} \sim \mathcal{N}(\mu, \sigma^2).
$$

Normal random tensors are common in weight initialization, latent variable models, diffusion models, and variational autoencoders.
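The same scale-and-shift can also be expressed with `torch.normal`, which accepts the mean and standard deviation directly:

```python
import torch

# Sample a 3x4 tensor from N(10, 2^2) without manual scaling.
x = torch.normal(mean=10.0, std=2.0, size=(3, 4))

print(x.shape)  # torch.Size([3, 4])
```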

### Random Integers

`torch.randint` samples integer values.

```python
x = torch.randint(low=0, high=10, size=(3, 4))

print(x)
```

The lower bound is included. The upper bound is excluded.

$$
x_{ij} \in \{0, 1, \dots, 9\}.
$$

Random integers are used for labels, token IDs, synthetic data, random crops, and sampling indices.

Example:

```python
batch_size = 8
num_classes = 10

labels = torch.randint(0, num_classes, (batch_size,))

print(labels)
```

### Random Permutations

`torch.randperm(n)` returns a random permutation of integers from \(0\) to \(n-1\).

```python
idx = torch.randperm(10)

print(idx)
```

This is often used to shuffle datasets.

```python
X = torch.randn(10, 3)
y = torch.arange(10)

idx = torch.randperm(10)

X = X[idx]
y = y[idx]
```

The same index tensor is applied to both `X` and `y`, so the examples and labels remain aligned.

### Sampling from Existing Tensors

Sometimes we need to sample rows, tokens, or examples from an existing tensor.

```python
X = torch.randn(100, 32)

idx = torch.randint(0, X.shape[0], (16,))
batch = X[idx]

print(batch.shape)  # torch.Size([16, 32])
```

This samples 16 rows with replacement.

To sample without replacement:

```python
idx = torch.randperm(X.shape[0])[:16]
batch = X[idx]
```

This pattern is useful for writing simple minibatch training loops.
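As a sketch of that loop, one epoch built on `randperm` visits every row exactly once in random order (the dataset and batch size here are illustrative values):

```python
import torch

X = torch.randn(100, 32)
batch_size = 16

# One epoch: a fresh permutation, consumed in contiguous chunks.
perm = torch.randperm(X.shape[0])
for start in range(0, X.shape[0], batch_size):
    idx = perm[start:start + batch_size]
    batch = X[idx]
    # ... forward pass, loss, and backward pass would go here ...
```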

### Bernoulli Random Variables

A Bernoulli random variable takes value 1 with probability \(p\) and 0 with probability \(1-p\).

```python
p = torch.full((10,), 0.3)

samples = torch.bernoulli(p)

print(samples)
```

Each entry follows:

$$
x_i \sim \operatorname{Bernoulli}(p_i).
$$

Bernoulli samples are used to create binary masks.

For example, dropout can be expressed using a Bernoulli mask:

```python
x = torch.randn(8)
keep_prob = 0.8

mask = torch.bernoulli(torch.full_like(x, keep_prob))
y = x * mask / keep_prob
```

The division by `keep_prob` preserves the expected activation scale during training.
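The built-in functional dropout applies the same mask-and-rescale idea; note that its `p` argument is the *drop* probability, i.e. `1 - keep_prob`:

```python
import torch
import torch.nn.functional as F

x = torch.randn(8)

# p is the probability of zeroing an element, so p = 1 - keep_prob.
y = F.dropout(x, p=0.2, training=True)

# Surviving entries are scaled by 1 / (1 - p) = 1 / 0.8.
print(y)
```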

### Multinomial Sampling

`torch.multinomial` samples indices according to probabilities.

```python
probs = torch.tensor([0.1, 0.2, 0.7])

sample = torch.multinomial(probs, num_samples=1)

print(sample)
```

The index 2 is most likely because it has probability 0.7.

Sampling multiple values:

```python
samples = torch.multinomial(probs, num_samples=5, replacement=True)

print(samples)
```

This is common in language generation. A model produces logits over a vocabulary. The logits are converted into probabilities, and the next token is sampled.

```python
logits = torch.randn(50_000)

probs = torch.softmax(logits, dim=-1)

next_token = torch.multinomial(probs, num_samples=1)
```

### Random Noise for Generative Models

Generative models often start from random noise.

For a variational autoencoder, we may sample a latent vector:

```python
batch_size = 32
latent_dim = 128

z = torch.randn(batch_size, latent_dim)
```

For an image diffusion model, we may sample Gaussian noise with image shape:

```python
noise = torch.randn(32, 3, 64, 64)
```

For a GAN, the generator may receive:

```python
z = torch.randn(32, 100)
```

The shape of the noise tensor is part of the model design. It controls how many samples are generated and how much latent variation the model can express.

### Randomness and Devices

Random tensors can be created directly on a device.

```python
device = "cuda" if torch.cuda.is_available() else "cpu"

x = torch.randn(32, 128, device=device)
```

This avoids creating a CPU tensor and then copying it to the GPU.

When matching an existing tensor, prefer the `_like` variants such as `randn_like`, `rand_like`, and `randint_like`:

```python
x = torch.randn(32, 128, device=device)

noise = torch.randn_like(x)
mask = torch.rand_like(x) > 0.1
```

These functions preserve shape, dtype, and device.
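A quick check that the `_like` constructors inherit these properties from their input:

```python
import torch

x = torch.randn(4, 4, dtype=torch.float64)

noise = torch.randn_like(x)

print(noise.dtype)   # torch.float64
print(noise.shape)   # torch.Size([4, 4])
```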

### Randomness and Data Types

Random floating-point tensors default to `torch.float32`.

```python
x = torch.randn(3, 4)
print(x.dtype)  # torch.float32
```

A dtype can be specified:

```python
x = torch.randn(3, 4, dtype=torch.float16)
```

For integer random tensors:

```python
ids = torch.randint(0, 1000, (8,), dtype=torch.long)
```

Token IDs and class labels are usually `torch.long`.
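Loss functions that index into classes expect this dtype; for example, `torch.nn.functional.cross_entropy` requires `long` targets:

```python
import torch
import torch.nn.functional as F

logits = torch.randn(8, 10)           # batch of 8, 10 classes
labels = torch.randint(0, 10, (8,))   # dtype is torch.long by default

loss = F.cross_entropy(logits, labels)
print(loss)
```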

### Pseudorandom Number Generators

Computers usually generate pseudorandom numbers, not truly random numbers. A pseudorandom number generator produces a deterministic sequence from an initial seed.

Set the PyTorch seed with:

```python
torch.manual_seed(1234)
```

Example:

```python
torch.manual_seed(1234)
a = torch.randn(3)

torch.manual_seed(1234)
b = torch.randn(3)

print(a)
print(b)
print(torch.equal(a, b))
```

The tensors `a` and `b` are identical because the generator was reset to the same state.

### Reproducibility on GPUs

GPU computation can introduce nondeterminism. Some operations have multiple valid execution orders, and floating-point addition can give slightly different results depending on order.

For stronger reproducibility:

```python
torch.manual_seed(1234)
torch.cuda.manual_seed_all(1234)

torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False
```

PyTorch also provides:

```python
torch.use_deterministic_algorithms(True)
```

This asks PyTorch to use deterministic algorithms where available. Operations without a deterministic implementation may become slower or raise errors. On CUDA, some operations additionally require setting the `CUBLAS_WORKSPACE_CONFIG` environment variable before launch.

Exact reproducibility across hardware, driver versions, and PyTorch versions can still be difficult. For experiments, record software versions, hardware type, seeds, and important configuration flags.

### Local Generators

A local generator gives more precise control over randomness.

```python
gen = torch.Generator()
gen.manual_seed(42)

x = torch.randn(3, generator=gen)
y = torch.randn(3, generator=gen)
```

The generator keeps its own state. This is useful when one part of a program needs reproducible randomness without changing the global random sequence.

A local generator can also be passed to data splitting or sampling code when supported.
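A quick sketch showing that a local generator does not disturb the global stream:

```python
import torch

torch.manual_seed(0)
a = torch.randn(3)                   # drawn from the global generator

torch.manual_seed(0)
gen = torch.Generator().manual_seed(42)
_ = torch.randn(3, generator=gen)    # consumes only the local stream
b = torch.randn(3)                   # global stream is unaffected

print(torch.equal(a, b))  # True
```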

### Randomness in Data Loading

Randomness appears in data pipelines through shuffling, augmentation, and worker processes.

A `DataLoader` may shuffle examples:

```python
from torch.utils.data import DataLoader, TensorDataset

X = torch.randn(100, 10)
y = torch.randint(0, 2, (100,))

dataset = TensorDataset(X, y)

loader = DataLoader(dataset, batch_size=16, shuffle=True)
```

With multiple workers, each worker may need a controlled seed if exact reproducibility is required.

```python
import random

import numpy as np

def seed_worker(worker_id):
    # Derive a per-worker seed from PyTorch's base seed for this worker.
    worker_seed = torch.initial_seed() % 2**32
    random.seed(worker_seed)
    np.random.seed(worker_seed)
```

Then:

```python
gen = torch.Generator()
gen.manual_seed(42)

loader = DataLoader(
    dataset,
    batch_size=16,
    shuffle=True,
    num_workers=2,
    worker_init_fn=seed_worker,
    generator=gen,
)
```

This pattern helps align PyTorch, Python, and NumPy randomness inside workers.

### Randomness in Model Training

Several training mechanisms depend on randomness:

| Mechanism | Random quantity |
|---|---|
| Weight initialization | Initial parameters |
| Minibatch training | Example order |
| Dropout | Binary masks |
| Data augmentation | Crops, flips, noise, color changes |
| Negative sampling | Sampled negatives |
| Generative modeling | Latent noise |
| Language generation | Sampled tokens |

Randomness is therefore both a modeling tool and a source of variation. Two training runs with different seeds may produce different validation scores, especially for small datasets or unstable objectives.

For reliable evaluation, report the mean and variance across multiple seeds when possible.
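A toy sketch of that reporting pattern, with a random scalar standing in for a real validation metric:

```python
import torch

scores = []
for seed in [0, 1, 2, 3, 4]:
    torch.manual_seed(seed)
    # Placeholder for "train and evaluate": here just a random scalar.
    score = torch.randn(()).item()
    scores.append(score)

scores = torch.tensor(scores)
print(f"mean = {scores.mean():.3f}, std = {scores.std():.3f}")
```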

### Random Initialization Example

The following example creates parameters for a small neural network manually.

```python
import math
import torch

din = 128
hidden = 256
dout = 10

W1 = torch.randn(din, hidden) * math.sqrt(2.0 / din)
b1 = torch.zeros(hidden)

W2 = torch.randn(hidden, dout) * math.sqrt(2.0 / hidden)
b2 = torch.zeros(dout)
```

The scaling factors follow the same basic idea as Kaiming initialization. They keep activation magnitudes from growing or shrinking too quickly in ReLU networks.

A forward pass:

```python
X = torch.randn(32, din)

H = torch.relu(X @ W1 + b1)
logits = H @ W2 + b2

print(logits.shape)  # torch.Size([32, 10])
```

This code shows how random tensor generation connects directly to model initialization.

### Random Sampling Example for Language Models

A language model returns logits over a vocabulary.

```python
V = 50_000

logits = torch.randn(V)
```

Convert logits to probabilities:

```python
probs = torch.softmax(logits, dim=-1)
```

Sample one token:

```python
token = torch.multinomial(probs, num_samples=1)

print(token)
```

A temperature parameter controls randomness:

```python
temperature = 0.8

probs = torch.softmax(logits / temperature, dim=-1)
token = torch.multinomial(probs, num_samples=1)
```

Lower temperature sharpens the distribution. Higher temperature flattens it.
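This effect can be checked directly: for the same logits, the peak probability is larger at low temperature than at high temperature:

```python
import torch

logits = torch.randn(10)

sharp = torch.softmax(logits / 0.5, dim=-1)
flat = torch.softmax(logits / 2.0, dim=-1)

# Lower temperature concentrates more mass on the argmax.
print(sharp.max() > flat.max())  # tensor(True)
```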

### Summary

Random tensor generation is used for initialization, sampling, noise injection, data loading, augmentation, and generative modeling. PyTorch provides constructors for uniform, normal, integer, Bernoulli, and multinomial random variables.

Randomness is controlled by pseudorandom number generators and seeds. Reproducibility requires careful handling of global seeds, local generators, GPU determinism, and data loading workers.

In deep learning, randomness has two roles: it provides useful stochasticity during learning, and it introduces experimental variation that must be measured and controlled.

