Random tensors are used throughout deep learning. They initialize parameters, shuffle examples, sample noise, apply dropout, augment data, and generate outputs from probabilistic models. PyTorch provides direct tools for drawing samples from common probability distributions.
A random tensor has the same structural properties as any other tensor: shape, dtype, device, layout, and gradient behavior. The difference is that its values are produced by a pseudorandom number generator.
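For example, the usual attributes can be inspected on a freshly created random tensor (a small illustration; the shape here is arbitrary):
import torch
x = torch.rand(2, 3)
print(x.shape)          # torch.Size([2, 3])
print(x.dtype)          # torch.float32 by default
print(x.device)         # cpu, unless another device is requested
print(x.requires_grad)  # False by default, like other constructors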
Uniform Random Tensors
torch.rand creates a tensor whose values are sampled uniformly from the interval [0, 1).
import torch
x = torch.rand(3, 4)
print(x)
print(x.shape)
Each entry is sampled independently from the uniform distribution on [0, 1).
Uniform random values are useful for simulation, randomized tests, dropout masks, and some parameter initializers.
To sample from a different interval [a, b), scale and shift the tensor:
a = -1.0
b = 1.0
x = a + (b - a) * torch.rand(3, 4)
print(x)
This samples from [-1, 1).
Normal Random Tensors
torch.randn creates a tensor whose values are sampled from the standard normal distribution.
x = torch.randn(3, 4)
print(x)
Each entry is sampled independently from the standard normal distribution N(0, 1).
To sample from a normal distribution with mean μ and standard deviation σ:
mu = 10.0
sigma = 2.0
x = mu + sigma * torch.randn(3, 4)
print(x)
This gives samples from a normal distribution with mean 10 and standard deviation 2.
Normal random tensors are common in weight initialization, latent variable models, diffusion models, and variational autoencoders.
Random Integers
torch.randint samples integer values.
x = torch.randint(low=0, high=10, size=(3, 4))
print(x)
The lower bound is included. The upper bound is excluded.
Random integers are used for labels, token IDs, synthetic data, random crops, and sampling indices.
Example:
batch_size = 8
num_classes = 10
labels = torch.randint(0, num_classes, (batch_size,))
print(labels)
Random Permutations
torch.randperm(n) returns a random permutation of the integers from 0 to n - 1.
idx = torch.randperm(10)
print(idx)
This is often used to shuffle datasets.
X = torch.randn(10, 3)
y = torch.arange(10)
idx = torch.randperm(10)
X = X[idx]
y = y[idx]
The same index tensor is applied to both X and y, so the examples and labels remain aligned.
Sampling from Existing Tensors
Sometimes we need to sample rows, tokens, or examples from an existing tensor.
X = torch.randn(100, 32)
idx = torch.randint(0, X.shape[0], (16,))
batch = X[idx]
print(batch.shape)  # torch.Size([16, 32])
This samples 16 rows with replacement.
To sample without replacement:
idx = torch.randperm(X.shape[0])[:16]
batch = X[idx]
This pattern is useful for writing simple minibatch training loops.
Bernoulli Random Variables
A Bernoulli random variable takes value 1 with probability p and 0 with probability 1 - p.
p = torch.full((10,), 0.3)
samples = torch.bernoulli(p)
print(samples)
Each entry is 1 with probability 0.3 and 0 otherwise.
Bernoulli samples are used to create binary masks.
For example, dropout can be expressed using a Bernoulli mask:
x = torch.randn(8)
keep_prob = 0.8
mask = torch.bernoulli(torch.full_like(x, keep_prob))
y = x * mask / keep_prob
The division by keep_prob preserves the expected activation scale during training.
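A quick numerical check (our own illustration, not a required step) shows why the rescaling works: averaged over many draws, the scaled mask has mean close to 1, so the expected value of x * mask / keep_prob matches that of x.
keep_prob = 0.8
mask = torch.bernoulli(torch.full((100_000,), keep_prob))
print((mask / keep_prob).mean())  # close to 1.0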
Multinomial Sampling
torch.multinomial samples indices according to probabilities.
probs = torch.tensor([0.1, 0.2, 0.7])
sample = torch.multinomial(probs, num_samples=1)
print(sample)
The index 2 is most likely because it has probability 0.7.
Sampling multiple values:
samples = torch.multinomial(probs, num_samples=5, replacement=True)
print(samples)
This is common in language generation. A model produces logits over a vocabulary. The logits are converted into probabilities, and the next token is sampled.
logits = torch.randn(50_000)
probs = torch.softmax(logits, dim=-1)
next_token = torch.multinomial(probs, num_samples=1)
Random Noise for Generative Models
Generative models often start from random noise.
For a variational autoencoder, we may sample a latent vector:
batch_size = 32
latent_dim = 128
z = torch.randn(batch_size, latent_dim)
For an image diffusion model, we may sample Gaussian noise with image shape:
noise = torch.randn(32, 3, 64, 64)
For a GAN, the generator may receive:
z = torch.randn(32, 100)
The shape of the noise tensor is part of the model design. It controls how many samples are generated and how much latent variation the model can express.
Randomness and Devices
Random tensors can be created directly on a device.
device = "cuda" if torch.cuda.is_available() else "cpu"
x = torch.randn(32, 128, device=device)
This avoids creating a CPU tensor and then copying it to the GPU.
When using existing tensors, prefer randn_like, rand_like, or empty_like:
x = torch.randn(32, 128, device=device)
noise = torch.randn_like(x)
mask = torch.rand_like(x) > 0.1
These functions preserve shape, dtype, and device.
Randomness and Data Types
Random floating-point tensors default to torch.float32.
x = torch.randn(3, 4)
print(x.dtype)  # torch.float32
A dtype can be specified:
x = torch.randn(3, 4, dtype=torch.float16)
For integer random tensors:
ids = torch.randint(0, 1000, (8,), dtype=torch.long)
Token IDs and class labels are usually torch.long.
Pseudorandom Number Generators
Computers usually generate pseudorandom numbers, not truly random numbers. A pseudorandom number generator produces a deterministic sequence from an initial seed.
Set the PyTorch seed with:
torch.manual_seed(1234)
Example:
torch.manual_seed(1234)
a = torch.randn(3)
torch.manual_seed(1234)
b = torch.randn(3)
print(a)
print(b)
print(torch.equal(a, b))
The tensors a and b are identical because the generator was reset to the same state.
Reproducibility on GPUs
GPU computation can introduce nondeterminism. Some operations have multiple valid execution orders, and floating-point addition can give slightly different results depending on order.
For stronger reproducibility:
torch.manual_seed(1234)
torch.cuda.manual_seed_all(1234)
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False
PyTorch also provides:
torch.use_deterministic_algorithms(True)
This asks PyTorch to use deterministic algorithms where available. Some operations may become slower or raise errors if no deterministic implementation exists.
Exact reproducibility across hardware, driver versions, and PyTorch versions can still be difficult. For experiments, record software versions, hardware type, seeds, and important configuration flags.
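One lightweight way to do this is to collect the relevant facts at the start of a run (a minimal sketch; the dictionary keys and the chosen seed are our own):
run_config = {
    "torch_version": torch.__version__,
    "cuda_version": torch.version.cuda,  # None on CPU-only builds
    "device_name": torch.cuda.get_device_name(0) if torch.cuda.is_available() else "cpu",
    "seed": 1234,
    "deterministic": True,
}
print(run_config)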
Local Generators
A local generator gives more precise control over randomness.
gen = torch.Generator()
gen.manual_seed(42)
x = torch.randn(3, generator=gen)
y = torch.randn(3, generator=gen)
The generator keeps its own state. This is useful when one part of a program needs reproducible randomness without changing the global random sequence.
A local generator can also be passed to data splitting or sampling code when supported.
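For example, torch.utils.data.random_split accepts a generator argument (a brief sketch; the dataset and split sizes are arbitrary):
from torch.utils.data import TensorDataset, random_split
dataset = TensorDataset(torch.randn(100, 10), torch.randint(0, 2, (100,)))
gen = torch.Generator()
gen.manual_seed(42)
train_set, val_set = random_split(dataset, [80, 20], generator=gen)  # reproducible split
print(len(train_set), len(val_set))  # 80 20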
Randomness in Data Loading
Randomness appears in data pipelines through shuffling, augmentation, and worker processes.
A DataLoader may shuffle examples:
from torch.utils.data import DataLoader, TensorDataset
X = torch.randn(100, 10)
y = torch.randint(0, 2, (100,))
dataset = TensorDataset(X, y)
loader = DataLoader(dataset, batch_size=16, shuffle=True)
With multiple workers, each worker may need a controlled seed if exact reproducibility is required.
def seed_worker(worker_id):
    import random
    import numpy as np
    worker_seed = torch.initial_seed() % 2**32
    random.seed(worker_seed)
    np.random.seed(worker_seed)
Then:
gen = torch.Generator()
gen.manual_seed(42)
loader = DataLoader(
    dataset,
    batch_size=16,
    shuffle=True,
    num_workers=2,
    worker_init_fn=seed_worker,
    generator=gen,
)
This pattern helps align PyTorch, Python, and NumPy randomness inside workers.
Randomness in Model Training
Several training mechanisms depend on randomness:
| Mechanism | Random quantity |
|---|---|
| Weight initialization | Initial parameters |
| Minibatch training | Example order |
| Dropout | Binary masks |
| Data augmentation | Crops, flips, noise, color changes |
| Negative sampling | Sampled negatives |
| Generative modeling | Latent noise |
| Language generation | Sampled tokens |
Randomness is therefore both a modeling tool and a source of variation. Two training runs with different seeds may produce different validation scores, especially for small datasets or unstable objectives.
For reliable evaluation, report the mean and variance across multiple seeds when possible.
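The aggregation itself is simple (a sketch with made-up numbers; substitute real validation scores from runs with different seeds):
scores = torch.tensor([0.912, 0.907, 0.918, 0.901, 0.915])  # hypothetical accuracies
print(scores.mean())  # average over seeds
print(scores.std())   # run-to-run variation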
Random Initialization Example
The following example creates parameters for a small neural network manually.
import math
import torch
din = 128
hidden = 256
dout = 10
W1 = torch.randn(din, hidden) * math.sqrt(2.0 / din)
b1 = torch.zeros(hidden)
W2 = torch.randn(hidden, dout) * math.sqrt(2.0 / hidden)
b2 = torch.zeros(dout)
The scaling factors follow the same basic idea as Kaiming initialization. They keep activation magnitudes from growing or shrinking too quickly in ReLU networks.
A forward pass:
X = torch.randn(32, din)
H = torch.relu(X @ W1 + b1)
logits = H @ W2 + b2
print(logits.shape)  # torch.Size([32, 10])
This code shows how random tensor generation connects directly to model initialization.
Random Sampling Example for Language Models
A language model returns logits over a vocabulary.
V = 50_000
logits = torch.randn(V)
Convert logits to probabilities:
probs = torch.softmax(logits, dim=-1)
Sample one token:
token = torch.multinomial(probs, num_samples=1)
print(token)
A temperature parameter controls randomness:
temperature = 0.8
probs = torch.softmax(logits / temperature, dim=-1)
token = torch.multinomial(probs, num_samples=1)
Lower temperature sharpens the distribution. Higher temperature flattens it.
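A quick comparison makes the effect visible (a minimal sketch; the logits are arbitrary):
logits = torch.tensor([2.0, 1.0, 0.5])
for t in (0.5, 1.0, 2.0):
    # Lower t concentrates mass on the largest logit; higher t spreads it out.
    print(t, torch.softmax(logits / t, dim=-1))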
Summary
Random tensor generation is used for initialization, sampling, noise injection, data loading, augmentation, and generative modeling. PyTorch provides constructors for uniform, normal, integer, Bernoulli, and multinomial random variables.
Randomness is controlled by pseudorandom number generators and seeds. Reproducibility requires careful handling of global seeds, local generators, GPU determinism, and data loading workers.
In deep learning, randomness has two roles: it provides useful stochasticity during learning, and it introduces experimental variation that must be measured and controlled.