Random search is a hyperparameter optimization method that samples configurations at random from a search space. Instead of evaluating every point on a fixed grid, random search chooses a fixed number of trials and draws each trial independently.
This method is simple, but it is often more effective than grid search in deep learning. The reason is that only a small number of hyperparameters usually dominate performance. Random search spends more trials exploring different values of important dimensions instead of repeatedly evaluating unimportant combinations.
The Basic Idea
Suppose we want to tune learning rate, batch size, hidden dimension, dropout, and optimizer. Grid search would require choosing a small finite set for each one and evaluating the full Cartesian product.
Random search instead defines a distribution for each hyperparameter, for example:

- learning rate: log-uniform over [1e-5, 1e-1]
- batch size: uniform over {32, 64, 128, 256}
- hidden dimension: uniform over {128, 256, 512, 1024}
- dropout: uniform over [0.0, 0.5]
- optimizer: uniform over {SGD, Adam, AdamW}

Each trial samples one configuration from these distributions.
For example:
```python
config = {
    "learning_rate": 2.7e-4,
    "batch_size": 128,
    "hidden_dim": 512,
    "dropout": 0.18,
    "optimizer": "AdamW",
}
```

After training and validation, the result is recorded. The best configuration found so far is retained.
Why Random Search Helps
Grid search allocates equal attention to every search dimension. This is inefficient when some dimensions matter much more than others.
Assume validation performance depends strongly on learning rate but weakly on dropout. A grid with five learning rates and five dropout values produces 5 × 5 = 25 runs, but the learning rate takes only five distinct values.
Random search with 25 trials can evaluate 25 different learning rates. This gives much better coverage of the important dimension.
The advantage becomes larger as the number of weak or irrelevant dimensions increases.
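The coverage argument can be made concrete with a small sketch. The ranges below are illustrative assumptions; the point is only to count distinct learning rates explored under an equal trial budget:

```python
import math
import random

random.seed(0)

# Grid: 5 learning rates x 5 dropout values = 25 runs,
# but only 5 distinct learning rates are ever tried.
grid_lrs = [1e-5, 1e-4, 1e-3, 1e-2, 1e-1]
grid_dropouts = [0.0, 0.1, 0.2, 0.3, 0.4]
grid_trials = [(lr, d) for lr in grid_lrs for d in grid_dropouts]

# Random search: 25 runs, each with an independently sampled learning rate.
random_trials = [
    (math.exp(random.uniform(math.log(1e-5), math.log(1e-1))),
     random.uniform(0.0, 0.4))
    for _ in range(25)
]

print(len({lr for lr, _ in grid_trials}))    # 5 distinct learning rates
print(len({lr for lr, _ in random_trials}))  # 25 distinct learning rates
```

Both strategies spend 25 runs, but random search tries 25 distinct values along the dimension that matters.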
Random Search Versus Grid Search
Consider two hyperparameters:
| Method | Learning rate values explored | Dropout values explored | Total trials |
|---|---|---|---|
| Grid search | 5 fixed values | 5 fixed values | 25 |
| Random search | 25 sampled values | 25 sampled values | 25 |
With the same number of trials, random search explores more distinct values per dimension.
This matters especially for continuous hyperparameters. A grid imposes artificial resolution. Random sampling avoids this fixed lattice.
Defining Sampling Distributions
Random search requires distributions, not just candidate sets.
Common choices are:
| Hyperparameter | Suggested distribution |
|---|---|
| Learning rate | Log-uniform |
| Weight decay | Log-uniform |
| Dropout | Uniform |
| Batch size | Categorical |
| Hidden dimension | Categorical |
| Number of layers | Categorical |
| Warmup ratio | Uniform |
| Gradient clipping norm | Log-uniform |
| Label smoothing | Uniform |
A log-uniform distribution is useful when the right scale is unknown. For example, the useful learning rate may be 1e-5, 1e-3, or 1e-1. Sampling uniformly in ordinary space would over-sample large values.

A log-uniform draw can be written as x = e^u, where u ~ Uniform(ln(low), ln(high)).

If base 10 is used, x = 10^u, where u ~ Uniform(log10(low), log10(high)).

In Python:
```python
import random
import math

def log_uniform(low, high):
    return math.exp(random.uniform(math.log(low), math.log(high)))

learning_rate = log_uniform(1e-5, 1e-1)
```

A Minimal Random Search Implementation
A search space can be represented as a dictionary:
```python
search_space = {
    "learning_rate": ("log_uniform", 1e-5, 1e-1),
    "weight_decay": ("log_uniform", 1e-6, 1e-1),
    "batch_size": ("choice", [32, 64, 128, 256]),
    "hidden_dim": ("choice", [128, 256, 512, 1024]),
    "dropout": ("uniform", 0.0, 0.5),
    "optimizer": ("choice", ["SGD", "Adam", "AdamW"]),
}
```

We can implement a sampler:
```python
import random
import math

def sample_value(spec):
    kind = spec[0]
    if kind == "choice":
        return random.choice(spec[1])
    if kind == "uniform":
        low, high = spec[1], spec[2]
        return random.uniform(low, high)
    if kind == "log_uniform":
        low, high = spec[1], spec[2]
        return math.exp(random.uniform(math.log(low), math.log(high)))
    raise ValueError(f"unknown search distribution: {kind}")

def sample_config(search_space):
    return {
        name: sample_value(spec)
        for name, spec in search_space.items()
    }
```

Then random search becomes:
```python
best_config = None
best_score = float("-inf")
num_trials = 50

for trial in range(num_trials):
    config = sample_config(search_space)
    score = train_and_evaluate(config)
    if score > best_score:
        best_score = score
        best_config = config

print("best score:", best_score)
print("best config:", best_config)
```

The function `train_and_evaluate` should construct the model, optimizer, scheduler, data loaders, and training loop from the configuration.
Conditional Random Search
Some hyperparameters only make sense under certain choices.
For example, momentum matters for SGD:
```python
if optimizer == "SGD":
    momentum = sample_uniform(0.0, 0.99)
```

Adam beta values matter for Adam and AdamW:
```python
if optimizer in {"Adam", "AdamW"}:
    beta1 = sample_uniform(0.8, 0.99)
    beta2 = sample_uniform(0.9, 0.9999)
```

A conditional sampler can encode this directly:
```python
def sample_optimizer_config():
    optimizer = random.choice(["SGD", "Adam", "AdamW"])
    config = {"optimizer": optimizer}
    if optimizer == "SGD":
        config["momentum"] = random.uniform(0.0, 0.99)
    if optimizer in {"Adam", "AdamW"}:
        config["beta1"] = random.uniform(0.8, 0.99)
        config["beta2"] = random.uniform(0.9, 0.9999)
    return config
```

Conditional search spaces avoid meaningless parameters. This improves search efficiency and makes the result easier to interpret.
Number of Trials
Random search requires choosing a trial budget. The budget depends on training cost, available hardware, and search-space size.
For small models, hundreds of trials may be feasible. For large models, even ten trials may be expensive.
A practical pattern is:
| Training cost per run | Reasonable initial trials |
|---|---|
| Seconds | 100 to 1000 |
| Minutes | 50 to 200 |
| Hours | 10 to 50 |
| Days | 3 to 10 |
The search should begin with wide ranges and a modest budget. After promising regions are found, a second random search can focus on narrower ranges.
Coarse-to-Fine Random Search
Random search is often used in stages.
First, run a broad search:
```python
broad_space = {
    "learning_rate": ("log_uniform", 1e-5, 1e-1),
    "weight_decay": ("log_uniform", 1e-6, 1e-1),
    "dropout": ("uniform", 0.0, 0.5),
}
```

Suppose the best results cluster near a learning rate of about 3e-4, a weight decay of about 3e-3, and dropout around 0.1. Then define a narrower search:
```python
narrow_space = {
    "learning_rate": ("log_uniform", 1e-4, 1e-3),
    "weight_decay": ("log_uniform", 1e-3, 1e-2),
    "dropout": ("uniform", 0.05, 0.2),
}
```

This second stage increases resolution where useful configurations are likely.
Random Seeds and Reproducibility
Random search introduces randomness in two places.
First, the search algorithm samples configurations randomly. Second, model training itself is stochastic due to initialization, data shuffling, dropout, nondeterministic GPU kernels, and augmentation.
To improve reproducibility, save:
| Item | Purpose |
|---|---|
| Search seed | Reproduce sampled configurations |
| Training seed | Reproduce model initialization and data order |
| Full configuration | Rebuild the experiment |
| Validation metrics | Compare trials |
| Checkpoint path | Reload selected model |
| Code version | Match implementation |
Example:
```python
import random
import torch

def set_seed(seed):
    random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
```

Each trial can receive its own seed:
```python
base_seed = 1234

for trial in range(num_trials):
    trial_seed = base_seed + trial
    set_seed(trial_seed)
    config = sample_config(search_space)
    config["seed"] = trial_seed
    score = train_and_evaluate(config)
```

This makes the search easier to audit.
Handling Failed Trials
Random search may sample unstable or invalid configurations. A learning rate may be too large. A batch size may exceed memory. A transformer hidden size may be incompatible with the number of heads.
Failed trials should be recorded, not silently ignored.
```python
results = []

for trial in range(num_trials):
    config = sample_config(search_space)
    error = None
    try:
        score = train_and_evaluate(config)
        status = "ok"
    except RuntimeError as e:
        score = None
        status = "failed"
        error = str(e)
    results.append({
        "trial": trial,
        "config": config,
        "score": score,
        "status": status,
        "error": error,
    })
```

This record helps identify bad regions of the search space. If many trials fail, the search space should be constrained.
Comparing Configurations Fairly
Configurations should be compared under the same evaluation protocol.
This means:
| Factor | Should be fixed across trials |
|---|---|
| Training data | Same split |
| Validation data | Same split |
| Number of epochs | Same budget unless using early stopping |
| Evaluation metric | Same metric |
| Preprocessing | Same rules unless intentionally searched |
| Random seed policy | Same procedure |
| Hardware assumptions | Same precision and device type |
If one configuration receives more training steps than another, its score may reflect extra compute rather than better hyperparameters.
For budget-aware optimization, use an explicit objective such as validation accuracy after a fixed number of steps, or validation loss under a fixed GPU-hour budget.
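A fixed step budget can be enforced directly in the evaluation wrapper. The sketch below assumes hypothetical `train_step` and `evaluate` callables; its only job is to guarantee every trial receives the same number of steps:

```python
def evaluate_under_budget(config, train_step, evaluate, max_steps=1000):
    # Every configuration receives exactly max_steps optimization steps,
    # so scores reflect hyperparameters rather than extra compute.
    state = None
    for _ in range(max_steps):
        state = train_step(state, config)
    return evaluate(state)

# Toy check: train_step counts steps, evaluate reports the count.
steps_taken = evaluate_under_budget(
    config={},
    train_step=lambda state, config: (state or 0) + 1,
    evaluate=lambda state: state,
    max_steps=1000,
)
print(steps_taken)  # 1000
```

A wall-clock or GPU-hour budget can be enforced the same way by checking elapsed time inside the loop instead of a step counter.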
Search Results as Data
Random search produces useful diagnostic data. Even failed or mediocre trials can show which hyperparameters matter.
After running trials, we can sort by validation score:
```python
# Failed trials have no score, so filter them out before sorting.
results = [r for r in results if r["score"] is not None]
results = sorted(results, key=lambda r: r["score"], reverse=True)
```

We can inspect the best configurations:
```python
for r in results[:5]:
    print(r["score"], r["config"])
```

Patterns are often more valuable than a single best configuration. For example, the top configurations may all use AdamW, learning rates near 3e-4, dropout below 0.2, and moderate weight decay. This suggests a stable region of the search space.
Advantages and Disadvantages
| Advantages | Disadvantages |
|---|---|
| Simple to implement | No guarantee of finding the optimum |
| Works well in high-dimensional spaces | Results vary across random seeds |
| Efficient for continuous variables | Can waste trials in bad regions |
| Easy to parallelize | Does not learn from previous trials |
| Better than grid search for many DL tasks | Needs careful distribution design |
Random search is a strong default when the search space is moderately large and the cost per run is acceptable.
When to Use Random Search
Random search is a good choice when:
| Situation | Reason |
|---|---|
| Several hyperparameters matter | Better coverage than grid search |
| Some dimensions are continuous | Avoids fixed grid resolution |
| Search budget is limited | Can stop after any number of trials |
| Parallel workers are available | Trials are independent |
| Baselines are needed quickly | Simple and robust |
Random search is less suitable when each run is extremely expensive and only a few trials are possible. In that case, expert tuning or Bayesian optimization may use the budget more effectively.
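Because trials are independent, parallelizing random search is straightforward. The sketch below uses a thread pool and a toy objective (both illustrative; a real search would dispatch training jobs to workers or GPUs instead):

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor

def objective(config):
    # Toy stand-in for a training run: best near learning rate 1e-3.
    return -abs(math.log10(config["learning_rate"]) + 3.0)

# Sample all configurations up front in one place, so the search stays
# reproducible even though evaluation order may vary across workers.
random.seed(0)
configs = [
    {"learning_rate": math.exp(random.uniform(math.log(1e-5), math.log(1e-1)))}
    for _ in range(16)
]

with ThreadPoolExecutor(max_workers=4) as pool:
    scores = list(pool.map(objective, configs))

best_score, best_config = max(zip(scores, configs), key=lambda t: t[0])
print(best_score, best_config)
```

Separating sampling from evaluation is the design choice that makes this trivial: no worker needs to know about any other worker's results.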
Summary
Random search samples hyperparameter configurations from predefined distributions. It replaces exhaustive enumeration with stochastic exploration.
Compared with grid search, random search often covers important dimensions more effectively under the same trial budget. It works especially well when only a few hyperparameters strongly influence performance.
A good random search depends on well-designed sampling distributions, proper logging, valid constraints, reproducible seeds, and fair evaluation. It is simple enough to be a baseline and strong enough to be useful in real deep learning systems.