Unsupervised learning studies data without explicit target labels. The dataset contains inputs only:

$$\{x_1, x_2, \dots, x_N\}$$
There is no given target $y$. The model must discover useful structure in the data itself.
Unsupervised learning is used for clustering, dimensionality reduction, density estimation, anomaly detection, representation learning, and generative modeling.
The Goal of Unsupervised Learning
In supervised learning, the target tells the model what to predict. In unsupervised learning, the target is implicit.
The model may try to learn:
| Goal | Meaning |
|---|---|
| Clusters | Which examples are similar |
| Representations | Useful hidden features |
| Low-dimensional structure | A compact version of the data |
| Probability density | How likely an example is |
| Generative structure | How to produce new examples |
| Anomalies | Which examples are unusual |
For example, given many images without labels, an unsupervised method may learn that images contain edges, textures, shapes, parts, and object-like regions. No human label says “edge” or “object.” The structure comes from the data distribution.
Clustering
Clustering groups similar examples together.
Given data points

$$x_1, x_2, \dots, x_N,$$

a clustering algorithm assigns each point to a cluster:

$$c_i \in \{1, \dots, k\}.$$

The classic example is $k$-means clustering. It learns cluster centers:

$$\mu_1, \mu_2, \dots, \mu_k.$$

Each data point is assigned to the nearest center:

$$c_i = \arg\min_j \|x_i - \mu_j\|^2.$$

The objective is

$$\sum_{i=1}^{N} \|x_i - \mu_{c_i}\|^2.$$
In deep learning, clustering is often applied to learned embeddings rather than raw data. For example, a neural network may first map an image to an embedding vector, and clustering may then group embeddings by visual similarity.
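As a concrete illustration, here is a minimal $k$-means sketch in PyTorch. The `kmeans` function and the toy two-blob data below are illustrative only, not a library API; in practice a dedicated implementation such as scikit-learn's `KMeans` would usually be preferred.

```python
import torch

def kmeans(x, k, n_iters=50):
    # x: (N, D) data points; pick k random points as initial centers
    centers = x[torch.randperm(x.size(0))[:k]]
    for _ in range(n_iters):
        dists = torch.cdist(x, centers)      # (N, k) pairwise distances
        assign = dists.argmin(dim=1)         # nearest-center assignment per point
        for j in range(k):
            mask = assign == j
            if mask.any():
                centers[j] = x[mask].mean(dim=0)   # recompute center as cluster mean
    return assign, centers

# toy example: two well-separated blobs
x = torch.cat([torch.randn(100, 2) + 5, torch.randn(100, 2) - 5])
assign, centers = kmeans(x, k=2)
print(assign.shape, centers.shape)   # torch.Size([200]) torch.Size([2, 2])
```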
Dimensionality Reduction
Dimensionality reduction maps high-dimensional data into a lower-dimensional space.
Suppose

$$x \in \mathbb{R}^d$$

and $d$ is large. We want a lower-dimensional representation

$$z \in \mathbb{R}^m, \quad m \ll d.$$

The mapping is usually written as

$$z = f(x).$$
The goal is to preserve important structure while removing redundancy or noise.
Principal component analysis, or PCA, is the classical linear method. Autoencoders are the deep learning version.
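As a brief sketch of the linear case, PyTorch's `torch.pca_lowrank` computes an approximate PCA; the dimensions below are arbitrary stand-ins.

```python
import torch

x = torch.randn(1000, 784)            # 1000 samples, 784 features
U, S, V = torch.pca_lowrank(x, q=32)  # approximate top principal directions
z = x @ V[:, :32]                     # project onto the first 32 components
print(z.shape)                        # torch.Size([1000, 32])
```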
An autoencoder has two parts:

$$z = f(x)$$

and

$$\hat{x} = g(z).$$
The encoder maps the input to a compact representation. The decoder reconstructs the input from that representation.
The reconstruction loss is often

$$\mathcal{L}(x, \hat{x}) = \|x - \hat{x}\|^2.$$
In PyTorch:
```python
import torch
import torch.nn as nn

class Autoencoder(nn.Module):
    def __init__(self, input_dim, hidden_dim):
        super().__init__()
        # encoder compresses the input to a hidden_dim-dimensional code
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, 256),
            nn.ReLU(),
            nn.Linear(256, hidden_dim),
        )
        # decoder reconstructs the input from the code
        self.decoder = nn.Sequential(
            nn.Linear(hidden_dim, 256),
            nn.ReLU(),
            nn.Linear(256, input_dim),
        )

    def forward(self, x):
        z = self.encoder(x)
        x_hat = self.decoder(z)
        return x_hat, z

model = Autoencoder(input_dim=784, hidden_dim=32)
x = torch.randn(64, 784)
x_hat, z = model(x)
loss = ((x - x_hat) ** 2).mean()

print(z.shape)     # torch.Size([64, 32])
print(loss.shape)  # torch.Size([])
```

Here the model compresses 784-dimensional inputs into 32-dimensional representations.
Representation Learning
Representation learning is the process of learning useful features from data.
A representation is a transformed version of an input:

$$z = f(x).$$
The representation should keep information that matters and discard information that does not.
For example:
| Input | Useful representation may capture |
|---|---|
| Image | Shapes, textures, object parts |
| Sentence | Meaning, syntax, entities |
| Audio | Phonemes, speaker traits, rhythm |
| Graph | Node roles, communities, connectivity |
| User behavior | Preferences, intent, habits |
Deep learning is powerful because it can learn representations instead of relying only on manually designed features.
In modern systems, representations are often learned on large unlabeled datasets and reused for supervised tasks. A model trained on unlabeled text may learn embeddings useful for classification, retrieval, question answering, and summarization.
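As a minimal sketch of this reuse pattern, suppose the `Autoencoder` defined earlier has already been trained on unlabeled data; its frozen encoder can then feed a small supervised classifier (a "linear probe"). The data and labels below are random stand-ins.

```python
import torch
import torch.nn as nn

# reuse the Autoencoder class from earlier; in practice it would be trained on unlabeled data first
model = Autoencoder(input_dim=784, hidden_dim=32)
encoder = model.encoder
for p in encoder.parameters():
    p.requires_grad = False                     # freeze the pretrained representation

classifier = nn.Linear(32, 10)                  # small supervised head on top of z
optimizer = torch.optim.Adam(classifier.parameters(), lr=1e-3)

x = torch.randn(64, 784)                        # labeled batch (random stand-ins here)
y = torch.randint(0, 10, (64,))

with torch.no_grad():
    z = encoder(x)                              # unsupervised representation
logits = classifier(z)
loss = nn.functional.cross_entropy(logits, y)
loss.backward()
optimizer.step()
print(loss.item())
```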
Density Estimation
Density estimation tries to learn the probability distribution that generated the data.
The goal is to model

$$p(x).$$
If the model assigns high probability to realistic examples and low probability to unrealistic examples, it has learned something about the data distribution.
Density estimation is central to generative modeling. A language model, for example, estimates the probability of a token sequence:

$$p(x_1, x_2, \dots, x_T).$$

Using the chain rule of probability, this can be written as

$$p(x_1, \dots, x_T) = \prod_{t=1}^{T} p(x_t \mid x_1, \dots, x_{t-1}).$$
This objective uses no external human label. The sequence itself provides the training signal.
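A minimal sketch of this objective, with random tensors standing in for a real model's outputs: each position predicts the next token, and the loss is the average negative log-probability assigned to the actual sequence.

```python
import torch
import torch.nn.functional as F

vocab_size, T = 100, 16
tokens = torch.randint(0, vocab_size, (1, T))   # one token sequence
logits = torch.randn(1, T - 1, vocab_size)      # stand-in for model outputs at positions 1..T-1

# position t predicts token t+1; the loss is the average -log p(x_t | x_<t)
nll = F.cross_entropy(
    logits.reshape(-1, vocab_size),
    tokens[:, 1:].reshape(-1),
)
print(nll)                                      # scalar; the sequence itself supplies the targets
```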
Generative Modeling
A generative model learns to produce new data that resembles the training data.
Examples include:
| Data type | Generated output |
|---|---|
| Text | Articles, code, answers |
| Images | Photorealistic pictures |
| Audio | Speech or music |
| Video | Motion sequences |
| Molecules | Candidate chemical structures |
A generative model may learn either an explicit probability distribution or an implicit sampling process.
Important families include:
| Model family | Core idea |
|---|---|
| Autoregressive models | Generate one element at a time |
| Variational autoencoders | Learn latent variables |
| Generative adversarial networks | Train generator against discriminator |
| Normalizing flows | Learn invertible transformations |
| Diffusion models | Learn to reverse a noise process |
Generative modeling is often unsupervised because training examples do not require external labels. The model learns from the structure of the data itself.
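For instance, the autoregressive family in the table above produces one element at a time. A minimal sampling sketch, assuming a hypothetical `model` that maps a batch of token prefixes to logits of shape `(batch, length, vocab)`:

```python
import torch

def sample(model, tokens, max_new_tokens):
    # generate one token at a time, feeding each sample back in as context
    for _ in range(max_new_tokens):
        logits = model(tokens)[:, -1, :]                   # logits for the next position
        probs = torch.softmax(logits, dim=-1)
        next_token = torch.multinomial(probs, num_samples=1)
        tokens = torch.cat([tokens, next_token], dim=1)    # append and continue
    return tokens
```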
Anomaly Detection
Anomaly detection identifies examples that differ from normal data.
A model is trained on ordinary examples. At inference time, unusual examples receive high anomaly scores.
For an autoencoder, one simple anomaly score is the reconstruction error:

$$s(x) = \|x - \hat{x}\|^2.$$
If the model reconstructs normal examples well but reconstructs unusual examples poorly, then high reconstruction error indicates an anomaly.
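A minimal sketch of this scoring step, assuming an autoencoder like the one defined earlier has been trained on normal data only; the threshold below is a hypothetical choice and would be set per application.

```python
import torch

model = Autoencoder(input_dim=784, hidden_dim=32)   # assume: trained on normal data only
x = torch.randn(64, 784)                            # batch to score (random stand-ins here)
with torch.no_grad():
    x_hat, _ = model(x)
score = ((x - x_hat) ** 2).mean(dim=1)              # per-example reconstruction error
threshold = score.mean() + 3 * score.std()          # hypothetical cutoff
print((score > threshold).sum().item(), "examples flagged as anomalous")
```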
Applications include:
| Domain | Anomaly |
|---|---|
| Cybersecurity | Suspicious network behavior |
| Manufacturing | Defective parts |
| Finance | Fraudulent transactions |
| Medicine | Unusual scans |
| Infrastructure | Sensor failures |
Anomaly detection is difficult because anomalies are rare and diverse. The model may see many examples of normal behavior but few examples of failure.
Unsupervised Learning in PyTorch
A basic unsupervised training loop looks similar to supervised training. The difference is that there may be no external label.
For an autoencoder:
```python
for x_batch in dataloader:
    optimizer.zero_grad()
    x_hat, z = model(x_batch)
    loss = ((x_batch - x_hat) ** 2).mean()   # reconstruction error: the input is its own target
    loss.backward()
    optimizer.step()
```

The input itself acts as the target. This pattern appears in many unsupervised models.
For contrastive or self-supervised methods, the training loop may create artificial views of the same input:
```python
for x_batch in dataloader:
    x1 = augment(x_batch)       # two random "views" of the same inputs
    x2 = augment(x_batch)
    z1 = encoder(x1)
    z2 = encoder(x2)
    loss = contrastive_loss(z1, z2)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

The dataset still contains only $x$, but the training procedure constructs a learning signal from transformations of $x$.
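The `augment` and `contrastive_loss` functions above are placeholders. A minimal InfoNCE-style loss (a simplified version of objectives such as the one used in SimCLR) could look like the sketch below; the temperature value is illustrative.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(z1, z2, temperature=0.1):
    # z1, z2: (N, D) embeddings of two augmented views of the same N inputs
    z1 = F.normalize(z1, dim=1)
    z2 = F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature      # (N, N) cosine similarities
    targets = torch.arange(z1.size(0))      # matching views lie on the diagonal
    return F.cross_entropy(logits, targets)

z1 = torch.randn(64, 128)
z2 = torch.randn(64, 128)
print(contrastive_loss(z1, z2))             # scalar loss
```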
Unsupervised Versus Supervised Learning
The practical difference is the source of the training signal.
| Property | Supervised learning | Unsupervised learning |
|---|---|---|
| Data | Inputs and targets | Inputs only |
| Example | Pairs $(x, y)$ | Inputs $x$ only |
| Objective | Predict labels or values | Discover structure |
| Common tasks | Classification, regression | Clustering, compression, generation |
| Cost | Often needs labels | Can use unlabeled data |
| Risk | Label bias, overfitting | Harder evaluation |
Unsupervised learning can use much larger datasets because unlabeled data is abundant. This makes it important for modern deep learning, where large-scale pretraining often depends on weak, implicit, or self-generated training signals.
Limitations
Unsupervised learning has several limitations.
First, the objective may not match the final task. A model may learn structure that is mathematically valid but practically useless.
Second, evaluation is harder. In classification, accuracy is easy to measure. In unsupervised learning, there may be no single correct answer.
Third, learned representations may encode unwanted biases from the data.
Fourth, generative models may learn to imitate surface statistics without learning deeper causal structure.
Finally, unsupervised methods often require large datasets and careful objective design.
Summary
Unsupervised learning learns from inputs without explicit labels. It aims to discover structure in the data distribution.
The main tasks include clustering, dimensionality reduction, representation learning, density estimation, generative modeling, and anomaly detection.
In deep learning, unsupervised learning is important because it can use large unlabeled datasets. Autoencoders, language models, diffusion models, and contrastive systems all rely on the idea that useful training signals can be extracted from the data itself.