Deep Learning with PyTorch

Part I. PyTorch Foundations

Chapter 1. Introduction to Deep Learning and PyTorch

1.1 What Is Deep Learning
1.2 The PyTorch Ecosystem
1.3 Dynamic Computation Graphs
1.4 Tensor-Based Computation
1.5 GPUs and Accelerators
1.6 PyTorch Versus Other Frameworks
1.7 Installing and Configuring PyTorch
1.8 Structure of a PyTorch Project

Chapter 2. Tensors and Tensor Operations

2.1 Creating Tensors
2.2 Tensor Shapes and Dimensions
2.3 Tensor Arithmetic
2.4 Broadcasting Rules
2.5 Indexing and Slicing
2.6 Tensor Reshaping
2.7 Matrix Operations
2.8 Random Tensor Generation
2.9 Tensor Memory Layout
2.10 CPU and GPU Tensors

Chapter 3. Automatic Differentiation

3.1 Computational Graphs
3.2 Gradient Computation
3.3 Reverse-Mode Differentiation
3.4 The requires_grad Mechanism
3.5 Backpropagation with backward()
3.6 Gradient Accumulation
3.7 Disabling Gradient Tracking
3.8 Custom Autograd Functions
3.9 Higher-Order Derivatives

Chapter 4. PyTorch Modules and Model Structure

4.1 The nn.Module Interface
4.2 Parameters and Buffers
4.3 Forward Methods
4.4 Sequential Models
4.5 Custom Layers
4.6 Parameter Initialization
4.7 Saving and Loading Models
4.8 Organizing Large Projects

Chapter 5. Data Loading and Preprocessing

5.1 Datasets and DataLoaders
5.2 Batch Processing
5.3 Data Shuffling
5.4 Parallel Data Loading
5.5 Transform Pipelines
5.6 Tokenization and Text Processing
5.7 Image Augmentation
5.8 Streaming and Large Datasets
5.9 Custom Dataset Classes

Part II. Neural Network Fundamentals

Chapter 6. Linear Models and Optimization

6.1 Linear Regression
6.2 Logistic Regression
6.3 Loss Functions
6.4 Gradient Descent
6.5 Stochastic Gradient Descent
6.6 Momentum and Adaptive Methods
6.7 Learning Rate Scheduling
6.8 Weight Decay and Regularization

Chapter 7. Multilayer Neural Networks

7.1 Feedforward Networks
7.2 Hidden Layers
7.3 Activation Functions
7.4 Universal Approximation
7.5 Deep Representations
7.6 Batch Normalization
7.7 Residual Connections

Chapter 8. Training Neural Networks

8.1 Training Loops
8.2 Validation and Testing
8.3 Metrics and Evaluation
8.4 Overfitting and Underfitting
8.5 Early Stopping
8.6 Dropout
8.7 Gradient Clipping
8.8 Mixed Precision Training

Chapter 9. Experiment Management

9.1 Configuration Systems
9.2 Logging and Visualization
9.3 TensorBoard Integration
9.4 Reproducibility
9.5 Checkpointing
9.6 Hyperparameter Search
9.7 Benchmarking and Profiling

Part III. Computer Vision with PyTorch

Chapter 10. Convolutional Neural Networks

10.1 Convolution Operations
10.2 Pooling Layers
10.3 Feature Maps
10.4 Padding and Stride
10.5 CNN Architectures
10.6 Residual Networks
10.7 Efficient Convolutions

Chapter 11. Image Classification

11.1 Classification Pipelines
11.2 Transfer Learning
11.3 Fine-Tuning Pretrained Models
11.4 Data Augmentation Strategies
11.5 Large-Scale Training
11.6 Calibration and Confidence

Chapter 12. Object Detection and Segmentation

12.1 Bounding Box Prediction
12.2 Region Proposal Methods
12.3 YOLO Architectures
12.4 Semantic Segmentation
12.5 Instance Segmentation
12.6 Vision Foundation Models

Chapter 13. Vision Transformers

13.1 Patch Embeddings
13.2 Self-Attention for Images
13.3 Transformer Encoders
13.4 Hybrid CNN-Transformer Models
13.5 Efficient Vision Transformers
13.6 Multimodal Vision Models

Part IV. Sequence Models and NLP

Chapter 14. Recurrent Neural Networks

14.1 Sequential Data
14.2 Recurrent Computation
14.3 Backpropagation Through Time
14.4 Vanishing Gradients
14.5 LSTM Networks
14.6 GRU Networks
14.7 Sequence Modeling Applications

Chapter 15. Attention and Transformers

15.1 Attention Mechanisms
15.2 Self-Attention
15.3 Multi-Head Attention
15.4 Positional Encoding
15.5 Transformer Encoders
15.6 Transformer Decoders
15.7 Efficient Attention Methods

Chapter 16. Natural Language Processing with PyTorch

16.1 Word Embeddings
16.2 Subword Tokenization
16.3 Text Classification
16.4 Named Entity Recognition
16.5 Machine Translation
16.6 Question Answering
16.7 Conversational Systems

Chapter 17. Large Language Models

17.1 Autoregressive Language Models
17.2 Pretraining Objectives
17.3 Instruction Tuning
17.4 Reinforcement Learning from Human Feedback
17.5 Retrieval-Augmented Generation
17.6 Long-Context Models
17.7 Tool-Using Agents

Part V. Generative Deep Learning

Chapter 18. Autoencoders and Representation Learning

18.1 Dimensionality Reduction
18.2 Sparse Autoencoders
18.3 Denoising Autoencoders
18.4 Variational Autoencoders
18.5 Latent Space Manipulation
18.6 Representation Learning

Chapter 19. Generative Adversarial Networks

19.1 Adversarial Training
19.2 Generator and Discriminator Models
19.3 Conditional GANs
19.4 Style-Based GANs
19.5 GAN Stabilization Techniques
19.6 Evaluation of Generative Models

Chapter 20. Diffusion Models

20.1 Forward Noise Processes
20.2 Reverse Denoising Processes
20.3 Score-Based Models
20.4 U-Net Architectures
20.5 Latent Diffusion
20.6 Text-to-Image Generation
20.7 Video Diffusion Systems

Part VI. Graph and Geometric Learning

Chapter 21. Graph Neural Networks

21.1 Graph Representations
21.2 Message Passing Networks
21.3 Graph Convolutions
21.4 Graph Attention Networks
21.5 Knowledge Graph Embeddings
21.6 PyTorch Geometric

Chapter 22. Geometric Deep Learning

22.1 Symmetry and Equivariance
22.2 Point Cloud Networks
22.3 Neural Fields
22.4 Implicit Representations
22.5 Geometric Transformers

Part VII. Reinforcement Learning

Chapter 23. Foundations of Reinforcement Learning

23.1 Agents and Environments
23.2 Markov Decision Processes
23.3 Value Functions
23.4 Policy Optimization
23.5 Exploration Strategies

Chapter 24. Deep Reinforcement Learning with PyTorch

24.1 Deep Q-Networks
24.2 Policy Gradient Methods
24.3 Actor-Critic Systems
24.4 Model-Based Reinforcement Learning
24.5 Offline Reinforcement Learning
24.6 RL for Language Models

Part VIII. Scaling and Systems

Chapter 25. Efficient Training Systems

25.1 GPU Optimization
25.2 Memory Management
25.3 Gradient Checkpointing
25.4 Quantization
25.5 Distillation
25.6 Low-Rank Adaptation

Chapter 26. Distributed Training

26.1 Data Parallelism
26.2 Distributed Data Parallel
26.3 Model Parallelism
26.4 Pipeline Parallelism
26.5 Fault Tolerance
26.6 Multi-Node Training

Chapter 27. PyTorch Compilation and Performance

27.1 TorchScript
27.2 torch.compile
27.3 Graph Optimization
27.4 Kernel Fusion
27.5 CUDA Extensions
27.6 Profiling Bottlenecks

Chapter 28. Deployment and Inference

28.1 Model Serialization
28.2 ONNX Export
28.3 TorchServe
28.4 Mobile Deployment
28.5 Edge Inference
28.6 High-Throughput Serving
28.7 Real-Time Systems

Part IX. Advanced Topics

Chapter 29. Probabilistic Deep Learning

29.1 Bayesian Neural Networks
29.2 Variational Inference
29.3 Monte Carlo Methods
29.4 Uncertainty Estimation
29.5 Gaussian Processes

Chapter 30. Robustness and Interpretability

30.1 Adversarial Examples
30.2 Distribution Shift
30.3 Saliency Maps
30.4 Attribution Methods
30.5 Mechanistic Interpretability
30.6 Model Editing

Chapter 31. Multimodal and Foundation Models

31.1 Vision-Language Models
31.2 Audio-Visual Learning
31.3 Unified Foundation Models
31.4 Retrieval Systems
31.5 Long-Horizon Agents

Chapter 32. Future Directions

32.1 Scaling Laws
32.2 Efficient AI Systems
32.3 Scientific Deep Learning
32.4 Robotics and Embodied AI
32.5 Open Research Problems

Computational Graphs (11 min): A computational graph is a graph that represents a numerical computation. The nodes represent values or operations. The edges describe how data flows from one operation to the next.
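
As a rough sketch (with made-up values, not an example from the book), a few PyTorch operations already build such a graph: each result records the operation that produced it, and walking those records backward yields gradients.

```python
import torch

# Two leaf tensors and a short chain of operations.
x = torch.tensor(2.0, requires_grad=True)
w = torch.tensor(3.0, requires_grad=True)

y = w * x          # multiplication node
z = y + 1.0        # addition node
loss = z ** 2      # squaring node

# Each intermediate tensor remembers the operation that produced it;
# these records are the graph's nodes and edges.
print(loss.grad_fn)                 # e.g. <PowBackward0 ...>
print(loss.grad_fn.next_functions)  # edges back toward y + 1.0

# Walking the graph backward computes gradients for the leaves.
loss.backward()
print(x.grad, w.grad)
```
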
Datasets and DataLoaders (10 min): A deep learning model does not train directly from files. It trains from tensors. The purpose of a data pipeline is to convert stored data into batches of tensors with consistent shapes, data types, and labels.
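
A minimal sketch of that pipeline, assuming small in-memory data and the hypothetical class name ToyDataset: a Dataset returns one (input, label) pair per index, and a DataLoader assembles batches.

```python
import torch
from torch.utils.data import Dataset, DataLoader

class ToyDataset(Dataset):
    """Wraps in-memory features and labels as (tensor, tensor) pairs."""
    def __init__(self, features, labels):
        self.features = torch.as_tensor(features, dtype=torch.float32)
        self.labels = torch.as_tensor(labels, dtype=torch.long)

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        return self.features[idx], self.labels[idx]

dataset = ToyDataset([[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]],
                     [0, 1, 0, 1])
loader = DataLoader(dataset, batch_size=2, shuffle=True)

for xb, yb in loader:
    print(xb.shape, yb.shape)   # torch.Size([2, 2]) torch.Size([2])
```
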
Linear Regression (8 min): Linear regression is the simplest supervised learning model used in deep learning.
Mean Squared Error (10 min): Mean squared error is one of the simplest and most widely used loss functions in supervised learning. It measures the average squared difference between a model’s prediction and the target value.
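
That definition is short enough to verify directly. The values below are invented for illustration; the built-in F.mse_loss gives the same result as the hand-written average.

```python
import torch
import torch.nn.functional as F

pred   = torch.tensor([2.5, 0.0, 2.0])
target = torch.tensor([3.0, -0.5, 2.0])

# Mean squared error computed directly from its definition ...
mse_manual = ((pred - target) ** 2).mean()

# ... and with the built-in functional form.
mse_builtin = F.mse_loss(pred, target)

print(mse_manual.item(), mse_builtin.item())  # both approximately 0.1667
```
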
Scalars, Vectors, Matrices, and Tensors (10 min): Deep learning represents data and computation using arrays of numbers.
Sigmoid and Hyperbolic Tangent (8 min): Activation functions give neural networks their nonlinear structure.
Supervised Learning (7 min): Supervised learning is the central paradigm of modern machine learning and deep learning.
What Is Deep Learning (9 min): Deep learning is a branch of machine learning that studies models built from many layers of learned computation.
Cross-Entropy Loss (10 min): Cross-entropy loss is the standard loss function for classification. It measures how well a model’s predicted class distribution matches the true class label.
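
A small illustration with invented logits and labels: F.cross_entropy works on raw class scores, and the manual log-softmax computation below matches it.

```python
import torch
import torch.nn.functional as F

# Raw class scores (logits) for a batch of 2 examples and 3 classes.
logits = torch.tensor([[2.0, 0.5, -1.0],
                       [0.1, 1.5,  0.3]])
targets = torch.tensor([0, 1])  # true class indices

# cross_entropy applies log-softmax and negative log-likelihood in one step.
loss = F.cross_entropy(logits, targets)

# Equivalent manual computation from the definition.
log_probs = F.log_softmax(logits, dim=1)
manual = -log_probs[torch.arange(2), targets].mean()

print(loss.item(), manual.item())  # the two values agree
```
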
Gradient Computation (7 min): Gradient computation is the process of measuring how a scalar output changes when its input values change. In deep learning, the scalar output is usually the loss, and the inputs are usually the model parameters.
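
A minimal sketch of that idea in PyTorch, with arbitrary example values: mark the parameters with requires_grad, compute a scalar, and call backward() to populate the gradients.

```python
import torch

w = torch.tensor([1.0, -2.0], requires_grad=True)  # "parameters"
x = torch.tensor([0.5, 3.0])                        # fixed input

loss = (w * x).sum() ** 2     # scalar output
loss.backward()               # fills w.grad with d(loss)/d(w)

print(w.grad)   # gradient of the loss with respect to each entry of w
```
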
Logistic Regression (7 min): Linear regression predicts a real number. Logistic regression predicts a probability for binary classification.
Logistic Regression (6 min): Logistic regression is a linear model for classification. It predicts a probability instead of a raw numerical value. Despite its name, logistic regression is mainly used for classification, not regression.
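
One way to express this in PyTorch (a sketch with invented sizes, not the book's code): a single nn.Linear layer whose output is passed through a sigmoid, trained with a binary cross-entropy loss.

```python
import torch
import torch.nn as nn

# Logistic regression: one linear layer followed by a sigmoid.
model = nn.Linear(in_features=4, out_features=1)

x = torch.randn(8, 4)                      # batch of 8 examples
prob = torch.sigmoid(model(x))             # probabilities in (0, 1)
pred = (prob > 0.5).long()                 # hard class decisions

# For training, BCEWithLogitsLoss fuses the sigmoid and the
# binary cross-entropy in a numerically stable way.
criterion = nn.BCEWithLogitsLoss()
y = torch.randint(0, 2, (8, 1)).float()
loss = criterion(model(x), y)
```
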
Rectified Linear Units (7 min): The rectified linear unit, usually called ReLU, is the most widely used activation function in modern deep learning.
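
ReLU simply replaces negative inputs with zero, which is easy to confirm on a few hand-picked values:

```python
import torch
import torch.nn as nn

x = torch.tensor([-2.0, -0.5, 0.0, 1.5, 3.0])

print(torch.relu(x))          # tensor([0.0, 0.0, 0.0, 1.5, 3.0])
print(torch.clamp(x, min=0))  # same result, from the definition max(0, x)

relu_layer = nn.ReLU()        # module form, for use inside nn.Sequential
print(relu_layer(x))
```
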
Tensor Shapes, Dimensions, and Memory Layout (7 min): Deep learning systems manipulate tensors with millions or billions of numerical entries.
The PyTorch Ecosystem (7 min): PyTorch is a deep learning platform built around tensors, automatic differentiation, and composable neural network modules.
Unsupervised Learning (6 min): Unsupervised learning studies data without explicit target labels. The dataset contains inputs only, with no accompanying targets.
Dynamic Computation Graphs (8 min): Deep learning models are built from sequences of mathematical operations.
Leaky and Parametric ReLU (7 min): ReLU is simple and effective, but it has one sharp weakness.
Likelihood-Based Objectives (8 min): Many deep learning loss functions can be understood as likelihood maximization.
Loss Functions (6 min): A loss function measures how wrong a model’s predictions are.
Part III of Deep Learning with PyTorch (28 pages · 224 min).
Reverse-Mode Differentiation (9 min): Reverse-mode differentiation is the method used by backpropagation. It computes derivatives by first evaluating a function forward, then propagating gradient information backward from the output to the inputs.
Self-Supervised Learning (7 min): Self-supervised learning is a form of learning where the training signal is created from the data itself. The dataset does not need human-written labels, but the model still receives a prediction task.
Softmax Regression (7 min): Softmax regression extends logistic regression from two classes to many classes. It is the standard linear model for multiclass classification.
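
A sketch of the model in PyTorch, with arbitrary feature and class counts: a single linear layer produces one score per class, and softmax turns those scores into a probability distribution.

```python
import torch
import torch.nn as nn

# Softmax regression: one linear layer mapping features to class scores.
num_features, num_classes = 10, 4
model = nn.Linear(num_features, num_classes)

x = torch.randn(5, num_features)           # batch of 5 examples
logits = model(x)                          # shape (5, 4)
probs = torch.softmax(logits, dim=1)       # each row sums to 1
pred = probs.argmax(dim=1)                 # predicted class per example

# During training, nn.CrossEntropyLoss is applied to the raw logits;
# the softmax is folded into the loss for numerical stability.
```
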
Tensor Creation and Initialization (7 min): Neural networks start with tensors. Some tensors come from data.
ELU, GELU, and Swish (7 min): ReLU and its variants improved optimization in deep networks, but they still have limitations.
Gradient Descent (6 min): Gradient descent is the basic optimization method used to train neural networks. It updates model parameters in the direction that reduces the loss.
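
The update rule is w ← w − lr · ∇L(w). A hand-written single step on a toy quadratic loss might look like the following (the learning rate and starting values are illustrative):

```python
import torch

# One gradient descent step on a tiny quadratic loss.
w = torch.tensor([1.0, -3.0], requires_grad=True)
lr = 0.1

loss = (w ** 2).sum()        # minimized at w = 0
loss.backward()              # compute d(loss)/d(w)

with torch.no_grad():        # the update itself is not part of the graph
    w -= lr * w.grad         # w <- w - lr * gradient
w.grad.zero_()               # clear the gradient before the next step
```
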
Margin-Based Losses (9 min): Margin-based losses are used when the goal is not only to make the correct prediction, but to make it by a sufficient margin.
Part IV of Deep Learning with PyTorch (28 pages · 238 min).
Reinforcement Learning Overview (6 min): Reinforcement learning studies how an agent learns to act through interaction with an environment.
Tensor Arithmetic and Broadcasting (8 min): Tensor arithmetic is the basic computation layer of PyTorch.
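
Broadcasting lets tensors of different shapes be combined by implicitly expanding size-1 (or missing) dimensions. A couple of shape checks on made-up tensors show the idea:

```python
import torch

a = torch.ones(4, 3)                 # shape (4, 3)
b = torch.tensor([10., 20., 30.])    # shape (3,)

# b is broadcast across the first dimension of a: result shape (4, 3).
print((a + b).shape)

col = torch.tensor([[1.], [2.], [3.], [4.]])   # shape (4, 1)
# (4, 1) and (3,) broadcast together to (4, 3).
print((col * b).shape)
```
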
Tensor-Based Computation (10 min): PyTorch programs are tensor programs. A tensor stores numbers in a structured array, and most model computation is expressed as operations over tensors.
The Chain Rule (8 min): The chain rule is the mathematical rule that makes backpropagation possible. Neural networks are built by composing many functions. The chain rule tells us how to differentiate such compositions.
The Perceptron Algorithm (7 min): The perceptron is one of the earliest algorithms for binary classification. It learns a linear decision boundary by updating its weights whenever it makes a mistake.
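
A compact sketch of that update rule, assuming labels in {-1, +1} and a tiny hand-made dataset; on a mistake the weights move toward or away from the misclassified point.

```python
import torch

def perceptron_train(X, y, epochs=10):
    """Classic perceptron update: w <- w + y_i * x_i on each mistake.

    X: (n, d) float tensor, y: (n,) tensor with labels in {-1, +1}.
    """
    n, d = X.shape
    w = torch.zeros(d)
    b = torch.zeros(1)
    for _ in range(epochs):
        for i in range(n):
            if y[i] * (X[i] @ w + b) <= 0:   # mistake (or on the boundary)
                w += y[i] * X[i]
                b += y[i]
    return w, b

# A linearly separable toy problem.
X = torch.tensor([[2., 1.], [1., 3.], [-1., -2.], [-2., -1.]])
y = torch.tensor([1., 1., -1., -1.])
w, b = perceptron_train(X, y)
```
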
Contrastive Objectives (10 min): Contrastive objectives train a model by comparing examples. Instead of learning only from an input and its target, the model learns which examples should be close together and which examples should be far apart.
GPUs and Accelerators (10 min): Deep learning became practical at scale because neural network computation maps well to parallel hardware.
Indexing, Slicing, and Tensor Views (8 min): Indexing and slicing select parts of a tensor. These operations are used constantly in PyTorch: selecting batches, cropping images, extracting token positions, applying masks, gathering logits, and rearranging model outputs.
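
A few representative operations on a small made-up tensor; note that basic slices are views, so writing through them modifies the original storage:

```python
import torch

x = torch.arange(24).reshape(4, 6)   # a 4-by-6 tensor of integers

row = x[1]              # second row, shape (6,)
block = x[:2, 3:]       # first two rows, last three columns, shape (2, 3)
masked = x[x % 2 == 0]  # boolean mask selects the even entries

# Slices are views: writing through the slice changes the original tensor.
block[:] = 0
print(x)
```
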
Jacobians and Hessians (9 min): Gradients are enough for most neural network training. A gradient tells us how a scalar loss changes with respect to parameters.
Linear Separability (7 min): Linear separability describes when a classification dataset can be divided perfectly by a linear decision boundary. It is one of the central geometric ideas behind linear classification.
Part V of Deep Learning with PyTorch (19 pages · 155 min).
Softmax and Output Activations (8 min): Many neural networks produce raw scores. These scores are called logits.
Stochastic Gradient Descent (8 min): Stochastic gradient descent, usually abbreviated as SGD, is the standard form of gradient-based training used in deep learning.
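
In PyTorch this is usually expressed through torch.optim.SGD. The loop below is a schematic sketch on a single synthetic batch rather than a real minibatch stream:

```python
import torch
import torch.nn as nn

model = nn.Linear(3, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
criterion = nn.MSELoss()

x = torch.randn(16, 3)
y = torch.randn(16, 1)

for step in range(100):
    optimizer.zero_grad()          # clear gradients from the previous step
    loss = criterion(model(x), y)  # forward pass on the current batch
    loss.backward()                # compute fresh gradients
    optimizer.step()               # apply the SGD update
```
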
Training, Validation, and Test Sets (9 min): A machine learning dataset is usually divided into three parts: a training set, a validation set, and a test set.
Backpropagation (8 min): Backpropagation is the algorithm used to compute gradients in neural networks efficiently.
Bias and Variance (8 min): Bias and variance describe two different sources of prediction error. They are useful because they separate errors caused by an overly simple model from errors caused by an overly sensitive model.
Limits of Linear Decision Boundaries (7 min): A linear classifier separates classes using a hyperplane. In two dimensions this boundary is a line. In three dimensions it is a plane. In higher dimensions it is a hyperplane.
Matrix Operations (9 min): Matrix operations are the main arithmetic language of deep learning.
Momentum and Adaptive Methods (6 min): Stochastic gradient descent uses the current minibatch gradient to update the parameters.
Multi-Task Objectives (10 min): Multi-task learning trains one model on several objectives at the same time.
Part VI of Deep Learning with PyTorch (15 pages · 133 min).
PyTorch Versus Other Frameworks (8 min): PyTorch is one of several major frameworks for deep learning.
Saturation and Gradient Flow (9 min): Activation functions control both the forward signal and the backward signal.
Gradient Flow in Deep Networks (7 min): Gradient flow describes how derivative information moves backward through a neural network during training.
Installing and Configuring PyTorch (6 min): A PyTorch installation must match three things: the Python environment, the operating system, and the available hardware.
Learning Rate Scheduling (7 min): The learning rate controls the size of each parameter update.
Limits of Linear Models (8 min): Linear models are the first useful class of predictive models in deep learning.
Overfitting and Underfitting (8 min): Overfitting and underfitting describe two common ways a model can fail.
Part VII of Deep Learning with PyTorch (15 pages · 143 min).
Practical Activation Selection (5 min): Activation functions should be chosen for the architecture, loss, initialization, normalization, and training scale. There is no universal best activation. The right choice depends on what the layer must do.
Random Tensor Generation (7 min): Random tensors are used throughout deep learning. They initialize parameters, shuffle examples, sample noise, apply dropout, augment data, and generate outputs from probabilistic models.
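
A few of the common generators, with a fixed seed so the draws are reproducible (the shapes and ranges here are arbitrary):

```python
import torch

torch.manual_seed(42)                 # make the random draws reproducible

noise = torch.randn(2, 3)             # standard normal samples
uniform = torch.rand(2, 3)            # uniform samples in [0, 1)
labels = torch.randint(0, 10, (5,))   # random integer class labels
perm = torch.randperm(8)              # a random permutation, e.g. for shuffling
```
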
Self-Supervised Objectives (9 min): Self-supervised learning trains a model using supervision constructed from the data itself. Instead of requiring human labels, the training task is derived from structure already present in the input.
Automatic Differentiation Engines (9 min): An automatic differentiation engine is the system that records numerical operations and computes derivatives from them.
Choosing and Combining Loss Functions (7 min): A loss function defines what the model is trained to improve. It translates a modeling goal into a scalar value that can be minimized by gradient-based optimization.
Evaluation Metrics (9 min): Evaluation metrics convert model behavior into numbers. A loss function guides training. A metric reports performance. Sometimes they are the same. Often they are different.
Part VIII of Deep Learning with PyTorch (28 pages · 244 min).
Structure of a PyTorch Project (8 min): A PyTorch project should separate concerns. Model code should define computation.
Tensor Data Types and Devices (9 min): A tensor has values, shape, data type, and device placement.
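
A short sketch of inspecting and changing those properties; the device string is chosen at run time depending on whether CUDA is available:

```python
import torch

x = torch.zeros(2, 3)                 # float32 on the CPU by default
print(x.dtype, x.device)

x16 = x.to(torch.float16)             # change the data type
idx = torch.tensor([0, 2], dtype=torch.long)   # integer tensor for indexing

device = "cuda" if torch.cuda.is_available() else "cpu"
x_dev = x.to(device)                  # move the tensor to the chosen device
```
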
Weight Decay and Regularization (6 min): Training loss measures how well a model fits the training data.
Data Leakage and Experimental Design (7 min): Data leakage occurs when information that should be unavailable during training or evaluation enters the modeling process. It causes performance estimates to look better than they really are.
Part IX of Deep Learning with PyTorch (26 pages · 210 min).
Symbolic Versus Dynamic Computation (10 min): Deep learning frameworks need a way to represent computation.
Tensor Memory Layout and Performance (9 min): A tensor has a logical shape and a physical memory layout. The shape tells us how to interpret the tensor as an array. The memory layout tells us how the entries are stored in memory.
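
Strides make the distinction concrete: a transpose is just a view with swapped strides, and .contiguous() copies the data into a fresh row-major layout (the values below are illustrative):

```python
import torch

x = torch.arange(12).reshape(3, 4)
print(x.stride())            # (4, 1): row-major layout

t = x.t()                    # transpose is a view with swapped strides
print(t.stride(), t.is_contiguous())   # (1, 4) False

t_c = t.contiguous()         # copies data into a fresh row-major layout
print(t_c.is_contiguous())   # True
```
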
CPU and GPU Tensors (9 min): PyTorch tensors live on devices. A device is the hardware location where tensor storage exists and where tensor operations execute.