This notebook is truly impressive, and I gained a wealth of knowledge about PyTorch and the entire development cycle from it. Although it contains many repetitive sections designed to reinforce learning, the pipeline is valuable and worth practicing repeatedly. Additionally, I have included some useful code snippets here, so there's no need to go through all the materials again.
GPU/CPU
# Check if a GPU is available on your machine
torch.cuda.is_available() # returns True if an NVIDIA GPU is available
torch.cuda.device_count()
torch.backends.mps.is_available() # Note: this will print False if you're not running on a Mac
# Setup device agnostic code for future models
if torch.cuda.is_available():
    device = "cuda" # Use NVIDIA GPU (if available)
elif torch.backends.mps.is_available():
    device = "mps" # Use Apple Silicon GPU (if available)
else:
    device = "cpu" # Default to CPU if no GPU is available
# Move a tensor to the GPU (if available)
tensor = torch.tensor([1, 2, 3]) # tensors are created on the CPU by default
tensor_on_gpu = tensor.to(device)
# If tensor is on GPU, can't transform it to NumPy (this will error)
tensor_on_gpu.numpy()
# Instead, copy the tensor back to cpu
tensor_back_on_cpu = tensor_on_gpu.cpu().numpy()
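The same .to(device) pattern works for whole models. A minimal sketch, using a throwaway nn.Linear layer as a stand-in:
# Move a model's parameters to the target device (sketch with a stand-in layer)
from torch import nn
layer = nn.Linear(in_features=8, out_features=2)
layer.to(device)
# Check where the parameters live now
next(layer.parameters()).device # e.g. device(type='cuda', index=0)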
Prepare Datasets
Here I combine all the code together; you could separate it and execute it part by part.
import torch
import torchvision
# Note: Requires PyTorch >= 1.11.0 and torchvision >= 0.12.0 for the Food101 dataset
# (this simple minor-version check assumes 1.x / 0.x version strings)
assert int(torch.__version__.split(".")[1]) >= 11
assert int(torchvision.__version__.split(".")[1]) >= 12
import torchvision.datasets as datasets
import torchvision.transforms as transforms
# Setup data directory
import pathlib
data_dir = pathlib.Path("../data")
# Get training data
train_data = datasets.Food101(root=data_dir,
split="train",
# transform=transforms.ToTensor(),
download=True)
# Get testing data
test_data = datasets.Food101(root=data_dir,
split="test",
# transform=transforms.ToTensor(),
download=True)
# Get random 10% of training images
import random
# Setup data paths
data_path = data_dir / "food-101" / "images"
target_classes = ["pizza", "steak", "sushi"]
# Change amount of data to get (e.g. 0.1 = random 10%, 0.2 = random 20%)
amount_to_get = 0.2
# Create function to separate a random amount of data
def get_subset(image_path=data_path,
               data_splits=["train", "test"],
               target_classes=["pizza", "steak", "sushi"],
               amount=0.1,
               seed=42):
    random.seed(seed)
    label_splits = {}
    # Get labels
    for data_split in data_splits:
        print(f"[INFO] Creating image split for: {data_split}...")
        label_path = data_dir / "food-101" / "meta" / f"{data_split}.txt"
        with open(label_path, "r") as f:
            labels = [line.strip("\n") for line in f.readlines() if line.split("/")[0] in target_classes]
        # Get a random subset of the target classes' image IDs
        number_to_sample = round(amount * len(labels))
        print(f"[INFO] Getting random subset of {number_to_sample} images for {data_split}...")
        sampled_images = random.sample(labels, k=number_to_sample)
        # Apply full paths
        image_paths = [pathlib.Path(str(image_path / sample_image) + ".jpg") for sample_image in sampled_images]
        label_splits[data_split] = image_paths
    return label_splits
label_splits = get_subset(amount=amount_to_get)
label_splits["train"][:10]
# Create target directory path
target_dir_name = f"../data/pizza_steak_sushi_{str(int(amount_to_get*100))}_percent"
print(f"Creating directory: '{target_dir_name}'")
# Setup the directories
target_dir = pathlib.Path(target_dir_name)
# Make the directories
target_dir.mkdir(parents=True, exist_ok=True)
import shutil
for image_split in label_splits.keys():
    for image_path in label_splits[str(image_split)]:
        dest_dir = target_dir / image_split / image_path.parent.stem / image_path.name
        if not dest_dir.parent.is_dir():
            dest_dir.parent.mkdir(parents=True, exist_ok=True)
        print(f"[INFO] Copying {image_path} to {dest_dir}...")
        shutil.copy2(image_path, dest_dir)
# Check lengths of directories
def walk_through_dir(dir_path):
    """
    Walks through dir_path returning its contents.
    Args:
        dir_path (str): target directory
    Returns:
        A print out of:
            number of subdirectories in dir_path
            number of images (files) in each subdirectory
            name of each subdirectory
    """
    import os
    for dirpath, dirnames, filenames in os.walk(dir_path):
        print(f"There are {len(dirnames)} directories and {len(filenames)} images in '{dirpath}'.")
walk_through_dir(target_dir)
# Zip pizza_steak_sushi images
zip_file_name = data_dir / f"pizza_steak_sushi_{str(int(amount_to_get*100))}_percent"
shutil.make_archive(zip_file_name,
format="zip",
root_dir=target_dir)
There are several different kinds of pre-built datasets and dataset loaders for PyTorch, depending on the problem you're working on.
Problem space | Pre-built Datasets and Functions |
---|---|
Vision | torchvision.datasets |
Audio | torchaudio.datasets |
Text | torchtext.datasets |
Recommendation system | torchrec.datasets |
import torch
from torch.utils.data import DataLoader
from torchvision import datasets, transforms
# Write transform for image
data_transform = transforms.Compose([
# Resize the images to 64x64
transforms.Resize(size=(64, 64)),
# Flip the images randomly on the horizontal
transforms.RandomHorizontalFlip(p=0.5), # p = probability of flip, 0.5 = 50% chance
# Turn the image into a torch.Tensor
transforms.ToTensor() # this also converts all pixel values from 0 to 255 to be between 0.0 and 1.0
])
# Use ImageFolder to create dataset(s)
from torchvision import datasets
# Point to the train/test folders of the subset created above
train_dir = target_dir / "train"
test_dir = target_dir / "test"
train_data = datasets.ImageFolder(root=train_dir, # target folder of images
                                  transform=data_transform, # transforms to perform on data (images)
                                  target_transform=None) # transforms to perform on labels (if necessary)
test_data = datasets.ImageFolder(root=test_dir,
                                 transform=data_transform)
print(f"Train data:\n{train_data}\nTest data:\n{test_data}")
# Turn train and test Datasets into DataLoaders
from torch.utils.data import DataLoader
train_dataloader = DataLoader(dataset=train_data,
batch_size=1, # how many samples per batch?
num_workers=1, # how many subprocesses to use for data loading? (higher = more)
shuffle=True) # shuffle the data?
test_dataloader = DataLoader(dataset=test_data,
batch_size=1,
num_workers=1,
shuffle=False) # don't usually need to shuffle testing data
train_dataloader, test_dataloader
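As a quick sanity check, grab one batch and look at its shapes (a sketch; the shapes follow from the 64x64 resize and batch_size=1 above):
# Inspect a single batch of images and labels
img_batch, label_batch = next(iter(train_dataloader))
print(img_batch.shape, label_batch.shape) # -> torch.Size([1, 3, 64, 64]) torch.Size([1])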
CV
CNN
# Create a convolutional neural network
import torch
from torch import nn

class FashionMNISTModelV2(nn.Module):
    """
    Model architecture copying TinyVGG from:
    https://poloclub.github.io/cnn-explainer/
    """
    def __init__(self, input_shape: int, hidden_units: int, output_shape: int):
        super().__init__()
        self.block_1 = nn.Sequential(
            nn.Conv2d(in_channels=input_shape,
                      out_channels=hidden_units,
                      kernel_size=3, # how big is the square that's going over the image?
                      stride=1, # default
                      padding=1), # options = "valid" (no padding) or "same" (output has same shape as input) or int for specific number
            nn.ReLU(),
            nn.Conv2d(in_channels=hidden_units,
                      out_channels=hidden_units,
                      kernel_size=3,
                      stride=1,
                      padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2,
                         stride=2) # default stride value is same as kernel_size
        )
        self.block_2 = nn.Sequential(
            nn.Conv2d(hidden_units, hidden_units, 3, padding=1),
            nn.ReLU(),
            nn.Conv2d(hidden_units, hidden_units, 3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2)
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            # Where did this in_features shape come from?
            # It's because each layer of our network compresses and changes the shape of our input data.
            nn.Linear(in_features=hidden_units*7*7,
                      out_features=output_shape)
        )

    def forward(self, x: torch.Tensor):
        x = self.block_1(x)
        # print(x.shape)
        x = self.block_2(x)
        # print(x.shape)
        x = self.classifier(x)
        # print(x.shape)
        return x
torch.manual_seed(42)
model_2 = FashionMNISTModelV2(input_shape=1,
hidden_units=10,
output_shape=len(class_names)).to(device)
model_2
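To see where in_features=hidden_units*7*7 comes from, a quick dummy forward pass helps (a sketch; FashionMNIST images are 1x28x28 and each MaxPool2d halves the height/width: 28 -> 14 -> 7):
# Pass a random 28x28 "image" through the conv blocks and watch the shape shrink
dummy = torch.randn(size=(1, 1, 28, 28)).to(device)
out_1 = model_2.block_1(dummy)
out_2 = model_2.block_2(out_1)
print(out_1.shape) # torch.Size([1, 10, 14, 14])
print(out_2.shape) # torch.Size([1, 10, 7, 7]) -> nn.Flatten() turns this into 10*7*7 = 490 features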
What does the "2d" in nn.Conv2d() stand for? The 2d is for 2-dimensional data. As in, our images have two spatial dimensions: height and width. Yes, there's a color channel dimension, but each of the color channels has two dimensions too: height and width. For other dimensional data (such as 1D for text or 3D for 3D objects) there's also nn.Conv1d() and nn.Conv3d().
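A quick shape sketch of the other variants (the channel counts here are arbitrary): nn.Conv1d expects (batch, channels, length) and nn.Conv3d expects (batch, channels, depth, height, width).
# Input/output shapes for the 1D and 3D versions (kernel_size=3, no padding)
import torch
from torch import nn
print(nn.Conv1d(in_channels=3, out_channels=8, kernel_size=3)(torch.randn(1, 3, 32)).shape)         # torch.Size([1, 8, 30])
print(nn.Conv3d(in_channels=3, out_channels=8, kernel_size=3)(torch.randn(1, 3, 16, 32, 32)).shape) # torch.Size([1, 8, 14, 30, 30])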
Confusion Matrix
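The snippet below (steps 2 and 3) assumes a tensor of test predictions called y_pred_tensor. A minimal sketch of step 1, collecting those predictions (names assume a trained model like model_2, plus test_dataloader and device from above):
# 1. Make predictions with the trained model across the test DataLoader
from tqdm.auto import tqdm

y_preds = []
model_2.eval()
with torch.inference_mode():
    for X, y in tqdm(test_dataloader, desc="Making predictions"):
        X, y = X.to(device), y.to(device)
        y_logit = model_2(X)
        y_pred = torch.softmax(y_logit, dim=1).argmax(dim=1)
        y_preds.append(y_pred.cpu())
# Concatenate the list of per-batch predictions into one tensor
y_pred_tensor = torch.cat(y_preds)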
from torchmetrics import ConfusionMatrix
from mlxtend.plotting import plot_confusion_matrix
# 2. Setup confusion matrix instance and compare predictions to targets
confmat = ConfusionMatrix(num_classes=len(class_names), task='multiclass')
confmat_tensor = confmat(preds=y_pred_tensor,
target=test_data.targets)
# 3. Plot the confusion matrix
fig, ax = plot_confusion_matrix(
conf_mat=confmat_tensor.numpy(), # matplotlib likes working with NumPy
class_names=class_names, # turn the row and column labels into class names
figsize=(10, 7)
);
Transfer Learning
Pre-trained Models Libraries
There are several places you can find pre-trained models to use for your own problems.
Location | What's there? | Link(s) |
---|---|---|
PyTorch domain libraries | Each of the PyTorch domain libraries (torchvision , torchtext ) come with pretrained models of some form. The models there work right within PyTorch. | torchvision.models , torchtext.models , torchaudio.models , torchrec.models |
HuggingFace Hub | A series of pretrained models on many different domains (vision, text, audio and more) from organizations around the world. There's plenty of different datasets too. | https://huggingface.co/models, https://huggingface.co/datasets |
timm (PyTorch Image Models) library | Almost all of the latest and greatest computer vision models in PyTorch code as well as plenty of other helpful computer vision features. | https://github.com/rwightman/pytorch-image-models |
Paperswithcode | A collection of the latest state-of-the-art machine learning papers with code implementations attached. You can also find benchmarks here of model performance on different tasks. | https://paperswithcode.com/ |
Import Pre-trained Models
# MANUAL CREATION
# Create a transforms pipeline manually (required for torchvision < 0.13)
manual_transforms = transforms.Compose([
transforms.Resize((224, 224)), # 1. Reshape all images to 224x224 (though some models may require different sizes)
transforms.ToTensor(), # 2. Turn image values to between 0 & 1
transforms.Normalize(mean=[0.485, 0.456, 0.406], # 3. A mean of [0.485, 0.456, 0.406] (across each colour channel)
std=[0.229, 0.224, 0.225]) # 4. A standard deviation of [0.229, 0.224, 0.225] (across each colour channel),
])
# Create training and testing DataLoaders as well as get a list of class names
train_dataloader, test_dataloader, class_names = data_setup.create_dataloaders(train_dir=train_dir,
test_dir=test_dir,
transform=manual_transforms, # resize, convert images to between 0 & 1 and normalize them
batch_size=32) # set mini-batch size to 32
train_dataloader, test_dataloader, class_names
# AUTO CREATION
# Get a set of pretrained model weights
weights = torchvision.models.EfficientNet_B0_Weights.DEFAULT # .DEFAULT = best available weights from pretraining on ImageNet
weights
# Get the transforms used to create our pretrained weights
auto_transforms = weights.transforms()
auto_transforms
# Create training and testing DataLoaders as well as get a list of class names
train_dataloader, test_dataloader, class_names = data_setup.create_dataloaders(train_dir=train_dir,
test_dir=test_dir,
transform=auto_transforms, # perform same data transforms on our own data as the pretrained model
batch_size=32) # set mini-batch size to 32
train_dataloader, test_dataloader, class_names
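Both cells above use data_setup.create_dataloaders() from the course's "going modular" scripts. If you don't have that module handy, here's a minimal sketch of an equivalent helper (an assumption about the interface, not the exact implementation):
import os
from torch.utils.data import DataLoader
from torchvision import datasets

def create_dataloaders(train_dir, test_dir, transform, batch_size, num_workers=os.cpu_count()):
    # Build ImageFolder datasets with the same transform for train and test
    train_data = datasets.ImageFolder(train_dir, transform=transform)
    test_data = datasets.ImageFolder(test_dir, transform=transform)
    class_names = train_data.classes
    # Wrap the datasets in DataLoaders
    train_dataloader = DataLoader(train_data, batch_size=batch_size, shuffle=True, num_workers=num_workers, pin_memory=True)
    test_dataloader = DataLoader(test_data, batch_size=batch_size, shuffle=False, num_workers=num_workers, pin_memory=True)
    return train_dataloader, test_dataloader, class_names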
Pre-trained Models Available
Architecture backbone | Code |
---|---|
ResNets | torchvision.models.resnet18() , torchvision.models.resnet50() … |
VGG (similar to what we used for TinyVGG) | torchvision.models.vgg16() |
EfficientNets | torchvision.models.efficientnet_b0() , torchvision.models.efficientnet_b1() … |
Vision Transformers (ViTs) | torchvision.models.vit_b_16() , torchvision.models.vit_b_32() … |
ConvNeXt | torchvision.models.convnext_tiny() , torchvision.models.convnext_small() … |
More available in torchvision.models | torchvision.models... |
# OLD: Setup the model with pretrained weights and send it to the target device (this was prior to torchvision v0.13)
# model = torchvision.models.efficientnet_b0(pretrained=True).to(device) # OLD method (with pretrained=True)
# NEW: Setup the model with pretrained weights and send it to the target device (torchvision v0.13+)
weights = torchvision.models.EfficientNet_B0_Weights.DEFAULT # .DEFAULT = best available weights
model = torchvision.models.efficientnet_b0(weights=weights).to(device)
#model # uncomment to output (it's very long)
# Print a summary using torchinfo (uncomment for actual output)
from torchinfo import summary
summary(model=model,
input_size=(32, 3, 224, 224), # make sure this is "input_size", not "input_shape"
# col_names=["input_size"], # uncomment for smaller output
col_names=["input_size", "output_size", "num_params", "trainable"],
col_width=20,
row_settings=["var_names"]
)
Model Finetuning
The process of transfer learning usually goes: freeze some base layers of a pretrained model (typically the features
section) and then adjust the output layers (also called head/classifier layers) to suit your needs.
# Freeze all base layers in the "features" section of the model (the feature extractor) by setting requires_grad=False
for param in model.features.parameters():
param.requires_grad = False
# Set the manual seeds
torch.manual_seed(42)
torch.cuda.manual_seed(42)
# Get the length of class_names (one output unit for each class)
output_shape = len(class_names)
# Recreate the classifier layer and send it to the target device
model.classifier = torch.nn.Sequential(
torch.nn.Dropout(p=0.2, inplace=True),
torch.nn.Linear(in_features=1280,
out_features=output_shape, # same number of output units as our number of classes
bias=True)).to(device)
# # Do a summary *after* freezing the features and changing the output classifier layer (uncomment for actual output)
summary(model,
input_size=(32, 3, 224, 224), # make sure this is "input_size", not "input_shape" (batch_size, color_channels, height, width)
verbose=0,
col_names=["input_size", "output_size", "num_params", "trainable"],
col_width=20,
row_settings=["var_names"]
)
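A quick check that the freezing worked: only the new classifier parameters should remain trainable.
# Count trainable vs. total parameters after freezing the feature extractor
trainable_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
total_params = sum(p.numel() for p in model.parameters())
print(f"Trainable params: {trainable_params} / {total_params}")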
Experiment Tracking
Different ways to track machine learning experiments
Method | Setup | Pros | Cons | Cost |
---|---|---|---|---|
Python dictionaries, CSV files, print outs | None | Easy to set up, runs in pure Python | Hard to keep track of large numbers of experiments | Free |
TensorBoard | Minimal, install tensorboard | Extensions built into PyTorch, widely recognized and used, easily scales. | User-experience not as nice as other options. | Free |
Weights & Biases Experiment Tracking | Minimal, install wandb , make an account | Incredible user experience, make experiments public, tracks almost anything. | Requires external resource outside of PyTorch. | Free for personal use |
MLflow | Minimal, install mlflow and start tracking | Fully open-source MLOps lifecycle management, many integrations. | A little harder to set up a remote tracking server than other options. | Free |
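As a concrete example of the TensorBoard option, a minimal sketch using PyTorch's built-in SummaryWriter (the loss values here are placeholders for whatever your training loop produces):
from torch.utils.tensorboard import SummaryWriter

# Log scalars per epoch to a run-specific directory
writer = SummaryWriter(log_dir="runs/effnetb0_20_percent")
for epoch in range(5):
    train_loss, test_loss = 0.0, 0.0 # placeholders - use your real metrics here
    writer.add_scalars(main_tag="Loss",
                       tag_scalar_dict={"train_loss": train_loss, "test_loss": test_loss},
                       global_step=epoch)
writer.close()
# View the results with: tensorboard --logdir runs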
Paper Replicating
A machine learning research paper is a scientific paper that details findings of a research group on a specific area.
The contents of a machine learning research paper can vary from paper to paper but they generally follow the structure:
Section | Contents |
---|---|
Abstract | An overview/summary of the paper's main findings/contributions. |
Introduction | What's the paper's main problem and details of previous methods used to try and solve it. |
Method | How did the researchers go about conducting their research? For example, what model(s), data sources, training setups were used? |
Results | What are the outcomes of the paper? If a new type of model or training setup was used, how did the results of findings compare to previous works? (this is where experiment tracking comes in handy) |
Conclusion | What are the limitations of the suggested methods? What are some next steps for the research community? |
References | What resources/other papers did the researchers look at to build their own body of work? |
Appendix | Are there any extra resources/findings to look at that weren't included in any of the above sections? |
My workflow for replicating papers:
- Read the whole paper end-to-end once (to get an idea of the main concepts).
- Go back through each section and see how they line up with each other and start thinking about how they might be turned into code (just like above).
- Repeat step 2 until I've got a fairly good outline.
- Use mathpix.com (a very handy tool) to turn any sections of the paper into markdown/LaTeX to put into notebooks.
- Replicate the simplest version of the model possible.
- If I get stuck, look up other examples.
An example of replicating ViT:
NOTE: Some layers in the code (e.g. PatchEmbedding and TransformerEncoderBlock) have been created in advance. This example aims to show the pipeline of combining all the layers together.
# 1. Create a ViT class that inherits from nn.Module
class ViT(nn.Module):
    """Creates a Vision Transformer architecture with ViT-Base hyperparameters by default."""
    # 2. Initialize the class with hyperparameters from Table 1 and Table 3
    def __init__(self,
                 img_size:int=224, # Training resolution from Table 3 in ViT paper
                 in_channels:int=3, # Number of channels in input image
                 patch_size:int=16, # Patch size
                 num_transformer_layers:int=12, # Layers from Table 1 for ViT-Base
                 embedding_dim:int=768, # Hidden size D from Table 1 for ViT-Base
                 mlp_size:int=3072, # MLP size from Table 1 for ViT-Base
                 num_heads:int=12, # Heads from Table 1 for ViT-Base
                 attn_dropout:float=0, # Dropout for attention projection
                 mlp_dropout:float=0.1, # Dropout for dense/MLP layers
                 embedding_dropout:float=0.1, # Dropout for patch and position embeddings
                 num_classes:int=1000): # Default for ImageNet but can customize this
        super().__init__() # don't forget the super().__init__()!
        # 3. Make sure the image size is divisible by the patch size
        assert img_size % patch_size == 0, f"Image size must be divisible by patch size, image size: {img_size}, patch size: {patch_size}."
        # 4. Calculate number of patches (height * width / patch_size^2)
        self.num_patches = (img_size * img_size) // patch_size**2
        # 5. Create learnable class embedding (needs to go at front of sequence of patch embeddings)
        self.class_embedding = nn.Parameter(data=torch.randn(1, 1, embedding_dim),
                                            requires_grad=True)
        # 6. Create learnable position embedding
        self.position_embedding = nn.Parameter(data=torch.randn(1, self.num_patches+1, embedding_dim),
                                               requires_grad=True)
        # 7. Create embedding dropout value
        self.embedding_dropout = nn.Dropout(p=embedding_dropout)
        # 8. Create patch embedding layer
        self.patch_embedding = PatchEmbedding(in_channels=in_channels,
                                              patch_size=patch_size,
                                              embedding_dim=embedding_dim)
        # 9. Create Transformer Encoder blocks (we can stack Transformer Encoder blocks using nn.Sequential())
        # Note: The "*" unpacks the list so each block becomes a separate argument to nn.Sequential()
        self.transformer_encoder = nn.Sequential(*[TransformerEncoderBlock(embedding_dim=embedding_dim,
                                                                           num_heads=num_heads,
                                                                           mlp_size=mlp_size,
                                                                           mlp_dropout=mlp_dropout) for _ in range(num_transformer_layers)])
        # 10. Create classifier head
        self.classifier = nn.Sequential(
            nn.LayerNorm(normalized_shape=embedding_dim),
            nn.Linear(in_features=embedding_dim,
                      out_features=num_classes)
        )

    # 11. Create a forward() method
    def forward(self, x):
        # 12. Get batch size
        batch_size = x.shape[0]
        # 13. Create class token embedding and expand it to match the batch size (equation 1)
        class_token = self.class_embedding.expand(batch_size, -1, -1) # "-1" means to infer the dimension (try this line on its own)
        # 14. Create patch embedding (equation 1)
        x = self.patch_embedding(x)
        # 15. Concat class embedding and patch embedding (equation 1)
        x = torch.cat((class_token, x), dim=1)
        # 16. Add position embedding to patch embedding (equation 1)
        x = self.position_embedding + x
        # 17. Run embedding dropout (Appendix B.1)
        x = self.embedding_dropout(x)
        # 18. Pass patch, position and class embedding through transformer encoder layers (equations 2 & 3)
        x = self.transformer_encoder(x)
        # 19. Put 0 index logit through classifier (equation 4)
        x = self.classifier(x[:, 0]) # run on each sample in a batch at 0 index
        return x
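A quick shape check of the assembled model (a sketch; it assumes the PatchEmbedding and TransformerEncoderBlock layers mentioned in the note above are already defined):
# Instantiate a 3-class ViT and pass a random image-sized tensor through it
vit = ViT(num_classes=3) # e.g. pizza, steak, sushi
random_image = torch.randn(1, 3, 224, 224) # (batch_size, color_channels, height, width)
print(vit(random_image).shape) # -> torch.Size([1, 3])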
Model Deployment
Tool/resource | Deployment type |
---|---|
Google's ML Kit | On-device (Android and iOS) |
Apple's Core ML and coremltools Python package | On-device (all Apple devices) |
Amazon Web Services' (AWS) SageMaker | Cloud |
Google Cloud's Vertex AI | Cloud |
Microsoft's Azure Machine Learning | Cloud |
Hugging Face Spaces | Cloud |
API with FastAPI | Cloud/self-hosted server |
API with TorchServe | Cloud/self-hosted server |
ONNX (Open Neural Network Exchange) | Many/general |
Many more… |
Gradio
Gradio is the fastest way to demo your machine learning model with a friendly web interface so that anyone can use it, anywhere!
For more information: https://www.gradio.app/docs#components
# Import/install Gradio
try:
    import gradio as gr
except:
    !pip -q install gradio
    import gradio as gr
print(f"Gradio version: {gr.__version__}")
# Put EffNetB2 on CPU
effnetb2.to("cpu")
# Check the device
next(iter(effnetb2.parameters())).device
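The gr.Interface below needs a predict function and an example_list (a list of paths to sample images). A sketch of what predict might look like for this demo, assuming effnetb2, a matching transform called effnetb2_transforms, and class_names already exist:
from typing import Tuple, Dict
from timeit import default_timer as timer

def predict(img) -> Tuple[Dict, float]:
    """Returns a dict of class probabilities and the prediction time (a sketch)."""
    start_time = timer()
    # Transform the input PIL image and add a batch dimension
    img = effnetb2_transforms(img).unsqueeze(0)
    # Make a prediction in inference mode
    effnetb2.eval()
    with torch.inference_mode():
        pred_probs = torch.softmax(effnetb2(img), dim=1)
    # Gradio's Label output expects a dict of {class_name: probability}
    pred_labels_and_probs = {class_names[i]: float(pred_probs[0][i]) for i in range(len(class_names))}
    pred_time = round(timer() - start_time, 4)
    return pred_labels_and_probs, pred_time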
# Create title, description and article strings
title = "FoodVision Mini 🍕🥩🍣"
description = "An EfficientNetB2 feature extractor computer vision model to classify images of food as pizza, steak or sushi."
article = "Created at [09. PyTorch Model Deployment](https://www.learnpytorch.io/09_pytorch_model_deployment/)."
# Create the Gradio demo
demo = gr.Interface(fn=predict, # mapping function from input to output
inputs=gr.Image(type="pil"), # what are the inputs?
outputs=[gr.Label(num_top_classes=3, label="Predictions"), # what are the outputs?
gr.Number(label="Prediction time (s)")], # our fn has two outputs, therefore we have two outputs
examples=example_list,
title=title,
description=description,
article=article)
# Launch the demo!
demo.launch(debug=False, # print errors locally?
            share=True) # generate a publicly shareable URL?
Upload to Hugging Face
- Sign up for a Hugging Face account.
- Start a new Hugging Face Space by going to your profile and then clicking "New Space".
  - Note: A Space in Hugging Face is also known as a "code repository" (a place to store your code/files), or "repo" for short.
- Give the Space a name; for example, mine is called mrdbourke/foodvision_mini, you can see it here: https://huggingface.co/spaces/mrdbourke/foodvision_mini
- Select a license (I used MIT).
- Select Gradio as the Space SDK (software development kit).
  - Note: You can use other options such as Streamlit, but since our app is built with Gradio, we'll stick with that.
- Choose whether your Space is public or private (I selected public since I'd like my Space to be available to others).
- Click "Create Space".
- Clone the repo locally by running something like git clone https://huggingface.co/spaces/[YOUR_USERNAME]/[YOUR_SPACE_NAME] in a terminal or command prompt.
  - Note: You can also add files by uploading them under the "Files and versions" tab.
- Copy/move the contents of the downloaded foodvision_mini folder to the cloned repo folder.
- To upload and track larger files (e.g. files over 10MB or, in our case, our PyTorch model file), you'll need to install Git LFS (which stands for "git large file storage").
- After you've installed Git LFS, you can activate it by running git lfs install.
- In the foodvision_mini directory, track the files over 10MB with Git LFS with git lfs track "*.file_extension".
  - Track the EffNetB2 PyTorch model file with git lfs track "09_pretrained_effnetb2_feature_extractor_pizza_steak_sushi_20_percent.pth".
- Track .gitattributes (automatically created when cloning from Hugging Face; this file will help ensure our larger files are tracked with Git LFS) with git add .gitattributes. You can see an example .gitattributes file on the FoodVision Mini Hugging Face Space.
- Add the rest of the foodvision_mini app files and commit them with git add * and git commit -m "first commit".
- Push (upload) the files to Hugging Face with git push.
- Wait 3-5 minutes for the build to happen (future builds are faster) and your app to become live!