ECE 285 Transfer Learning

ECE 285 - MLIP - Assignment 3
Transfer Learning

Written by Anurag Paul and Charles Deledalle. Last updated on April 30, 2019.

In Assignments 1 and 2, we focused on classification on the MNIST dataset. In this assignment, we will focus on best practices for managing a deep learning project and will use transfer learning to solve a classification problem. You will learn to use PyTorch's DataLoader and to create checkpoints to stop and restart model training.

We want to learn how to predict the species of a bird given its picture. We will be using the Caltech-UCSD Birds-200-2011 (CUB-200-2011) dataset. The dataset is located on DSMLP at /datasets/ee285s-public/caltech_ucsd_birds/ and is also downloadable from http://www.vision.caltech.edu/visipedia-data/CUB-200-2011/CUB_200_2011.tgz. It has nearly 12,000 images (11,788) of 200 bird species. We will be working on a small subset of this dataset with 20 bird species, having 743 training images and 372 images for validation. This directory contains a folder CUB_200_2011 with all the images and two files: train.csv and val.csv. Each line of these files corresponds to a sample described by the file path of the image, the bounding box values surrounding the bird, and the class label of the species, from 0 to 19 (separated by commas). Given the very small size of this subset, we will rely on transfer learning (otherwise we would be facing the curse of dimensionality).

It is possible that DSMLP will be too busy to let you run everything from this assignment. We know what the results should look like, so we will only grade the code.

1 Getting started

First of all, connect to DSMLP (ieng6 server) and start a pod with GPU/CUDA capabilities enabled

    $ launch-py3torch-gpu.sh

Next, connect to your Jupyter Notebook from your web browser (refer to Assignment 0 for more details).
Create a new notebook assignment3.ipynb and import

    %matplotlib notebook
    import os
    import numpy as np
    import torch
    from torch import nn
    from torch.nn import functional as F
    import torch.utils.data as td
    import torchvision as tv
    import pandas as pd
    from PIL import Image
    from matplotlib import pyplot as plt

If any of the above libraries is not available, install it using

    $ pip install --user

followed by the name of the missing package. Select the relevant device

    device = 'cuda' if torch.cuda.is_available() else 'cpu'
    print(device)

For the following questions, please write your code and answers directly in your notebook. Organize your notebook with headings, markdown and code cells (following the numbering of the questions).

2 Data Loader

In order to start training a classifier, we first need to build routines for loading the data. We will achieve this using the data management tools provided in the package torch.utils.data.

1. Create a variable dataset_root_dir and make it point to the Bird dataset directory. Advice: a good habit is to set the value of such a variable according to socket.gethostname() and getpass.getuser(). This allows you and your collaborators to use the same piece of code on different hosts or clusters in which data may be stored at different locations.

2. Central to torch.utils.data is the abstract class Dataset, which will be useful for managing our training and testing data. Please refer to the documentation here https://pytorch.org/docs/stable/data.html. Interpret and complete the following piece of code:

    class BirdsDataset(td.Dataset):

        def __init__(self, root_dir, mode="train", image_size=(224, 224)):
            super(BirdsDataset, self).__init__()
            self.image_size = image_size
            self.mode = mode
            self.data = pd.read_csv(os.path.join(root_dir, "%s.csv" % mode))
            self.images_dir = os.path.join(root_dir, "CUB_200_2011/images")

        def __len__(self):
            return len(self.data)

        def __repr__(self):
            return "BirdsDataset(mode={}, image_size={})".format(
                self.mode, self.image_size)

        def __getitem__(self, idx):
            img_path = os.path.join(self.images_dir,
                                    self.data.iloc[idx]['file_path'])
            bbox = self.data.iloc[idx][['x1', 'y1', 'x2', 'y2']]
            img = Image.open(img_path).convert('RGB')
            # Index the bounding box by column name (positional indexing of a
            # labeled pandas Series is deprecated in recent pandas versions).
            img = img.crop([bbox['x1'], bbox['y1'], bbox['x2'], bbox['y2']])
            transform = tv.transforms.Compose([
                # COMPLETE
                ])
            x = transform(img)
            d = self.data.iloc[idx]['class']
            return x, d

        def number_of_classes(self):
            return self.data['class'].max() + 1

The method __getitem__ returns the image x of index idx together with its class label d. It crops the image according to the bounding box indicated in the csv file. Refer to https://pytorch.org/docs/stable/torchvision/transforms.html and complete the relevant section to use torchvision to resize the image to the size indicated by image_size, convert it to a tensor, and then normalize it to the range [-1, 1].

3. Copy and paste the following function

    def myimshow(image, ax=plt):
        image = image.to('cpu').numpy()
        image = np.moveaxis(image, [0, 1, 2], [2, 0, 1])
        image = (image + 1) / 2
        image[image < 0] = 0
        image[image > 1] = 1
        h = ax.imshow(image)
        ax.axis('off')
        return h

Create the object train_set as an instance of BirdsDataset. Sample the element with index 10. Store it in a variable x. Use the myimshow function to display the image x.

4. The main advantage of using PyTorch's Dataset is to use its data loader mechanism with DataLoader. Read the documentation and create train_loader: the object that loads the training set and splits it into shuffled mini-batches of size B=16. Use pin_memory=True. What is the advantage of using pin_memory? How many mini-batches are there?

5. In a different cell, display the first image and label pair of the first four mini-batches. Re-evaluate your cell. What do you observe?

6. Also create val_set as an instance of BirdsDataset using mode='val'. Then, create val_loader similar to train_loader but without shuffling. Why do you think we need to shuffle the dataset for training but not for validation?
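The advice in question 1 (choosing dataset_root_dir from the host and user names) could be sketched as follows. This is only an illustration under stated assumptions: the function pick_dataset_root, the host name prefixes, and the local paths are hypothetical placeholders, and the DSMLP path is the one given in the introduction (spelled with underscores).

```python
import getpass
import socket

# Default location on DSMLP, as given in the introduction of this assignment.
DSMLP_ROOT = "/datasets/ee285s-public/caltech_ucsd_birds/"


def pick_dataset_root(hostname, username, default=DSMLP_ROOT):
    """Return the dataset directory for a given host and user.

    The host names and local paths below are hypothetical placeholders;
    replace them with the machines you and your collaborators actually use.
    """
    if hostname.startswith("my-laptop"):       # hypothetical host name
        return "/home/%s/data/caltech_ucsd_birds/" % username
    if hostname.startswith("lab-cluster"):     # hypothetical host name
        return "/scratch/%s/caltech_ucsd_birds/" % username
    return default                             # any other host, e.g. a DSMLP pod


dataset_root_dir = pick_dataset_root(socket.gethostname(), getpass.getuser())
print(dataset_root_dir)
```

Keeping the host and user lookup in one small function means the rest of the notebook only ever refers to dataset_root_dir, whichever machine it runs on.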

3 Abstract Neural Network Model

In this section, we will use the mechanism of abstract classes and inheritance to define all the common methods that will be shared between the different deep learning models we will build in future work in this course and beyond. These functionalities are provided to you in a homemade package /datasets/ee285s-public/nntools.py. From your terminal, create a symbolic link to this package in your current working directory

    $ ln -s /datasets/ee285s-public/nntools.py .

Back in your Jupyter Notebook, you can now import its functions as:

    import nntools as nt

A main concept introduced in nntools is the abstract class NeuralNetwork. This class describes the general architecture and functionalities that a neural network object is expected to have. In particular, it has two methods, forward and criterion, used respectively to make a forward pass through the network and to compute the loss. Read its documentation by typing help(nt.NeuralNetwork) and next open nntools.py to inspect its code. As you can observe, these methods are tagged as abstract, and as a result the class is said to be abstract. Note that you have already used abstract classes: nn.Module and td.Dataset.

7. Try to instantiate a neural network as net = nt.NeuralNetwork(). What do you observe?

An abstract class does not implement all of its methods and cannot be instantiated. This is because the implementation of forward and criterion will depend on the specific type and architecture of the neural networks we will be considering. The implementation of these two methods will be done in subclasses, following the principle of inheritance.
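The behavior of abstract classes can be tried out in isolation with Python's standard abc module. The sketch below is a toy example, not the nntools code: the class names AbstractModel, SquaredErrorModel and DoublingModel are made up for illustration, and nntools may implement its abstract class differently in detail.

```python
from abc import ABC, abstractmethod


class AbstractModel(ABC):
    """Toy stand-in for an abstract network class (not the nntools code)."""

    @abstractmethod
    def forward(self, x):
        """Map an input to an output."""

    @abstractmethod
    def criterion(self, y, d):
        """Return the loss between output y and target d."""


class SquaredErrorModel(AbstractModel):
    """Implements criterion only, so this class is still abstract."""

    def criterion(self, y, d):
        return (y - d) ** 2


class DoublingModel(SquaredErrorModel):
    """Implements forward as well, so this class can be instantiated."""

    def forward(self, x):
        return 2 * x


# Only the class that implements every abstract method can be instantiated.
for cls in (AbstractModel, SquaredErrorModel, DoublingModel):
    try:
        cls()
        print(cls.__name__, "instantiated")
    except TypeError as err:
        print(cls.__name__, "failed:", err)

model = DoublingModel()
print(model.criterion(model.forward(3), 5))  # (2*3 - 5)**2 = 1
```

The same pattern applies to nt.NeuralNetwork: a subclass becomes usable once both forward and criterion are implemented somewhere in its inheritance chain.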
For instance, we can define the subclass NNClassifier that inherits from NeuralNetwork and implements the method criterion as being the cross entropy loss

    class NNClassifier(nt.NeuralNetwork):

        def __init__(self):
            super(NNClassifier, self).__init__()
            self.cross_entropy = nn.CrossEntropyLoss()

        def criterion(self, y, d):
            return self.cross_entropy(y, d)

Compared to NeuralNetwork, this class is more specific, as it considers only neural networks that are classifiers (trained to predict one-hot class codes). Nevertheless, this class is still abstract, as it does not implement the method forward. Indeed, the method forward still depends on the specific architecture of the classification network we will be considering.

4 VGG-16 Transfer Learning

We are going to consider a new classifier built on the principle of transfer learning from a pretrained network. VGG (by Simonyan and Zisserman, 2014) was runner-up of the ILSVRC-2014 challenge. It is a very popular network for transfer learning and is widely used as a feature extractor for multiple computer vision tasks. Here, we will use the 16-layer version of VGG with batch norm from the torchvision package (vgg16_bn). We will replace the final fully connected (FC) layer with one specific to our task. We will then train only this task-specific last FC layer and keep all other layers frozen (i.e., they will not be trained).

The main advantage of transfer learning is that the knowledge gained by a model on one task can be adapted to learn another task. It also drastically reduces the number of parameters to learn, which limits overfitting and lets the training converge in just a few epochs.

8. In your notebook, evaluate the following

    vgg = tv.models.vgg16_bn(pretrained=True)

Print the network and inspect its learnable parameters (as done in Assignment 2).

9. Copy and paste the code of NNClassifier in your notebook, and create a new subclass VGG16Transfer that inherits from NNClassifier by completing the following:

    class VGG16Transfer(NNClassifier):

        def __init__(self, num_classes, fine_tuning=False):
            super(VGG16Transfer, self).__init__()
            vgg = tv.models.vgg16_bn(pretrained=True)
            for param in vgg.parameters():
                param.requires_grad = fine_tuning
            self.features = vgg.features
            # COMPLETE
            num_ftrs = vgg.classifier[6].in_features
            self.classifier[6] = nn.Linear(num_ftrs, num_classes)

        def forward(self, x):
            # COMPLETE
            y = self.classifier(f)
            return y

Note that fine_tuning=True can be used if we are willing to retrain (fine-tune) all other layers, but this will take much more time and memory. It would also require more data (or a very small learning rate and careful monitoring of the loss), as the number of parameters would be much higher and the procedure subject to overfitting.

10. The class VGG16Transfer is no longer abstract, as it implements all of the methods of its ancestors. Create an instance of this class for a classification problem with a number of classes specified as num_classes = train_set.number_of_classes(). Print the network and inspect its learnable parameters.

5 Training experiment and checkpoints

The package nntools introduces another mechanism for running learning experiments. An important aspect when running such an experiment is to regularly create checkpoints, or backups, of the current model, optimization state and statistics (training loss, validation loss, accuracy, etc.). In case of an unexpected error, you need to be able to restart the computation from where it stopped, and you do not want to rerun everything from scratch. Typical reasons for your learning to stop are server disconnection/timeout, out-of-memory errors, CUDA runtime errors, quota exceeded errors, etc.
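The checkpoint-and-resume idea can be illustrated without any deep learning code. The sketch below is a toy example using only the standard library; it is not how nntools stores its checkpoints. The function run, the file name checkpoint.json, and the fake "training" step are all assumptions made for illustration.

```python
import json
import os
import tempfile


def run(num_epochs, checkpoint_path):
    """Toy 'training' loop that resumes from checkpoint_path if it exists."""
    if os.path.isfile(checkpoint_path):
        with open(checkpoint_path) as f:
            state = json.load(f)                    # resume from the saved state
    else:
        state = {"weight": 0.0, "history": []}      # start from scratch

    start_epoch = len(state["history"])
    for epoch in range(start_epoch, num_epochs):
        state["weight"] += 1.0                      # stand-in for one epoch of training
        state["history"].append({"epoch": epoch, "weight": state["weight"]})
        # Write to a temporary file first, then rename it, so that a crash
        # while saving cannot corrupt the previous checkpoint.
        tmp_path = checkpoint_path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump(state, f)
        os.replace(tmp_path, checkpoint_path)
    return state


with tempfile.TemporaryDirectory() as output_dir:
    path = os.path.join(output_dir, "checkpoint.json")  # hypothetical file name
    first = run(3, path)    # pretend the job was killed after 3 epochs
    second = run(5, path)   # restart: only epochs 3 and 4 are computed
    print(len(first["history"]), len(second["history"]))
```

The second call does not repeat the first three epochs: it reloads the saved state and continues from there, which is the behavior a training experiment should have after a disconnection or a crash.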
The computation of statistics will be delegated to the class StatsManager, which provides functionalities to accumulate statistics from different mini-batches and then aggregate/summarize the information at the end of each epoch. Read and interpret the code of StatsManager. This class is not abstract, since it implements all of its methods. We could use an instance of this class to monitor the learning for our classification problem, but this class is too general and does not compute classification accuracy. Even though the class is not abstract, we can still create a subclass by inheritance and redefine some of its methods; this is called overriding.

11. Create a new subclass ClassificationStatsManager that inherits from StatsManager and overrides each method in order to track the accuracy, by completing the following code

    class ClassificationStatsManager(nt.StatsManager):

        def __init__(self):
            super(ClassificationStatsManager, self).__init__()

        def init(self):
            super(ClassificationStatsManager, self).init()
            self.running_accuracy = 0

        def accumulate(self, loss, x, y, d):
            super(ClassificationStatsManager, self).accumulate(loss, x, y, d)
            _, l = torch.max(y, 1)
            self.running_accuracy += torch.mean((l == d).float())

        def summarize(self):
            loss = super(ClassificationStatsManager, self).summarize()
            accuracy = 100 * # COMPLETE
            return {'loss': loss, 'accuracy': accuracy}

Experiments will be carried out by the class Experiment, which is defined with respect to 6 inputs: a given network, a given optimizer, a given training set, a given validation set, a given mini-batch size, and a given statistics manager. Once instantiated, the experiment can be run for n epochs on the training set by using the method run. The statistics at each epoch are stored as a list in the attribute history.

12. An experiment can be evaluated on the validation set by the method evaluate. Read the code of that method and note that self.net is first set to eval mode.
Read the documentation https://pytorch.org/docs/stable/nn.html#torch.nn.Module.eval and explain why we use this.

13. The Experiment class creates a checkpoint at each epoch and automatically restarts from the last available checkpoint. The checkpoint will be saved into (or loaded from) the directory specified by the optional argument output_dir of the constructor. If not specified, a new directory with an arbitrary name is created. Take time to read and carefully interpret the code of Experiment, and run the following

    lr = 1e-3
    net = VGG16Transfer(num_classes)
    net = net.to(device)
    adam = torch.optim.Adam(net.parameters(), lr=lr)
    stats_manager = ClassificationStatsManager()
    exp1 = nt.Experiment(net, train_set, val_set, adam, stats_manager,
                         output_dir="birdclass1",
                         perform_validation_during_training=True)

Check that a directory birdclass1 has been created and inspect its content. Visualize the file config.txt. What does the file checkpoint.pth.tar correspond to?

14. Change the learning rate to 1e-4 and re-evaluate the cell. What do you observe? Change it back to 1e-3 and re-evaluate the cell. What do you observe? Why?

15. Copy and complete the following code, running the experiment for 20 epochs

    def plot(exp, fig, axes):
        axes[0].clear()
        axes[1].clear()
        axes[0].plot([exp.history[k][0]['loss'] for k in range(exp.epoch)],
                     label="training loss")
        # COMPLETE
        plt.tight_layout()
        fig.canvas.draw()

    fig, axes = plt.subplots(ncols=2, figsize=(7, 3))
    exp1.run(num_epochs=20, plot=lambda exp: plot(exp, fig=fig, axes=axes))

and displaying two plots side by side, one showing the losses and the other showing the evolution of accuracy over the epochs. The training should take about 8 minutes. Make sure your loss evolutions are consistent with the ones below. If they are not, interrupt the run (Esc+i+i), check your code, delete the output_dir, and start again.
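The indexing exp.history[k][0]['loss'] in the plot function suggests that each entry of the history holds one dictionary of statistics per data split. The sketch below assumes, without having checked nntools, that entry k is a pair (training statistics, validation statistics) when validation is performed during training. The helper series is hypothetical, and the numbers are placeholders generated by a formula, not results.

```python
# Hypothetical history: one entry per epoch, each assumed to be a
# (training_stats, validation_stats) pair of dictionaries.
# The numbers are placeholders generated by a formula, not real results.
history = [
    ({"loss": 1.0 / (k + 1), "accuracy": 50.0 + 10 * k},
     {"loss": 1.5 / (k + 1), "accuracy": 40.0 + 10 * k})
    for k in range(4)
]


def series(history, split, key):
    """Return the per-epoch values of `key` for split 0 (train) or 1 (val)."""
    return [epoch_stats[split][key] for epoch_stats in history]


train_loss = series(history, 0, "loss")
val_accuracy = series(history, 1, "accuracy")
print(train_loss)
print(val_accuracy)
```

Extracting each curve as a plain list first keeps the plotting calls short and makes it easy to check the values before drawing them.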

6 ResNet18 Transfer Learning

ResNet (by He et al., 2015) is an ultra-deep network with up to 152 layers, which won the ILSVRC 2015 challenge. Pretrained models of ResNet are available in torchvision for the 18, 34, 50, 101 and 152 layer versions. For this assignment, we will be using the 18-layer version, which can be loaded as

    resnet = tv.models.resnet18(pretrained=True)

16. Similar to VGG16Transfer, create a subclass Resnet18Transfer that inherits from NNClassifier and redefines the last FC layer.

17. Create an instance of Resnet18Transfer and create a new experiment exp2 making backups in the directory birdclass2. Run the experiment for 20 epochs with Adam and learning rate 1e-3, using the same function plot as before.

18. Using the method evaluate of Experiment, compare the validation performance obtained by exp1 and exp2, using respectively VGG16Transfer and Resnet18Transfer.

7 Bonus (optional and ungraded)

19. Create a new subclass of StatsManager that implements the top-k accuracy statistic.

20. Try different batch sizes and learning rates, and try to achieve the best validation accuracy.

21. Try other pretrained models from torchvision and see if some other model performs better than VGG16 or Resnet18 on this task.
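For question 19, the definition of top-k accuracy itself can be written in plain Python: a sample counts as correct when its true label is among the k classes with the highest scores. The sketch below only illustrates that definition; the function top_k_accuracy and the scores are made up, and it is not a StatsManager subclass. In a PyTorch implementation, torch.topk could play the role of the sorting step.

```python
def top_k_accuracy(scores, labels, k):
    """Percentage of samples whose true label is among the k highest scores.

    scores: list of per-sample lists of class scores.
    labels: list of true class indices.
    """
    hits = 0
    for sample_scores, label in zip(scores, labels):
        # Class indices sorted from highest to lowest score.
        ranked = sorted(range(len(sample_scores)),
                        key=lambda c: sample_scores[c], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return 100.0 * hits / len(labels)


# Made-up scores for 3 samples and 4 classes.
scores = [
    [0.1, 0.6, 0.2, 0.1],   # best class: 1, second best: 2
    [0.5, 0.1, 0.3, 0.1],   # best class: 0, second best: 2
    [0.2, 0.2, 0.1, 0.5],   # best class: 3, class 2 is ranked last
]
labels = [1, 2, 2]

print(top_k_accuracy(scores, labels, k=1))  # only the first sample is a hit
print(top_k_accuracy(scores, labels, k=2))  # the first two samples are hits
```

With k=1 this reduces to the usual accuracy, and with k equal to the number of classes it is always 100.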