
Robot Vision [06-25024]

Summative Assignment 2

Instructions (Please read carefully!)

This assessment is summative and contains two parts. In Part 1, you will carry out image stitching, image alignment, and a comparison of various feature detection methods. In Part 2, you will use PyTorch to perform image classification and image regression tasks with a model trained by yourself, a pre-trained model, and a fine-tuned model.

Your answer must be submitted to Canvas before the deadline in the form of a single zip archive file containing:

1. Your answers to the questions in prose and diagrams. This should take the form of a single PDF document with the answers for each question using the provided LaTeX template.

2. Your code and any accompanying files necessary to execute the code for any programming questions as specified in the LaTeX template.

and a separate PDF document with the answers for Turnitin checking (two files in total; one zip file and one PDF file). Some or all of the text of each question is emphasised using italics. This emphasis indicates a question that must be explicitly answered or a task that must be completed.

Part 1

Question 1.1 A panorama is formed by stitching together multiple images into one seamless image. In this task, you will need to implement Feature Based Panoramic Image Stitching in Python.

Question 1.1.1 [10 marks] Three images of the Aston Webb building have been provided. The following steps need to be taken in order to create the panorama:

1. Use any preprocessing you like to manipulate the given images

2. Create and Configure the Stitcher.

3. Stitch Images.

4. Check the Result and Display the Panorama.

5. Save the Panorama

A guide on this process can be found here: https://www.opencvhelp.org/tutorials/advanced/panorama-creation/

Your solution to this task should include:

1. Figure showing undistorted input images (report in PDF)

2. Figure showing complete panorama (report in PDF)

3. A written explanation of the steps taken in the report, stating which functions you used, why you used them and a short explanation of how they work. (report in PDF)

4. Code for Task 1.1.1 (python file)

5. All images needed for the code to function

Question 1.1.2 [10 marks] The panorama which has been produced is not a uniform shape. Write an algorithm from scratch that iteratively crops the image so that no black areas are included. Your algorithm should preserve as much of the non-black areas of the image as possible and work with any provided panorama. See Fig 2 for an example of the expected result.

Your solution should include:

1. A figure showing the original panorama overlaid with lines representing the cropped area (report in PDF)

2. The cropped panorama (report in PDF)

3. An explanation of your algorithm (report in PDF)

4. Code for Task 1.1.2

5. All files needed for Task 1.1.2 to function
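One possible iterative strategy is a greedy shrink: start from the bounding box of all non-black pixels and repeatedly pull in whichever edge currently contains the largest fraction of black pixels, until the box is black-free. The sketch below uses NumPy only. The function name `crop_black_borders` and the `threshold` parameter are illustrative assumptions, and the greedy rule is a heuristic that does not guarantee the largest possible rectangle.

```python
import numpy as np


def crop_black_borders(image: np.ndarray, threshold: int = 0) -> tuple:
    """Return (top, bottom, left, right) of a black-free crop; bottom and right are exclusive."""
    gray = image if image.ndim == 2 else image.max(axis=2)
    valid = gray > threshold  # True where the pixel carries image content
    if not valid.any():
        raise ValueError("Image contains no non-black pixels")

    # Start from the bounding box of all valid pixels.
    rows = np.flatnonzero(valid.any(axis=1))
    cols = np.flatnonzero(valid.any(axis=0))
    top, bottom = int(rows[0]), int(rows[-1]) + 1
    left, right = int(cols[0]), int(cols[-1]) + 1

    while True:
        box = valid[top:bottom, left:right]
        if box.all():
            return top, bottom, left, right

        # Fraction of black pixels on each edge that may still be moved inwards.
        h, w = box.shape
        black = {}
        if h > 1:
            black["top"] = 1.0 - box[0, :].mean()
            black["bottom"] = 1.0 - box[-1, :].mean()
        if w > 1:
            black["left"] = 1.0 - box[:, 0].mean()
            black["right"] = 1.0 - box[:, -1].mean()

        # Move the worst edge inwards by one pixel.
        worst = max(black, key=black.get)
        if worst == "top":
            top += 1
        elif worst == "bottom":
            bottom -= 1
        elif worst == "left":
            left += 1
        else:
            right -= 1
```

The crop itself is then `image[top:bottom, left:right]`, and the same four numbers give the rectangle to draw over the original panorama.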

Figure 1: Figures for Question 1.1

Question 1.2 Image registration is a digital image processing technique that helps us align different images of the same scene. In this task you will be performing image alignment and registration with OpenCV.

Question 1.2.1 [12 marks]

Figure 2 (left) shows the original template form, and Figure 2 (middle) shows the same form photographed with a mobile phone. Figure 2 (right) shows the middle form after image alignment, which should match the template on the left.

The task is to align './part1/1_2/1_2_test.jpg' based on './part1/1_2/1_2_template.jpg'.

Figure 2: Figures for Question 1.2

The process should be as follows:

1. Use any pre-processing you like to manipulate the given images

2. Detect SIFT features in both images and experiment with SIFT parameters to achieve the best result.

3. Apply FLANN (Fast Library for Approximate Nearest Neighbors; more information about FLANN can be found in https://www.cs.ubc.ca/research/flann/uploads/FLANN/flann_visapp09.pdf) to match keypoints across images

4. Compute the homography matrix etc.

5. Apply a perspective warp to align the images

6. Try 2 more methods to detect and match keypoints, such as K-Nearest Neighbors Matcher (KNN), Brute-Force Matcher.

Your solution to this task should include:

1. Figure showing matched features (report in PDF)

2. Figure showing aligned image (report in PDF)

3. A written explanation and images of the steps taken in the report, stating which functions you used, why you used them and a short explanation of how they work. (report in PDF)

4. A written comparison of the different methods. (report in PDF)

5. A written explanation of the available SIFT parameters: what they are, how they function, what changes you made and why.

6. Code for Task 1.2 (python file)

7. All images needed for the code to function

Question 1.3 Feature detection plays an important role in many computer vision tasks. In this question, we will be exploring and evaluating the different feature detection methods.

Question 1.3.1 [8 marks] An image has been provided. On a 3 × 2 subplot figure, plot the 200 strongest features found by the Minimum Eigenvalue, SIFT, KAZE, FAST, ORB and Harris-Stephens algorithms. Include this figure in your report and ensure that it appears as a single subplot figure when the Python files are executed.

Your solution should include:

1. Your code for Task 2 and all files needed for the code to run

2. The generated subplot figure (report in PDF)

3. The generated subplot figure

Question 1.3.2 [10 marks] Describe briefly 3 of the 6 feature detection methods previously explored. Give a brief overview of each of these methods and how they differ from each other. Discuss how these differences are represented in the results from Question 1.3.1. Include the answer to this section in your report.

Figure 3: Figures for Question 1.3

Part 2

In this section, you will explore a number of different deep-learning tasks using the PyTorch framework. You must complete your code for each section in the interactive notebook files provided. You are expected to submit the notebook files, model checkpoints, and your report at the end of the assignment. Take special care that your notebook files can be executed, and that all paths are relative.

Question 2.1 [16 marks]

In this task you will carry out a classification task for a subset of CIFAR-10/100 data using PyTorch. The data and images needed for the task are in the 2_1 folder. Follow the instructions below to complete this task. Complete your code in the provided Notebook file. Submit this notebook file alongside the report and model checkpoint. In this task, we will be performing transfer learning on the CIFAR-10/100 Dataset. You will be expected to load in existing pre-trained models, adapt said models for the current task and optimise the models.
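Adapting a pre-trained classifier usually comes down to replacing its final fully connected layer. The sketch below shows one way to do this for any model whose `classifier` is an `nn.Sequential` ending in `nn.Linear`, as is the case for torchvision's `vgg16`. All helper names are hypothetical. `expand_head` is one possible way to grow a trained head by an extra class while keeping the learned rows, and `momentum=0.9` is an assumed value because the brief does not state one.

```python
import torch
import torch.nn as nn


def replace_head(model: nn.Module, num_classes: int) -> nn.Module:
    """Swap the last Linear layer of model.classifier for a fresh one with num_classes outputs."""
    old = model.classifier[-1]
    model.classifier[-1] = nn.Linear(old.in_features, num_classes)
    return model


def expand_head(model: nn.Module, num_classes: int) -> nn.Module:
    """Grow the last Linear layer, copying the trained rows so existing classes are kept."""
    old = model.classifier[-1]
    new = nn.Linear(old.in_features, num_classes)
    with torch.no_grad():
        new.weight[: old.out_features] = old.weight
        new.bias[: old.out_features] = old.bias
    model.classifier[-1] = new
    return model


def build_vgg16(num_classes: int = 10) -> nn.Module:
    """Load ImageNet-pretrained VGG16 (downloads the weights on first use) and adapt its head."""
    # Imported here so the helpers above remain usable without torchvision installed.
    from torchvision.models import VGG16_Weights, vgg16

    return replace_head(vgg16(weights=VGG16_Weights.IMAGENET1K_V1), num_classes)


def make_optimizer(model: nn.Module) -> torch.optim.Optimizer:
    """SGD with momentum; lr follows the brief, momentum=0.9 is an assumption."""
    return torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
```

A `torch.utils.data.DataLoader` created with `batch_size=128` and `shuffle=True` reshuffles the training data at the start of every epoch, and `torch.save(model.state_dict(), path)` stores the state dict.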

1. Load the images from CIFAR-10.zip. In total, there are 2500 images of size 32 × 32 and 10 categories in this dataset. The training data and test data for each category of images are stored in /train and /val folders, respectively. In the training dataset, there are 200 images in each category, whereas there are 50 images per category in the test dataset. The category of each image is given by its folder name. Simplified folder structure is shown in figure 4.

2. Perform transfer learning using the vgg16 model with pretrained weights. For this task use the VGG16_Weights.IMAGENET1K_V1 weights. Adapt the VGG network to classify only the 10 categories from the dataset. Use the stochastic gradient descent with momentum optimizer, set the mini batch size to 128, the initial learning rate to 0.01, and shuffle the training data before each training epoch. Produce a plot of the training progress and include it in your report.

Figure 4: Simplified file structure of imagenet.zip.

3. Use the test dataset to test the performance of the trained model. Store the cross-entropy loss and accuracy in a table.

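The average cross-entropy loss is given by

$$L = -\frac{1}{N} \sum_{i=1}^{N} \sum_{c=1}^{M} y_{i,c} \log(p_{i,c})$$

where i indexes the images, N is the number of images, c indexes the categories, M is the number of categories, y_{i,c} is 1 if image i belongs to category c and 0 otherwise, and p_{i,c} is the predicted probability that image i is of category c.

The accuracy is given by

$$\text{Accuracy} = \frac{\text{number of correctly classified images}}{N}$$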

4. Load the images from CIFAR-100. In total, there are 250 images of size 32 × 32, all from the bicycle category. The /train and /val folders contain 200 and 50 images of this category, respectively. Modify the model trained in the previous task and retrain it using the data in the /train folder, allowing it to classify 11 categories (using the same training options as in the previous task). Plot the training progress and include it in your written report. Save and submit the model's state dict as a file named 2-1_11.pth.

5. Merge the data in /CIFAR-10/test folder and /CIFAR-100/test folder. Run the re-trained model on the merged test dataset. Again, include the cross-entropy loss, accuracy, and the worst classified category in your report.

6. SF.JPG is a 512 × 512 image obtained by Stable Diffusion. Resize this image and use 2-1_11output.pth to classify this image. Include the predicted result, the two highest predicted probabilities and their corresponding categories in your report.
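The evaluation metrics requested in the steps above can be checked by hand on a small example. With one true category per image, the average cross-entropy reduces to the mean negative log of the probability assigned to the true category. The NumPy sketch below assumes integer class labels and an (N, M) array of predicted probabilities. The function names are hypothetical, and defining the "worst classified category" as the one with the lowest per-category accuracy is an assumption.

```python
import numpy as np


def cross_entropy_and_accuracy(probs: np.ndarray, labels: np.ndarray) -> tuple:
    """probs: (N, M) predicted probabilities; labels: (N,) integer category indices."""
    n = probs.shape[0]
    eps = 1e-12  # guards against log(0)
    true_class_probs = probs[np.arange(n), labels]
    loss = -np.mean(np.log(np.clip(true_class_probs, eps, 1.0)))
    accuracy = np.mean(np.argmax(probs, axis=1) == labels)
    return float(loss), float(accuracy)


def worst_category(probs: np.ndarray, labels: np.ndarray, num_classes: int) -> int:
    """Index of the category with the lowest per-category accuracy (assumed definition)."""
    pred = np.argmax(probs, axis=1)
    per_class = [
        np.mean(pred[labels == c] == c) if np.any(labels == c) else np.nan
        for c in range(num_classes)
    ]
    return int(np.nanargmin(per_class))


# Three images, three categories: the third image is misclassified as category 1.
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1],
                  [0.3, 0.4, 0.3]])
labels = np.array([0, 1, 2])
loss, accuracy = cross_entropy_and_accuracy(probs, labels)
```

In PyTorch, applying `torch.softmax(logits, dim=1)` to the model output gives the probabilities, and `torch.topk` on that tensor returns the two highest values and their category indices.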

Question 2.2 [16 marks] Train a network using the MNIST handwritten digit database for classification.

1. Load in the MNIST dataset from the torchvision repository. Show 30 random example images (24 training images and 6 validation images) in a 5 × 6 subplot. Count the number of labels per category and include the result in your report.

2. Augment the 24 randomly selected training images from the previous task. Use the torchvision.transforms function with random rotation [-30, 30], using random reflection along the top-bottom direction and in the left-right direction, as well as using random X translation [-3, 2] and Y translation [-2, 4]. Show example images of the augmented data and include them in your report.

3. Define {1,3,5,7,9} as odd numbers and {0,2,4,6,8} as even numbers. Perform transfer learning to classify the odd and even numbers of the MNIST dataset. Implement the network architecture shown in Figure 5. The input image size is 28 × 28, and the pool size and stride are 2. Use the Adam optimizer, set the initial learning rate to 0.02, and shuffle the training data before each training epoch (train the network for 35 epochs). Plot the training progress and include it in your written report. Save the trained model as 2-2_OE.pth and submit it with your report. What is the accuracy of the model on the validation images?

4. Create your own test data: 3 sets of digits handwritten by you (30 images in total), made to look as different from each other as possible. You are free to use any device or software (e.g. Microsoft Paint) to create the test data. Include the test data in your report as in Figure 6 and report the accuracy of your model on the test images. The test images must be submitted with your code, otherwise this part of the task will not be graded.

Figure 5: Network architecture

Figure 6: Handwritten digits

Question 2.3 [18 marks] Use the provided network layer to carry out a Facial Keypoints Detection task. In this task, the model takes an image as input and outputs a vector containing the coordinates of each keypoint. Typically, the length of this vector is twice the number of keypoints in the image, because each keypoint contributes an x and a y coordinate. Follow the instructions below to complete this task. The data and the model needed for the task are in the 2_3 folder.

1. Each predicted keypoint is specified by an (x,y) real-valued pair in the space of pixel indices. There are 15 keypoints, which represent the elements of the face.

(a) train.csv contains a list of 100 training images. Each row contains the (x,y) coordinates of the 15 keypoints. The image data (96 × 96 for each image) is a list of pixels sorted by row in the last column.

(b) test.csv contains a list of 25 test images. Each row contains the (x,y) coordinates of the 15 keypoints. The image data is a list of pixels sorted by row in the last column.

2. Perform transfer training using the data from train.csv. Adapt the network architecture introduced in Question 2.2 for this task. Describe and justify the changes that you have made in your report. Use the Adam optimizer, set the maximum number of epochs to 10, the batch size to 100, the initial learning rate to 0.01, and shuffle the training data before each training epoch. Plot the training progress and include it in your written report. Save and submit the trained model in a file named 2-3_FKD.pth.

3. Test the trained network using the data in test.csv. Calculate the mean squared error for each test image.

4. Plot the image with the smallest mean squared error in the test data and the corresponding feature points. Save and submit the image as 2-3_img.png. An example of the image is shown in Figure 7.
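The data handling in steps 3 and 4 can be sketched with NumPy alone. The sketch assumes that the pixel column is a space-separated string (the brief only says it is a list of pixels sorted by row) and that predictions and targets are stored as (N, 30) arrays holding 15 interleaved (x, y) pairs. All function names are hypothetical.

```python
import numpy as np


def parse_image(pixel_string: str, size: int = 96) -> np.ndarray:
    """Turn a row-major pixel list (assumed space-separated) into a (size, size) array."""
    pixels = np.array(pixel_string.split(), dtype=np.float32)
    return pixels.reshape(size, size)


def per_image_mse(pred: np.ndarray, target: np.ndarray) -> np.ndarray:
    """pred, target: (N, 30) arrays of keypoint coordinates. Returns one MSE per image."""
    return np.mean((pred - target) ** 2, axis=1)


def best_example(pred: np.ndarray, target: np.ndarray) -> tuple:
    """Index of the image with the smallest MSE and its predicted keypoints as (15, 2)."""
    mse = per_image_mse(pred, target)
    idx = int(np.argmin(mse))
    return idx, pred[idx].reshape(-1, 2)
```

The returned keypoints can be drawn over the parsed image with `plt.imshow(image, cmap="gray")` followed by `plt.scatter(points[:, 0], points[:, 1])`, and `plt.savefig` writes the result to a file.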

Figure 7: Example image and its feature points.
