MCEN90048 AI for Mechatronics
Project 1: Let us be a Fashion Critic
Submission:
Please submit via LMS either a link to your GitHub/Bitbucket repository or a compressed
file that contains all the code files and results structured in proper folders.
Due Date: 6:00 pm, 1st May 2020

Contents
1. Dataset description
2. Project background
3. Special protocol for training and test
4. Description of tasks, tutorials and expected results
5. Structure of code and results
6. Marking criteria
Appendix 1 – Expected test accuracy on Fashion MNIST dataset



1. Dataset description

Figure 1. A Sprite image made by stacking 400 image samples from the Fashion MNIST dataset.
The Fashion MNIST dataset (link: https://github.com/zalandoresearch/fashion-mnist) contains a
training set of 60,000 examples and a test set of 10,000 examples. Each example is a 28 × 28
grey-scale image that belongs to one of the following 10 classes: 0 - 'T-shirt/top', 1 - 'Trouser',
2 - 'Pullover', 3 - 'Dress', 4 - 'Coat', 5 - 'Sandal', 6 - 'Shirt', 7 - 'Sneaker', 8 - 'Bag', 9 - 'Ankle boot'.
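For reference, the dataset can be loaded directly through tf.keras. The minimal sketch below (assuming TensorFlow 2.x is installed) simply loads the data and prints the shapes and value range described above:

    import tensorflow as tf

    # Download (on first use) and load the dataset as numpy arrays.
    (x_train, y_train), (x_test, y_test) = tf.keras.datasets.fashion_mnist.load_data()

    print(x_train.shape, y_train.shape)                 # (60000, 28, 28) (60000,)
    print(x_test.shape, y_test.shape)                   # (10000, 28, 28) (10000,)
    print(x_train.dtype, x_train.min(), x_train.max())  # uint8 0 255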

2. Project background
The objective of this project is to help students get familiar with basic TensorFlow (Keras)
operations and the process of training a classification model.
The basic process of data analysis is as follows (a minimal end-to-end code sketch is given after this list):
1) Pre-process the training data, e.g., you may collect prior information about the dataset,
remove outliers, analyse the statistics of each feature, and scale/normalize the features. For
natural images, pixel values range from 0 to 255 and should be scaled to either
[0, 1] or [−1, 1] to avoid numeric issues during forward- and back-propagation.
2) Design a model. For the Fashion MNIST dataset, the input has 28 × 28 = 784 pixels, which is
considered small and may be processed by multi-layer perceptrons (MLPs), convolutional
neural networks (CNNs) or recurrent neural networks (RNNs).
Checklist for design choices and hyper-parameters:
• Model architecture. You may choose:
o MLP, CNN or RNN (or any other neural network models you think suitable) as the
base model.
o The number of layers and number of neurons or channels in each layer.
o The activation function in each layer. For the output layer, since it is a multi-class
classification problem, softmax activation is recommended.
• Training procedure. You may choose:
o Which gradient descent method to update the model parameters; popular choices
include SGD, Momentum, RMSprop, and Adam. In TensorFlow, the back-
propagation and gradient updates are done automatically by the selected
optimizer.
o The learning rate and schedule. Learning rate is very important in gradient
methods; if your model does not perform as expected, perhaps try another
learning rate first. Popular choices for learning rate schedule include constant,
step decay, exponential decay, and warm restart.
o The batch size; popular choices are powers of 2: 32, 64, 128, 256 (please note that there is
no strong reason behind these choices; batch sizes like 60 or 100 are totally fine, but
60,000 is not suitable due to memory issues).
o The number of iterations or epochs, or when to stop training. An epoch is defined
as the set of iterations during which the model has seen every training example once. For
example, if your batch size is 60, it takes 60000 / 60 = 1000 iterations to see each
training example, thus one epoch equals 1000 iterations. Early stopping may be used.
• Regularization methods. Good normalization and regularization methods may
improve training stability and model capacity, and/or prevent overfitting. See the lecture
slides for a variety of regularization methods.
• Objective/loss function. For multi-class classification and balanced class, it is
suggested to use cross entropy loss.
• Others: ________________. You are welcome to fill the blank here.
3) Train the model. The usual training protocol is to divide the training data into a training set
TR (around 80% of the whole training data) and a validation set VA (the remaining 20%).
a. After you’ve made decisions on the hyperparameters (let’s call this set HP1), you start
the training, and the model parameters are updated based on the training set TR. After
every few iterations (e.g., 200), you validate the model on the validation set VA to
monitor the training process. After you stop the training, you validate the trained
model M1 on the validation set to get a performance score for HP1 and M1. You may repeat
the model training using the same HP1 and save the model with the best validation
accuracy.
b. Now it is time to try different sets of hyperparameters HP2, HP3, HP4… For each
hyperparameter set HPi, you repeat the training process in step a and get a
performance score for HPi and Mi.
c. After enough trials, you obtain the best set of hyperparameters HPbest and the
corresponding trained model Mbest.
4) Test the model. At this stage, you may test your best model on the test set and report the
performance (accuracy).
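As a concrete illustration of steps 1)–4), the sketch below trains a small MLP with tf.keras. The architecture, optimizer and hyperparameters here are illustrative choices only, not requirements (assume TensorFlow 2.x):

    import tensorflow as tf

    # Step 1: pre-process -- scale pixel values from [0, 255] to [0, 1].
    (x_train, y_train), (x_test, y_test) = tf.keras.datasets.fashion_mnist.load_data()
    x_train, x_test = x_train / 255.0, x_test / 255.0

    # Step 2: design a model -- a small MLP with a softmax output for 10 classes.
    model = tf.keras.Sequential([
        tf.keras.layers.Flatten(input_shape=(28, 28)),
        tf.keras.layers.Dense(256, activation='relu'),
        tf.keras.layers.Dense(128, activation='relu'),
        tf.keras.layers.Dense(10, activation='softmax'),
    ])
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
                  loss='sparse_categorical_crossentropy',  # integer labels
                  metrics=['accuracy'])

    # Step 3: train; hold out 20% of the training data as validation set VA.
    model.fit(x_train, y_train, batch_size=64, epochs=10, validation_split=0.2)

    # Step 4: test the trained model on the test set and report the accuracy.
    test_loss, test_acc = model.evaluate(x_test, y_test)
    print('test accuracy:', test_acc)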
3. Special protocol for training and test
Please note that, in this project, we do NOT encourage you to repeatedly train the model to
get the best possible set of hyperparameters because this takes a significant amount of time
and is not the aim of this project. Thus, you may just train the model on the whole training
dataset and validate it on the test dataset. A reasonable test accuracy for the Fashion MNIST
dataset is in the range 0.88 – 0.95, depending on your network architecture.
The accuracy is defined as follows. Consider a single example x in the test set; let y be its
label; assume y = 6, i.e., x is an image of 'Shirt'. See below for a few examples of network
outputs ŷ:
• ŷ = [0.01, 0.01, 0.02, 0.00, 0.01, 0.01, 0.91, 0.01, 0.01, 0.01]. Class 6 has the highest
probability in the predictions; thus, the model correctly predicts the label with a high
confidence of 0.91.
• ŷ = [0.11, 0.11, 0.12, 0.01, 0.11, 0.11, 0.20, 0.11, 0.11, 0.11]. Class 6 has the highest
probability in the predictions; thus, the model correctly predicts the label, though with a
low confidence of 0.20.
• ŷ = [0.21, 0.11, 0.12, 0.01, 0.11, 0.11, 0.10, 0.11, 0.11, 0.11]. Class 1 has the highest
probability in the predictions; thus, the model fails to predict the label.
Let N be the number of examples for which the model correctly predicts the label; the test
accuracy is then defined as

accuracy = N / 10000.
More information on the expected accuracy can be found in the Benchmark section of the
dataset page (see the link in Section 1) or in Appendix 1 of this document. If you have obtained
a reasonable accuracy (e.g., within ±0.025 of the expected accuracy for your type of model),
perhaps stop there and focus on other tasks.
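In code, the same definition amounts to taking the argmax of each output vector. A minimal sketch, assuming `model`, `x_test` and `y_test` come from the earlier training sketch:

    import numpy as np

    # Predicted class = index of the highest softmax probability.
    probs = model.predict(x_test)              # shape (10000, 10)
    pred_labels = np.argmax(probs, axis=1)     # shape (10000,)

    n_correct = np.sum(pred_labels == y_test)  # N in the formula above
    accuracy = n_correct / 10000
    print(accuracy)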

4. Description of tasks, tutorials and expected results
The students should finish the following basic tasks:
1. Visualizing the dataset. In this task, you are required to write code to randomly sample
400 images from the training set and prepare them for visualization in TensorBoard using
t-SNE and PCA (a sketch of one possible approach is given after this task list).
The expected files for submission include:
• Checkpoint files: see details at the section “Train and test a classification model”
below.
• Sprite image file: xxx.png
• Label file: xxx.tsv
• Projector configuration file: projector_config.pbtxt
2. Train and test a classification model. In this task, you are required to train a neural
network classifier on the training set and validate your model on the test dataset. See
previous section “Special protocol for training and test” for more details.
You are required to submit the checkpoint files or HDF5 file for a successfully trained
model. If you choose checkpoint files, you need to submit the following:
• A file named checkpoint
• One or several files named xxx.index
• Zero, one or several files named xxx.meta
• One or several files named xxx.data-00000-of-00001
3. Monitor the training process. In this task, you are required to add the model
parameters, their gradients and the training loss to the summaries; save the summaries to event
files (one or several files named events.out.tfevents.xxx); and visualize the summaries and
graph in TensorBoard (see the sketch at the end of this section).
4. Profile the training process. In this task, you are required to collect runtime statistics
during the training process for one epoch and visualize them in TensorBoard (also covered in
the sketch at the end of this section). You need to submit a folder called profile with the following files:
• xxx.input_pipeline.pb
• xxx.kernel_stats.pb
• xxx.overview_page.pb
• xxx.tensorflow_stats.pb
• xxx.trace.json.pb
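For task 1, one possible approach is sketched below: numpy builds the sprite grid, and the tensorboard.plugins.projector API writes projector_config.pbtxt (assuming TensorFlow/TensorBoard 2.x; the log directory and file names are illustrative, and the tensor_name string follows the TF2 object-based checkpoint naming convention):

    import os
    import numpy as np
    import tensorflow as tf
    import matplotlib.pyplot as plt
    from tensorboard.plugins import projector

    log_dir = 'Results/FashionMNIST/DataVisual'  # illustrative path
    os.makedirs(log_dir, exist_ok=True)

    # Randomly sample 400 training images.
    (x_train, y_train), _ = tf.keras.datasets.fashion_mnist.load_data()
    idx = np.random.choice(len(x_train), 400, replace=False)
    images, labels = x_train[idx], y_train[idx]

    # Sprite image: a 20 x 20 grid of 28 x 28 images, laid out row-major.
    sprite = (images.reshape(20, 20, 28, 28)
                    .transpose(0, 2, 1, 3)
                    .reshape(20 * 28, 20 * 28))
    plt.imsave(os.path.join(log_dir, 'fashionmnist.png'), sprite, cmap='gray')

    # Label file: one label per line, same order as the embedding rows.
    with open(os.path.join(log_dir, 'fashionmnist_label.tsv'), 'w') as f:
        f.write('\n'.join(str(label) for label in labels))

    # Save the flattened pixels as the embedding variable in a checkpoint.
    embedding_var = tf.Variable(images.reshape(400, -1).astype('float32'))
    ckpt = tf.train.Checkpoint(embedding=embedding_var)
    ckpt.save(os.path.join(log_dir, 'embedding.ckpt'))

    # Projector configuration; TensorBoard applies PCA/t-SNE interactively.
    config = projector.ProjectorConfig()
    emb = config.embeddings.add()
    emb.tensor_name = 'embedding/.ATTRIBUTES/VARIABLE_VALUE'
    emb.metadata_path = 'fashionmnist_label.tsv'
    emb.sprite.image_path = 'fashionmnist.png'
    emb.sprite.single_image_dim.extend([28, 28])
    projector.visualize_embeddings(log_dir, config)

Running tensorboard --logdir='Results/FashionMNIST/DataVisual' and opening the Projector tab should then show the 400 samples under both PCA and t-SNE.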
The students should finish one of the following additional tasks:
1. Compare two or three gradient methods.
2. Compare two or three learning rate schedules.
3. Compare two or three regularization methods.
4. Compare two or three network architectures (e.g., you may compare MLP vs CNN vs RNN,
or plain CNN vs ResNet).
Please provide saved model files for successful trials. Please BRIEFLY explain which additional
task(s) you have done and the reasons why the results differ (better or worse). Please
provide the Matplotlib code for plotting figures, if applicable.
Please note that you may use any standard API or modules (e.g., tf.keras, PyTorch) to finish
the above tasks.
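For tasks 3 and 4, the sketch below shows one way to write loss, parameter and gradient summaries from a custom training step, and to collect runtime statistics with the TF profiler. It assumes TensorFlow 2.2 or later, with `model`, `x_train` and `y_train` as in the earlier sketch; the log directory is illustrative:

    import tensorflow as tf

    log_dir = 'Results/FashionMNIST/MLPBaseline/Momentum_event_01'  # illustrative
    writer = tf.summary.create_file_writer(log_dir)
    optimizer = tf.keras.optimizers.SGD(learning_rate=0.01, momentum=0.9)
    loss_fn = tf.keras.losses.SparseCategoricalCrossentropy()

    dataset = (tf.data.Dataset.from_tensor_slices((x_train, y_train))
               .shuffle(60000).batch(64))

    tf.profiler.experimental.start(log_dir)  # task 4: writes the profile folder
    with writer.as_default():
        for step, (xb, yb) in enumerate(dataset):  # one epoch over the data
            with tf.GradientTape() as tape:
                loss = loss_fn(yb, model(xb, training=True))
            grads = tape.gradient(loss, model.trainable_variables)
            optimizer.apply_gradients(zip(grads, model.trainable_variables))

            # Task 3: loss, parameters and their gradients into the summaries.
            # (In practice, log histograms only every few hundred steps.)
            tf.summary.scalar('train_loss', loss, step=step)
            for var, grad in zip(model.trainable_variables, grads):
                name = var.name.replace(':', '_')  # ':' is awkward in tags
                tf.summary.histogram(name, var, step=step)
                tf.summary.histogram(name + '/gradient', grad, step=step)
    tf.profiler.experimental.stop()
    writer.flush()

Alternatively, the built-in tf.keras.callbacks.TensorBoard callback with histogram_freq=1 and a profile_batch range covers much of both tasks during model.fit, although it does not record gradients.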

5. Structure of code and results
It is highly encouraged that the students organize the code and results in a systematic way so
that it is easy for collaborators/examiners to reproduce and modify them.
This section suggests one possible way to organize the code and results. Please note that the
examples given in this section are for your reference. It is not compulsory to organize your
code and results in the same format. You are welcome to develop your own style and
preference.
Code and results
The code and results should be placed in different folders based on their characteristics. For example,
you may consider the following file structure (here, indentation indicates the folder hierarchy):
ProjectFolder
    NoteBooks
        FashionMNIST.ipynb
    Results
        FashionMNIST
            MLPBaseline
                Momentum_ckpt_01
                    checkpoint
                    momentum_0.001.ckpt.data-00000-of-00001
                    momentum_0.001.ckpt.index
                    momentum_0.001.ckpt.meta
                    momentum.json
                Momentum_event_01
                    events.out.tfevents.xxx
                SGD_ckpt_01
                SGD_event_01
            CNNModels
                ConvPool_01
            DataVisual
                checkpoint
                fashionmnist.png
                fashionmnist_embedding.ckpt.data-00000-of-00001
                fashionmnist_embedding.ckpt.index
                fashionmnist_embedding.ckpt.meta
                fashionmnist_label.tsv
                projector_config.pbtxt
If you put different trials/models in different folders, you can simply compare their
summaries using TensorBoard. In the example above, running
tensorboard --logdir='ProjectFolder/Results/FashionMNIST/MLPBaseline' will let you compare the
SGD and Momentum summaries.
Jupyter Notebook
The content of the cells in a Jupyter Notebook should be concise; it is recommended to put
additional definitions of functions and classes in Python files (.py). For example, you may
consider the following cell structure:
In [0]: from additional_func import FLAGS
        # configure input and output file folders
        FLAGS.DEFAULT_IN = '…'
        FLAGS.DEFAULT_OUT = '…'
In [1]: import tensorflow as tf
        (x_train, y_train), (x_test, y_test) = tf.keras.datasets.fashion_mnist.load_data()
In [2]: from additional_func import MLPModel
        model = MLPModel(architecture=…, …)
        model.fit(x_train, y_train, validation_data=(x_vali, y_vali), …)
In [3]: # load trained model and test on new data
        model.evaluate(x_test, y_test)

6. Marking criteria
The first project accounts for 5% of the final mark for the subject. The marks are divided among
the tasks as follows:
1. Visualizing the dataset – 0.5%
2. Train and test a classification model – 1.5%
• The model is properly defined – 0.5%
• The expected accuracy is achieved – 0.5%
• The model is saved and can be loaded – 0.5%
3. Monitor the training process – 0.5%
4. Profile the training process – 0.5%
5. Additional task – 2%
• The additional models are properly defined and trained – 1%
• Different models are properly compared, and the explanations are reasonable – 1%
Please always remember to provide appropriate, informative comments in your code.


Appendix 1 – Expected test accuracy on Fashion MNIST dataset
A reasonable test accuracy on Fashion MNIST depends on the network architecture you use and
many other factors, such as the number of epochs, batch size, learning rate, etc. Please try to
achieve a test accuracy within ±0.025 of the expected accuracy. For example, if you are using a
convolutional neural network with 2 Conv + pooling layers, you are expected to get an accuracy of
0.851 – 0.941. If you are unable to get such an accuracy after many trials, please provide
possible reasons.
| Classifier | Pre-processing | Test accuracy |
|---|---|---|
| 2 Conv + pooling | None | 0.876 |
| 2 Conv + pooling | None | 0.916 |
| 2 Conv + pooling + ELU activation | None | 0.903 |
| 2 Conv | Normalization, random horizontal flip, random vertical flip, random translation, random rotation | 0.919 |
| 2 Conv, < 100K parameters | None | 0.925 |
| 2 Conv, ~113K parameters | Normalization | 0.922 |
| 2 Conv + 3 FC, ~1.8M parameters | Normalization | 0.932 |
| 2 Conv + 3 FC, ~500K parameters | Augmentation, batch normalization | 0.934 |
| 2 Conv + pooling + BN | None | 0.934 |
| 2 Conv + 2 FC | Random horizontal flips | 0.939 |
| 3 Conv + 2 FC | None | 0.907 |
| 3 Conv + pooling + BN | None | 0.903 |
| 3 Conv + pooling + 2 FC + dropout | None | 0.926 |
| 3 Conv + BN + pooling | None | 0.921 |
| 5 Conv + BN + pooling | None | 0.931 |
| CNN with optional shortcuts, dense-like connectivity | Standardization + augmentation + random erasing | 0.947 |
| GRU + SVM | None | 0.888 |
| GRU + SVM with dropout | None | 0.897 |
| WRN40-4, 8.9M parameters | Standard pre-processing (mean/std subtraction/division) and augmentation (random crops/horizontal flips) | 0.967 |
| DenseNet-BC, 768K parameters | Standard pre-processing (mean/std subtraction/division) and augmentation (random crops/horizontal flips) | 0.954 |
| MobileNet | Augmentation (horizontal flips) | 0.950 |
| ResNet18 | Normalization, random horizontal flip, random vertical flip, random translation, random rotation | 0.949 |
| GoogLeNet with cross-entropy loss | None | 0.937 |
| AlexNet with triplet loss | None | 0.899 |
| SqueezeNet with cyclical learning rate, 200 epochs | None | 0.900 |
| Dual path network with wide resnet 28-10 | Standard pre-processing (mean/std subtraction/division) and augmentation (random crops/horizontal flips) | 0.957 |
| MLP 256-128-100 | None | 0.8833 |
| VGG16, 26M parameters | None | 0.935 |
| WRN-28-10 | Standard pre-processing (mean/std subtraction/division) and augmentation (random crops/horizontal flips) | 0.959 |
| WRN-28-10 + random erasing | Standard pre-processing (mean/std subtraction/division) and augmentation (random crops/horizontal flips) | 0.963 |
| Human performance | Crowd-sourced evaluation of human (with no fashion expertise) performance; 1000 randomly sampled test images, 3 labels per image, majority labelling | 0.835 |
| Capsule network, 8M parameters | Normalization, shift of at most 2 pixels, horizontal flip | 0.936 |
| HOG + SVM | HOG | 0.926 |
| XGBoost | Scaling the pixel values to mean=0.0 and var=1.0 | 0.898 |
| DENSER | - | 0.953 |
| Dyra-Net | Rescale to unit interval | 0.906 |
| Google AutoML | 24 compute hours (higher quality) | 0.939 |
