
ME41105 - IV Assignment 1
Visual object detection
Intelligent Vehicles group
Delft University of Technology
November 15, 2019
About the assignment
Make the assignments in student pairs; you both receive one grade. Please read through this whole document first so that you have an overview of what you need to do.
This assignment contains Questions and Exercises:
• You should address all of the questions in a 2 or 3 page report (excluding plots and
figures). Please provide separate answers for each Question in your report, using the
same Question number as in this document. Your answer should address all the issues
raised in the Question, but typically should not be longer than a few lines.
• The Exercises are tasks for you to do, typically implementing a function or performing
an experiment. Therefore, first study the relevant provided code before working on an exercise, as the code contains comments referring to each specific exercise.
If you do not fully understand the exercise, it may become more clear after reading the
relevant code comments! Do not directly address the exercises in your report. Instead,
you should submit your solution code together with the report. Experimental results may
be requested in accompanying questions.
You will be graded on:
1. Quality of your answers in the report: Did you answer the Questions correctly, and demonstrate
understanding of the issue at hand? All Questions are weighted equally.
2. Quality of your code: Does the code work as required?
3. Quality of presentation: Is your report readable (sentences easy to understand, no grammar
mistakes, clear figures)? Is the code you wrote clear and commented?
Submitting
• Before you start, go to the course’s Brightspace page, and enroll with your partner in a
lab group (found under the ‘Collaboration’ page).
• To submit, upload two items on the Brightspace ‘Assignments’ page:
pdf attachment: A pdf with your report. Do not forget to add your student names and ids on the report.
zip attachment: A zip archive with your Matlab code for this assignment. Do NOT add the data files! They are large and we have them already ...
deadline: Friday, November 29th 2019, 23:59
• Note that only a single submission is required for your group, and only your last group
submission is kept by Brightspace. You are responsible for submitting on time. So, do
not wait till the last moment to submit your work, and verify that your files were uploaded
correctly. Connection problems and forgotten attachments are not valid excuses. The deadline is automatically enforced by Brightspace. If your submission is not on time, you receive an automatic ‘1’.
• You may only hand in work done this year by you and your lab partner. You are responsible for not sharing your work, either publicly or privately, with students outside your group. Do not put your code or report on public servers. If we believe that you have
used material from other groups, or that you have submitted material that is not yours, it
will be reported to the exam committee. This may ultimately result in a failing grade or an
expulsion.
• If code is submitted that was written with ill intent, e.g. to manipulate files in the user’s home directory that were not specified by the task, you will immediately fail the course.
Getting assistance
The primary occasion to obtain help with this assignment is during the lab practicum contact
hours of the Intelligent Vehicles course. An instructor and student assistants will be present
at the practicum to give you feedback and support. If you find errors, ambiguously phrased
exercises, or have another question about this lab assignment, please use the Brightspace lab
support forum. This way, all students can benefit from the questions and answers equally. If
you cannot discuss your issue on the forum, please contact Julian Kooij (J.F.P.Kooij@tudelft.nl)
directly.
Remember that for help on what a specific Matlab command somefunction does or how to use it, you can type help somefunction or doc somefunction from the Matlab command line.
Visual object detection
In this lab assignment we will study and evaluate feature extraction and pattern classification
algorithms that can be used for video-based pedestrian recognition.
Our first goal is to investigate which classifiers give the best result in distinguishing rectangular
region proposals as belonging either to the ‘pedestrian’ or ‘non-pedestrian’ class. We have a
dataset containing 3000 samples (1500 pedestrian and 1500 non-pedestrian) for training. For
testing, 1000 samples (500 pedestrian and 500 non-pedestrian) are provided, see Figure 1.
The pedestrian and non-pedestrian samples are provided in terms of three feature sets:
• data_hog.mat: Histograms of Oriented Gradients (HOG) features
• data_lrf.mat: Local Receptive Field (LRF) features
• data_intensity_25x50.mat: Gray-level pixel intensity
Figure 1: Examples of pedestrian and non-pedestrian class samples in the training data.
Getting started
Before you start working on the assignments in this manual, please check the following points:
1. Download the provided assignment files from Brightspace, unzip them in a directory.
2. Read the README.txt file in the directory.
3. Start Matlab, and change the path to the lab1/ subdirectory.
4. To work on the first section, open assignment_eigen_pedestrians.m in Matlab. Note that other sections will refer at their start to other Matlab scripts that will guide you through the exercises.
5. Note that the scripts use ‘Matlab code cells’, so you can run pieces of the script as needed
without having to rerun everything from the beginning (which will be slow!). If you do not
know about code cells, it is strongly recommended that you check out the official cell mode documentation (https://nl.mathworks.com/help/matlab/matlab_prog/run-sections-of-programs.html), and watch these video tutorials: video 1 and video 2.
6. The exercises often require you to make changes within a separate .m file, rather than the
top-level script. This means that inspecting the variables in these files is not easily done
without debugging. If you do not know how to debug code in Matlab, then please read Matlab’s documentation on how to set breakpoints (especially the ability to automatically set a breakpoint on error), and subsequently how you can examine values once you are in debug mode.
1 Eigen-pedestrians
Exercises refer to code sections in assignment_eigen_pedestrians.m.
Let us start by exploring the dataset a bit. It may be difficult to interpret the LRF and HOG
feature representations, but it is possible to visualize the gray-level intensity images by resizing
them to their original size. Note that these 25 × 50 pixel images have been reshaped to
1 × 1250-dimensional vectors. To restore one vector to its original size, the Matlab command
reshape can be used. After reshaping, the resulting image can be visualized using imshow.
Note: imshow will automatically scale the intensities if you pass it an empty array as second
argument, e.g. imshow(I, []).
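For example, a minimal sketch of this procedure (the variable name train_data is hypothetical; use whatever the provided script loads from data_intensity_25x50.mat):

    v = train_data(1, :);      % one 1 x 1250 intensity feature vector
    I = reshape(v, 25, 50);    % restore to 25 x 50 pixels; depending on how the
                               % images were flattened you may need the transpose,
                               % i.e. reshape(v, 50, 25)'
    imshow(I, []);             % [] lets imshow auto-scale the intensity range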
Exercise 1.1. Visualize several pedestrian and background samples from the training
data. Note: You do not need to start from scratch, a lot of the boilerplate is already given
in the provided assignment script referred to in the boxed note at the top of this section. In the script, look for the code block with the comment ’Exercise 1.1’. You will see that you only need to complete the function imshow_intensity_features.m. For instance, this first exercise can be solved using only the correct calls to reshape and imshow.
Some classifiers may not be able to handle the high dimensionality of the input data. Principal
Component Analysis (PCA) is one method to reduce the data dimensionality by projecting it into
a linear subspace that maintains most of the variance in the data, see Appendix A.
Question 1.1. For each of the three feature types, what is the maximum number of PCA
components? Motivate your answer.
In this section, we shall study using PCA on the gray-level intensity features. In your submitted
solution, you cannot use the built-in Matlab functions pca or princomp. Instead, make your
own implementation, for which you can use Appendix A as a reference. You are allowed to use
Matlab’s eig function to compute eigenvectors and eigenvalues. Notice that to correctly project
data onto the PCA dimensions, the mean vector should be subtracted from the data, so this
vector needs to be computed too.
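As a rough sketch of these ingredients (not the exact structure required by the provided functions), assuming the data matrix X holds the samples as rows:

    m  = mean(X, 1);                          % 1 x M mean vector
    Xc = X - repmat(m, size(X, 1), 1);        % zero-mean data
    C  = Xc' * Xc;                            % M x M covariance (up to a 1/N factor)
    [V, D] = eig(C);                          % unsorted eigenvectors/eigenvalues
    [lambda, order] = sort(diag(D), 'descend');
    W  = V(:, order);                         % columns sorted by eigenvalue
    Z  = Xc * W(:, 1:3);                      % e.g. project onto the first 3 components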
Exercise 1.2. Compute the principal components from the dataset, and visualize the
mean, and the first 10 principal components as images. Note: Be aware that there are
different conventions to represent feature vectors. For instance, in math notation (as in
Appendix A), feature vectors are typically expressed as column vectors. But in the code
and data matrices, the features are given as rows. This difference matters when doing dot
products between matrices and vectors.
These principal components are the eigenvectors of the covariance matrix computed on the given image intensity features. Since the image dataset contains pedestrians, its principal components can also be called eigen-pedestrians.
Question 1.2. Include images of the “mean” pedestrian, and the 10 eigen-pedestrians
in your report. How do you interpret the light/dark regions in the eigen-pedestrians? What
color would a PCA weight of ”0” have in these images? And, what color would an intensity
of ”0” have in the intensity images from Exercise 1.1?
Exercise 1.3. Project intensity data of both the pedestrian and background training
samples to the first three PCA components. After this projection, each sample will be
represented by a 3D vector in the PCA space. Create a 3D point cloud of the 3D vectors of
both classes.
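A minimal plotting sketch, assuming Z_ped and Z_bg are hypothetical N × 3 matrices holding the projected samples of each class:

    scatter3(Z_ped(:,1), Z_ped(:,2), Z_ped(:,3), 10, 'r', 'filled'); hold on;
    scatter3(Z_bg(:,1),  Z_bg(:,2),  Z_bg(:,3),  10, 'b', 'filled'); hold off;
    xlabel('PC 1'); ylabel('PC 2'); zlabel('PC 3');
    legend('pedestrian', 'non-pedestrian'); grid on;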
Question 1.3. In the 3D plot, which axis has the largest amount of variance? Is this axis
alone sufficient to separate the two classes in our dataset? Motivate your answer.
Exercise 1.4. For intensity features, take the top-n PCA dimensions and project six images from the training data onto the corresponding linear subspace, then project the images back to the original image space, and display them. Do this for n = 10 and n = 100.
Question 1.4. Compare the original images to those obtained after projecting to and
from the n = 10 and also to the n = 100 subspace. How does n affect the image quality?
How much (in percentage) do the PCA projections reduce the feature size compared to the
original intensity image feature?
Exercise 1.5. Now for all three feature types, make plots of the percentage of explained
variance (y-axis) vs. the number of PCA components/dimensions (x-axis).
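A possible sketch, assuming lambda holds the eigenvalues of a feature set sorted in descending order (as computed for Exercise 1.2):

    explained = 100 * cumsum(lambda) / sum(lambda);   % cumulative percentage
    plot(1:numel(explained), explained);
    xlabel('number of PCA components'); ylabel('explained variance (%)');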
Question 1.5. For each feature set, how many PCA dimensions should we keep to
maintain 90% of the variance in the data? Include the plots to motivate your answer.
2 Pedestrian classification
Exercises refer to code sections in assignment_pedestrian_classification.m.
For each of the provided feature sets (HOG, LRF, and Intensity) we want to train the following
two classifiers:
1. Linear Support Vector Machine (SVM) classifier (see Appendix B)
2. Gaussian-Mixture-Model (GMM) with Bayesian decision model (see Appendix C)
But before we train the classifiers, we have to decide whether applying PCA dimensionality reduction is necessary. To decide on the number of dimensions to use, consider how many free parameters each of the classifiers has that need to be adapted during training (take into account the input parameters, their dimensionality and their properties).
Question 2.1. How many free model parameters does a trained Linear SVM classifier
have for M-dimensional feature vectors? Give a formula as a function of M, and motivate
your answer. (Note that the model parameters do not include the parameters of the training
procedure, such as number of data samples, number of iterations, or C which is used later
in the assignment).
Question 2.2. How many free model parameters does a Gaussian-Mixture-Model classifier
with K mixture components have for M-dimensional feature vectors? Give a formula
as a function of M and K, and motivate your answer.
A good rule of thumb is that, to obtain meaningful results, there should be (much) fewer free parameters than training samples.
Question 2.3. For which of these classifiers is PCA dimensionality reduction necessary
on this dataset? Motivate your answer.
Now we can train each classifier on each of the feature sets, applying PCA first where appropriate.
Afterwards, we compute the classification error (percentage of misclassified samples)
for all trained classifiers on the test samples.
Exercise 2.1. Implement the SVM classifier by completing train_SVM for training, and
evaluate_SVM for testing. See the comments in the code on available functions to train a
SVM (don’t worry about C, use C = 2). Train and test the SVM on all three feature sets.
Exercise 2.2. Implement the GMM classifier and evaluation in train_GMM and
evaluate_GMM. As you will see, train_GMM already provides boilerplate code. Use the methods of Matlab’s built-in gmdistribution object to fit GMM distributions on the data of each class, and to evaluate their pdfs on test data. Train and test GMM classifiers on all
three feature sets, using K = 5 mixture components per class.
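As a hedged sketch of how gmdistribution could be used (variable names are illustrative; X_ped and X_bg would be the, possibly PCA-reduced, training samples of each class, and the actual interfaces of train_GMM and evaluate_GMM are defined in the provided code):

    K = 5;
    gmm_ped = fitgmdist(X_ped, K);      % older Matlab: gmdistribution.fit(X_ped, K)
    gmm_bg  = fitgmdist(X_bg,  K);
    lik_ped = pdf(gmm_ped, X_test);     % likelihood P(x | pedestrian) per test sample
    lik_bg  = pdf(gmm_bg,  X_test);     % likelihood P(x | non-pedestrian)
    d = lik_ped ./ (lik_ped + lik_bg);  % posterior, assuming equal class priors
    labels = d >= 0.5;                  % 1 = classified as pedestrian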
Question 2.4. What are the six classification errors that you obtained? Which feature/classifier combination is best?
We could also evaluate the pedestrian classifiers using ROC curves (on y-axis: true positive
rate [0,1], on x-axis: false positive rate), instead of the classification error measure. This requires
logging the decision values (classifier outputs) on the test dataset.
Exercise 2.3. Complete the code for plotting the ROC curves. Look at Appendix D and
Figure 3 for more information on the ROC plots.
Question 2.5. Which feature/classifier combination performs best? Include the ROC
plots in your report to support your answer.
Now that we have these simple classifiers, we will take a look at techniques to further improve
the performance.
Parameter selection and overfitting Both the SVM and the GMM have parameters that we
can set before training the model, and which affect the final performance. In case of the SVM,
this is the training parameter C ∈ R, and for the GMM we have K, the number of mixture components (actually, if we apply PCA, the number of PCA dimensions is a parameter too).
Until now we have kept these parameters fixed, but here we will experiment with optimizing them. When we train on some training data, we should always validate performance on separate testing data. Selecting parameters by minimizing the error on the training data is not a good indication of performance on new data; you will see that this leads to overfitting.
Exercise 2.4. Run the code block that evaluates the SVM for various values of C on both
the training and testing data, and generate plots of the error as a function of C.
Question 2.6. If we evaluate on the training data, what is the lowest error that we can
obtain, and for which C? What is the optimal parameter and error if we evaluate on the
test data? Setting C too large leads to overfitting. But why does a large C decrease the
training error, but increase the test error?
Exercise 2.5. Now implement your own code block to evaluate the effect of changing K
in the GMM classifier on the HOG features. Try values of K ranging from 1 up to 7, and again create plots of the error as a function of K, evaluating on both the training and test data. You can copy and alter the provided code from the previous exercise.
Question 2.7. If we evaluate on the training data, what is the lowest error that we can
obtain, and for which K? What is the optimal parameter and error if we evaluate on the test
data? Include the error plots in your report. Why does overfitting occur as we increase K?
Note: For the next exercise, you can keep using C = 2 for the SVM, and K = 5 for the GMM.
Multi-feature classification Multiple classifiers trained on distinct feature sets might provide
complementary results. Intuitively, we should be able to benefit from having multiple distinct
’expert opinions’. Here we consider two approaches to fuse the decision values (classifier outputs) of the trained SVM classifiers on the HOG and LRF features (a minimal sketch follows the list below), namely
1. fused output is the mean of the outputs of HOG/SVM and LRF/SVM
2. fused output is the maximum of the outputs of HOG/SVM and LRF/SVM.
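As a sketch, both fusion rules amount to simple element-wise operations on the logged decision values (d_hog and d_lrf are hypothetical vectors of SVM outputs on the same test samples):

    d_mean = (d_hog + d_lrf) / 2;    % approach 1: mean of the outputs
    d_max  = max(d_hog, d_lrf);      % approach 2: element-wise maximum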
Exercise 2.6. Evaluate both fusion approaches using ROC curves.
Question 2.8. Which fusion approach performs best? List the respective ROCs and
discuss the effects you observe in your report.
3 Pedestrian detection
Exercises refer to code sections in assignment_pedestrian_detection.m.
Up to now, we have looked at pedestrian classification, i.e. deciding for a provided region
proposal if it belongs to the pedestrian or non-pedestrian class. In this last assignment, we will
move to pedestrian detection, which considers a broader question: in a given target image, where are pedestrians located? A fairly simple but effective method to answer this question is to divide the image into a large set of candidate region proposals of the right size and shape, and classify each of these regions using a trained pedestrian classifier. See Figure 2
for an example.
You are now provided with a new dataset with pedestrian and non-pedestrian samples and their computed HOG features, as well as a test video sequence of a pedestrian filmed from a moving vehicle. Furthermore, the region proposals have been pre-computed, and HOG features for each region are provided per video frame.
Figure 2: Pedestrian detection by classifying many region proposals. In this example, the green
rectangles correspond to regions classified as pedestrians. Our aim is to avoid false positives
and false negatives, though in practice misclassification errors may occur.
Exercise 3.1. Train and evaluate a linear SVM on the HOG features as well as you can.
Question 3.1. Explain your approach: Why did you (not) need to use dimensionality
reduction? What value do you use for C and why?
Exercise 3.2. Train and evaluate a GMM on the HOG features as well as you can.
Question 3.2. Explain your approach: Why did you (not) need to use dimensionality
reduction? What value do you use for K and why?
Question 3.3. Which of your classifiers is the best to use on this data? Include ROC
plots to support your decision.
Exercise 3.3. Now apply your selected classifier on the region proposals of the video
sequence, and visualize the regions which are considered ‘pedestrian’.
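A minimal visualization sketch, assuming each proposal of the current frame is a [x y width height] row of a hypothetical matrix boxes, with corresponding decision values d (threshold 0 for an SVM; use 0.5 for the GMM posterior):

    imshow(frame); hold on;
    for r = find(d(:)' >= 0)        % proposals classified as 'pedestrian'
        rectangle('Position', boxes(r, :), 'EdgeColor', 'g', 'LineWidth', 2);
    end
    hold off; drawnow;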
Question 3.4. Study the pedestrian detection results qualitatively (i.e. by looking at the results, as opposed to quantitative evaluation using error statistics and ROC curves).
What kind of false positives and false negatives do you observe? What would you suggest
to counter these errors, and improve the results?
Acknowledgements
We thank Markus Enzweiler for his help in creating part of the data and early versions of this assignment.

Appendix
A Principal Component Analysis (PCA)
PCA is a technique for reducing the dimensionality of the feature space. By analyzing how
the data is distributed from training samples, PCA computes a linear subspace that maintains
most of the variance of the input data. New data samples can later also be projected to this
subspace, which is also sometimes referred to as the ‘PCA subspace’, or ‘PCA projection’.
A.1 Computing the subspace
Consider that we have N data samples x_1, ..., x_N in an M-dimensional feature space, given as a single M × N data matrix X with each column a data sample. We can compute a D-dimensional PCA subspace of X, where D ≤ M, to obtain a D-dimensional data representation which maintains most of the variance. The subspace will be defined as a linear projection, consisting of an M × D transformation matrix W, and the M-dimensional mean data vector m.
First compute the mean data vector m,

    m = (1/N) Σ_{j=1}^{N} x_j.    (1)

The mean is then subtracted from the data to get the zero-mean M × N data matrix X̄ (with columns x̄_j), which is then used to compute the M × M covariance matrix C,

    x̄_j = x_j − m,  ∀j, where 1 ≤ j ≤ N,    (2)

    C = X̄ X̄ᵀ.    (3)

Next, compute the eigenvectors w_i and corresponding eigenvalues λ_i of the covariance matrix C. Recall that the eigenvectors and eigenvalues fulfill the following property,

    C w_i = w_i λ_i.    (4)

The eigenvectors should be sorted such that the i-th component is the eigenvector with the i-th largest eigenvalue λ_i, hence the first eigenvector w_1 has the largest eigenvalue λ_1.
These M-dimensional vectors w_i are called the principal components of the data X, and are all orthonormal to each other.
The eigenvalue λ_i is proportional to the amount of variance the data has along PCA subspace dimension i, hence the fraction of variance kept by principal component i is λ_i / Σ_{j=1}^{M} λ_j. Therefore, we only keep the first D eigenvectors w_1, ..., w_D, as they correspond to the D dimensions that retain most of the data variance. These D eigenvectors can be represented as a single M × D matrix W, where the i-th column is the i-th component w_i.
A.2 Projecting data to the subspace
Once the subspace is computed, we can apply the transformation to reduce the dimensionality of any M-dimensional data vector x ∈ R^M to its reduced D-dimensional ‘PCA’ representation x̃ ∈ R^D using the following linear equation (assuming all vectors are column vectors),

    x̃ = Wᵀ(x − m).    (5)
A.3 Back-projecting from the subspace
The inverse back-projection can also easily be achieved,

    x′ = W x̃ + m,    (6)

such that x′ is the (approximate) reconstruction in the original M-dimensional feature space. If we keep all dimensions in the projection, such that D = M, then W Wᵀ = I. In other words, W is an orthonormal projection and therefore its transpose is its inverse, Wᵀ = W⁻¹. However, typically we use D ≪ M, so back-projection does not restore the original feature space exactly.
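In Matlab, following the column-vector convention of this appendix, the projection and back-projection of Equations (5) and (6) could be sketched as:

    x_proj = W(:, 1:D)' * (x - m);     % D x 1 reduced 'PCA' representation
    x_rec  = W(:, 1:D) * x_proj + m;   % M x 1 approximate reconstruction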
B Linear Support Vector Machine (SVM) classifier
Support Vector Machines are an advanced topic in Machine Learning. In this session, we will
stick to Linear SVMs on two-class data, which form the basis of more complicated approaches.
A Linear SVM learns a hyperplane in the feature space that separates the data points of each
class with maximal margin. The model parameters of the hyperplane are the M-dimensional
weight vector w and a bias b which define the normal and offset of the plane. To test a new
M-dimensional feature vector x, the Linear SVM computes the following decision value:
    d(x) = wᵀx + b.    (7)

The sign of this decision value determines the assigned class label: the sample is assigned to the positive (i.e. pedestrian) class if d(x) ≥ 0, or to the negative (non-pedestrian) class if d(x) < 0.
For learning Linear SVM parameters, the built-in fitcsvm Matlab function can be used. In
case that you have an older Matlab version, you can also use the provided primal_svm.m
function, which should let you obtain similar conclusions in the lab assignments.
Note that there is a training parameter C ∈ R (called BoxConstraint for fitcsvm) which influences how the optimizer computes the linear decision boundary: for large C, the optimizer tries very hard to separate both classes, which means that its boundary is sensitive to outliers. A low C results in a soft margin where a few samples may lie close to or on the wrong side of the boundary, if this enables a wider margin to separate most data points of both classes.
While C is used during training, it is not part of the learned model in Equation (7).
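A hedged usage sketch (variable names are illustrative; X is an N × M training matrix with samples as rows, y a vector of ±1 labels):

    mdl = fitcsvm(X, y, 'BoxConstraint', 2);   % train a linear SVM with C = 2
    [pred, scores] = predict(mdl, X_test);     % columns of 'scores' follow mdl.ClassNames
    % For a linear SVM, the hyperplane of Equation (7) is given by
    % w = mdl.Beta and b = mdl.Bias.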
C Gaussian-Mixture-Model (GMM) classifier
The Gaussian-Mixture-Model classifier belongs to a family of classifiers that model how the
data from each class is distributed. In the training phase, these class conditional distributions
are fitted on the available training data.
For a new test sample x, the likelihood P(x|c) of the sample belonging to class c is evaluated for each class. Then, Bayes’ rule can be applied to combine these likelihoods with the class priors P(c) (i.e. how likely each class is before observing the feature) to obtain the class posterior distribution P(c|x) (i.e. how likely each class is after observing the feature),

    P(c|x) = P(x|c) P(c) / Σ_{c′} P(x|c′) P(c′).    (Bayes’ rule) (8)

We then assign to x the class label with the highest posterior probability, which is also called the maximum a-posteriori solution. In our two-class ‘pedestrian’ vs. ‘non-pedestrian’ problem, we classify x as pedestrian if P(c = pedestrian|x) ≥ 1/2. In the GMM classifier the decision value is therefore d(x) = P(c = pedestrian|x), with decision threshold 1/2.
In case of the GMM classifier, the distribution P(x|c) of each class is modeled by a weighted mixture of K Multivariate Normal (i.e. ‘Gaussian’) distributions, i.e.

    P(x|c) = Σ_{i=1}^{K} w_c^(i) N(x | μ_c^(i), Σ_c^(i)),    (9)

where μ_c^(i) and Σ_c^(i) are the mean and covariance of the i-th mixture component for class c, and w_c^(i) the component’s weight. Fitting such a distribution on training samples is typically done using the Expectation-Maximization (EM) algorithm, which iterates between optimizing the weights and optimizing the K Normal distributions.
In this course, we will not investigate the EM algorithm, and you are not required to know how
it works. For training a Gaussian-Mixture-Model, the built-in gmdistribution class in the
Matlab Statistical Toolbox can be used, which uses EM internally.
D Receiver Operating Characteristic (ROC) curves
Instead of reporting a single test error value for a fixed decision threshold, one can also consider
changing the threshold to trade off different types of classification errors. Lowering the
threshold results in more test cases being assigned to the positive (e.g. pedestrian) class,
while increasing it results in more samples classified negatively (e.g. non-pedestrian). This
trade-off is then reflected in the False Positive Rate (FPR) and True Positive Rate (TPR), both
of which are numbers between 0 and 1.
Let d_i be the classifier’s decision value for test sample i, and y_i ∈ {−1, +1} the true class label of the sample. Also, let the function count_i[·] count the number of samples i for which the given condition holds. Then the TPR and FPR for a given threshold τ are expressed as,

    P = count_i[y_i ≥ 0]    (number of samples in the positive class)    (10)

    N = count_i[y_i < 0]    (number of samples in the negative class)    (11)

    TP(τ) = count_i[(d_i ≥ τ) ∧ (y_i ≥ 0)]    (number of positive samples classified as positive)    (12)

    FP(τ) = count_i[(d_i ≥ τ) ∧ (y_i < 0)]    (number of negative samples classified as positive)    (13)

    TPR(τ) = TP(τ)/P,   FPR(τ) = FP(τ)/N    (true and false positive rates)    (14)
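As a sketch, these quantities can be computed directly from the logged decision values (d and y are assumed vectors over the test set):

    taus = sort(d, 'descend');                 % candidate thresholds
    P = sum(y >= 0);  N = sum(y < 0);
    TPR = zeros(size(taus));  FPR = zeros(size(taus));
    for k = 1:numel(taus)
        TPR(k) = sum(d >= taus(k) & y >= 0) / P;
        FPR(k) = sum(d >= taus(k) & y < 0)  / N;
    end
    plot(FPR, TPR);
    xlabel('False Positive Rate'); ylabel('True Positive Rate');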
By computing the TPR and FPR for varying thresholds, one can create a so-called Receiver Operating Characteristic (ROC) curve. Figure 3 shows an example of an ROC plot for a single classifier. The ROC curve always starts at (0,0), which corresponds to setting the threshold so high that all test samples are assigned to the negative class: there are then no false positives, but also no true positives. The other extreme is obtained when the threshold is so low that everything is assigned to the positive class, in which case both FPR and TPR become one and the curve reaches the top-right corner at (1,1). The curve of an ideal classifier would touch the top-left corner, corresponding to a TPR of 1 at an FPR of 0.

Figure 3: Example of an ROC curve for a certain classifier (True Positive Rate on the y-axis vs. False Positive Rate on the x-axis). We can see, for instance, that we obtain a True Positive Rate of about 70% (e.g. actual pedestrians classified as pedestrians) at a False Positive Rate of 20% (e.g. actual non-pedestrians classified as pedestrians).

 
