
STAT3006 Assignment 3, 2022

Classification 
Weighting 30% - due 18/10/2022 
This assignment involves constructing and assessing classifiers. We will first consider the 
now familiar Iris dataset, collected by Anderson (1936) and first statistically analysed by 
Fisher (1936), which requires one to solve the species problem, that is, to predict which 
(known) species a given specimen belongs to. 
The second dataset to be used to train classifiers is the Modified National Institute of 
Standards and Technology (MNIST) database of handwritten digits. This contains 70,000 
images, of which 10,000 were reserved for testing. In each case you will use the given 
labelled data and attempt to construct classifiers which can accurately classify unlabelled
observations. 
You should select four classifiers: at least one based on a probability model and at least 
one which is not, each preferably a method mentioned in this course. Classifiers discussed in 
the course which are based on a probability model include linear, quadratic, mixture and 
kernel density discriminant analysis. Classifiers discussed which are not based on a 
probability model include k nearest neighbours, classification trees and support vector 
machines. All of these are implemented via various packages in R. If you wish to use a 
different method, please check with the lecturer. You cannot use a (classifier, dataset) 
combination that you have used or are using for an assignment in another course. 
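For orientation, a minimal sketch of where the methods listed above live in R (the packages 
named here are common choices, not requirements):

    # probability-based classifiers
    library(MASS)    # lda() and qda(): linear and quadratic discriminant analysis
    fit <- lda(Species ~ ., data = iris)   # e.g. LDA on the Iris data
    # mixture DA: mda::mda(); kernel density DA: e.g. ks::kda()
    # non-probability-based classifiers
    library(class)   # knn(): k nearest neighbours
    library(rpart)   # rpart(): classification trees
    library(e1071)   # svm(): support vector machines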
Tasks:
1. Assume that we will perform linear discriminant analysis for three classes in two 
dimensions. Additionally assume that we know the class proportions are equal, that the 
common covariance matrix is known to be 
$$\Sigma = \begin{pmatrix} \sigma_1^2 & \rho\,\sigma_1\sigma_2 \\ \rho\,\sigma_1\sigma_2 & \sigma_2^2 \end{pmatrix}$$ 
and that the class means are also known to be 
$$\mu_A = \begin{pmatrix} \mu_{A1} \\ \mu_{A2} \end{pmatrix}, \qquad \mu_B = \begin{pmatrix} \mu_{B1} \\ \mu_{B2} \end{pmatrix} \qquad \text{and} \qquad \mu_C = \begin{pmatrix} \mu_{C1} \\ \mu_{C2} \end{pmatrix},$$ 
respectively, where all the mean scalar parameters are real-valued, the standard deviations 
are positive and the correlation $\rho \in [-1, 1]$.
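As a reminder of standard LDA background (course material, not the solution itself): with 
equal priors and known common covariance, the discriminant functions are linear, 
$$\delta_k(x) = \mu_k^\top \Sigma^{-1} x - \tfrac{1}{2}\,\mu_k^\top \Sigma^{-1} \mu_k, \qquad k \in \{A, B, C\},$$ 
and the decision boundary between a pair of classes, say A and B, is the set of points where 
their discriminants agree: 
$$(\mu_A - \mu_B)^\top \Sigma^{-1} x = \tfrac{1}{2}\left(\mu_A^\top \Sigma^{-1} \mu_A - \mu_B^\top \Sigma^{-1} \mu_B\right).$$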
(a) Prove that the decision boundaries between the pairs of classes meet at a single point
under most conditions, and state those conditions. [4 marks]
(b) Determine this point in terms of the given parameters. [1 mark]
2. Apply one probability-based and one non-probability-based classifier to the Iris dataset
(with all 4 predictor dimensions) using R, report the results and interpret them.
Results for each classifier should include the following:
(a) Characterisation of each class as modelled by the classifier. Note that, where possible, this 
should include parameter estimates for each class. Where not possible, another attempt 
should be made to characterise the class, such as via summaries of the observations which the 
classifier puts into each class. [1 mark]
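For example, a fitted MASS::lda object exposes its parameter estimates directly, while a 
classifier with no fitted parameters (such as k-NN) can instead be characterised by per-class 
summaries of the data; a minimal sketch:

    fit <- MASS::lda(Species ~ ., data = iris)
    fit$prior     # assumed or estimated class proportions
    fit$means     # estimated class mean vectors
    fit$scaling   # discriminant coefficients
    # for a classifier with no parameters, summarise the observations per class:
    aggregate(. ~ Species, data = iris, FUN = mean)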
(b) Cross-validation (CV)-based estimates of the overall and class-specific error rates: 
obtained by training the classifier on a large fraction of the whole dataset and then applying it 
to the remaining data and checking error rates. You may use 5-fold, 10-fold or leave-one-out 
cross-validation to estimate performance, but you should give a statistical reason for your 
choice. Also include an approximate 95% confidence interval for each error rate, along with a 
description of how this was obtained. [2 marks]
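One possible mechanical sketch of 10-fold CV for an LDA classifier, with a simple binomial 
interval for the overall error (the choice of fold number and interval type is yours to justify):

    set.seed(1)
    k <- 10
    fold <- sample(rep(1:k, length.out = nrow(iris)))
    pred <- factor(rep(NA, nrow(iris)), levels = levels(iris$Species))
    for (f in 1:k) {
      fit <- MASS::lda(Species ~ ., data = iris[fold != f, ])
      pred[fold == f] <- predict(fit, iris[fold == f, ])$class
    }
    n_err <- sum(pred != iris$Species)
    n_err / nrow(iris)                        # overall CV error estimate
    binom.test(n_err, nrow(iris))$conf.int    # one choice of 95% interval
    # class-specific error rates from the CV predictions:
    1 - diag(table(iris$Species, pred)) / table(iris$Species)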
(c) Find, list and discuss any Iris observations which were misclassified in the CV checks. [1 
mark]
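Continuing the hypothetical CV sketch above, the misclassified observations can be listed 
directly:

    cbind(iris, predicted = pred)[pred != iris$Species, ]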
(d) Plots of the predicted classes as they apply to the data and the data space, including visual 
representation of the decision boundaries, covering all unique pairs of explanatory variables.
Note: you do not need to derive these boundaries – they can emerge from a plot. [1 mark]
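One common device, sketched here for a single variable pair: classify a fine grid of points 
and let the colour changes trace out the boundaries. (Caveat: a classifier fitted on all four 
variables cannot be drawn directly in two dimensions; you could, for instance, refit on each 
pair or fix the other two variables at their means.)

    vars <- c("Petal.Length", "Petal.Width")     # one of the six pairs
    g <- expand.grid(
      Petal.Length = seq(min(iris[, vars[1]]), max(iris[, vars[1]]), length.out = 200),
      Petal.Width  = seq(min(iris[, vars[2]]), max(iris[, vars[2]]), length.out = 200))
    fit2 <- MASS::lda(Species ~ Petal.Length + Petal.Width, data = iris)
    z <- predict(fit2, g)$class
    plot(g, col = as.integer(z), pch = 15, cex = 0.3)               # predicted regions
    points(iris[, vars], col = as.integer(iris$Species), pch = 19)  # the data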
(e) Compare and contrast the decision boundaries between classes produced by the two 
methods and try to explain their shapes. Which method do you think was best for this 
dataset? Explain. Describe some aspects of either method that you think are appropriate or 
inappropriate for this classification problem. [2 marks]
(f) You are now asked to predict the class of new observations collected from an area where 
the class proportions have changed to 0.5, 0.2 and 0.3 for setosa, virginica, and versicolor, 
respectively.
Describe (with mathematical details) how you would change or refit each classifier to give it 
the lowest possible expected cost of prediction under these circumstances. Change or refit the 
classifiers as necessary to do this and report point estimates of the new classifier parameters, 
or class characterisation, as in (a). Explain the nature of the changes. [3 marks]
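For a probability-based classifier the required change is direct, since the posterior satisfies 
$p(k \mid x) \propto \pi_k f_k(x)$, so only the priors $\pi_k$ need updating; a 
non-probability-based classifier typically needs a different device, such as reweighting or 
resampling the training data. A minimal sketch for MASS::lda (note that levels(iris$Species) 
are ordered setosa, versicolor, virginica, so the priors in the question must be reordered):

    fit <- MASS::lda(Species ~ ., data = iris)
    new_prior <- c(0.5, 0.3, 0.2)   # setosa, versicolor, virginica, in level order
    # change the priors at prediction time ...
    predict(fit, newdata = iris, prior = new_prior)$class
    # ... or equivalently refit with the new priors built in
    fit_new <- MASS::lda(Species ~ ., data = iris, prior = new_prior)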
Note: we will not compare our classification results with those of Fisher’s 1936 paper “The 
Use of Multiple Measurements in Taxonomic Problems”. However, the paper is worth 
reading for background on the dataset and some of the aims of its analysis.
3. Choose two methods of classification that you have not used on the Iris dataset and apply 
them to the MNIST dataset – see http://yann.lecun.com/exdb/mnist/ . Leave the train and test 
split as it is, but feel free to use some of the training data to help choose a model, if desired. 
Aim for the best possible predictive performance, but view this as primarily a learning 
exercise. That is, you do not need to choose the methods with the best performance. However, 
you should aim to get reasonable performance out of any method chosen, e.g. with a 
reasonable choice of any hyperparameters (say 20% error rate or less). You should not 
pre-process the data in a way which makes use of any knowledge you have of the digit 
recognition problem. That is, do not try to produce new explanatory variables which 
represent image features, even though this would likely help performance. You can use 
dimension reduction if you wish (e.g. PCA). 
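As a starting point, the MNIST files at the URL above are stored in the IDX binary format; a 
minimal sketch of reading them in base R (file names as distributed, assumed to sit in the 
working directory) with optional PCA-based dimension reduction:

    read_idx <- function(path) {
      con <- gzfile(path, "rb")               # gzfile() also reads uncompressed files
      on.exit(close(con))
      magic <- readBin(con, integer(), n = 1, size = 4, endian = "big")
      ndim  <- magic %% 256                   # last byte of magic = number of dimensions
      dims  <- readBin(con, integer(), n = ndim, size = 4, endian = "big")
      x <- readBin(con, integer(), n = prod(dims), size = 1, signed = FALSE)
      if (ndim > 1) matrix(x, nrow = dims[1], byrow = TRUE) else x
    }
    train_x <- read_idx("train-images-idx3-ubyte.gz")   # 60000 x 784
    train_y <- read_idx("train-labels-idx1-ubyte.gz")
    test_x  <- read_idx("t10k-images-idx3-ubyte.gz")
    test_y  <- read_idx("t10k-labels-idx1-ubyte.gz")
    # optional: the first 50 principal components (may be slow on the full data)
    pca <- prcomp(train_x, rank. = 50)
    train_z <- pca$x
    test_z  <- predict(pca, test_x)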
(a) Give a brief introduction to the dataset, including quantitative aspects. [1 mark] 
(b) Give a summary of the predictive performance on the test set for each classifier. Make 
sure you do not use the test set at all before doing this. Include at least estimated overall error 
rate and class-specific error rates, along with approximate 95% confidence intervals for these. 
[2 marks]
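Mechanically, once a vector of test-set predictions exists (pred_test and test_y below are 
hypothetical names from your own pipeline), the requested quantities can be read off the 
confusion matrix; a sketch with a normal-approximation interval for the overall error:

    tab <- table(truth = test_y, pred = pred_test)
    n <- sum(tab)
    overall_err <- 1 - sum(diag(tab)) / n          # overall test error rate
    class_err   <- 1 - diag(tab) / rowSums(tab)    # class-specific error rates
    overall_err + c(-1, 1) * 1.96 * sqrt(overall_err * (1 - overall_err) / n)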
(c) For each classifier, also report error rates as estimated using the training set. Attempt to 
explain any differences between the error rates estimated from the training and test sets. Note 
that reference to training and test sets here are to the labelling of the original data, not to how 
you may have used them. [2 marks]
(d) Explain why you chose each classifier type and describe some of their apparent strengths 
and weaknesses for this problem. [2 marks]
(e) For each classifier, show 1 example per possible digit (i.e. 10 per classifier, 20 in total) of handwritten 
digits which were classified into the correct class with the most certainty, and quantify what 
you mean by certainty. Explain why you think the classifiers were particularly successful at 
classifying these correctly and with certainty. Note: defining and implementing “certainty” 
may take some thought, creativity, maths and coding. [3 marks]
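One possible definition, sketched for a probability-based classifier: take certainty to be the 
posterior probability of the predicted class, and pick, among the correctly classified test 
images of each digit, the one with the largest posterior (post, pred_test and test_y are 
hypothetical names; non-probabilistic classifiers need a different surrogate, e.g. vote 
proportions for k-NN or margins for an SVM):

    # post: n x 10 matrix of class posteriors from your classifier,
    # e.g. post <- predict(fit, newdata)$posterior for an lda()-style fit
    conf <- apply(post, 1, max)              # certainty = posterior of the predicted class
    correct <- pred_test == test_y
    most_certain <- sapply(0:9, function(d) {
      idx <- which(correct & test_y == d)
      idx[which.max(conf[idx])]              # test index of the surest correct digit d
    })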
(f) For each classifier, show 1 example per digit (again 10 per classifier, 20 in total) of the worst errors made by 
your classifier and quantify what you mean by worst. Explain why you think some of these 
errors may have been made by your classifier and been among the worst seen. [2 marks]
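Mirroring the sketch in (e), one possible reading of “worst” is a misclassification made with 
the highest confidence in the wrong class:

    wrong <- pred_test != test_y
    worst <- sapply(0:9, function(d) {
      idx <- which(wrong & test_y == d)
      idx[which.max(conf[idx])]      # the most confidently wrong example of digit d
    })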
(g) What is the difference between a handwritten 7 and a 1 according to each classifier? Try 
to explain what each classifier is doing in this case, i.e. what are the main things the classifier 
considers to make this decision and how are they used? [3 marks]
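A simple diagnostic to support this discussion (assuming train_x and train_y from the earlier 
sketch): compare the mean images of the two digits, since for linear-style classifiers the 
7-versus-1 decision is essentially a weighted contrast of pixel intensities. The plot 
orientation may need flipping depending on how the images were read:

    m7 <- colMeans(train_x[train_y == 7, ])
    m1 <- colMeans(train_x[train_y == 1, ])
    image(matrix(m7 - m1, 28, 28), main = "mean 7 minus mean 1")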
Notes:
(i) Some R commands you might find useful: 
objects() – gives the current list of objects in memory. 
attributes(x) – gives the set of attributes of an object x. 
(ii) For the Iris dataset, we will assume that each species was collected from an environment 
where all three are equally likely to be selected in a random sample. We can view the sample 
as representative, and the prevalence of each species is indeed similar in some environments. 
(See section VI of Fisher, 1936 for some details on how the observations were collected.) 
(iii) Make it a habit to give reasons or justifications for decisions or statements. 
(iv) Please put all the R commands in a separate file or files and submit these separately via a 
single text file or a zip file. You should not give any R commands in your main report and 
should not include any raw output – i.e. just include figures from R (each with a title, axis 
labels and caption below) and put any relevant numerical output in a table or within the text.
(v) Please name your files something like student_number_STAT3006_A3.pdf to assist with 
marking. 
(vi) As per http://www.uq.edu.au/myadvisor/academic-integrity-and-plagiarism, what you 
submit should be your own work. Even where working from sources, you should endeavour 
to write in your own words. Use consistent notation throughout your assignment and define 
all notation that you use.
(vii) Some references
R: 
Maindonald, J. and Braun, J. Data Analysis and Graphics Using R - An Example-Based 
Approach, 3rd edition, Cambridge University Press, 2010. 
Venables, W.N. and Ripley, B.D., Modern Applied Statistics with S, Fourth Edition, Springer, 
2002. 
Wickham, H. and Grolemund, G. R for Data Science, O'Reilly, 2017. 
Classification and Clustering: 
Bishop, C. Pattern Recognition & Machine Learning, Springer, 2006. 
Devroye, L., Gyorfi, L. and Lugosi, G., A Probabilistic Theory of Pattern Recognition, 
Springer, 1996. 
Duda, R.O., Hart, P.E. and Stork, D.G., Pattern Classification, Wiley, 2001. 
Goodfellow, I., Bengio, Y. and Courville, A. Deep Learning, MIT Press, 2016. 
Härdle, W.K. and Simar, L. Applied Multivariate Statistical Analysis, 4th ed., Springer, 
2015. 
Hastie, T. and Tibshirani, R. Discriminant analysis by Gaussian mixtures, Journal of the 
Royal Statistical Society B, 58, 155-176 (MDA paper), 1996. 
Hastie, T. and Tibshirani, R. and Friedman, J. The Elements of Statistical Learning: Data 
Mining, Inference, and Prediction, 2nd edition, Springer, 2009. 
McLachlan, G.J. and Peel, D. Finite Mixture Models, Wiley, 2000. 
McLachlan, G.J. Discriminant Analysis and Statistical Pattern Recognition, Wiley, 1992. 
Schölkopf, B. and Smola, A.J. Learning with Kernels, MIT Press, 2001. 
Data Sources: 
Anderson, E. The species problem in Iris, Annals of the Missouri Botanical Garden 23 (3): 
457–509, 1936. 
Fisher, R.A. The use of multiple measurements in taxonomic problems, Annals of Eugenics 7 
(2): 179–188, 1936. 
LeCun, Y., Bottou, L., Bengio, Y. and Haffner, P. Gradient-Based Learning Applied to 
Document Recognition, Proceedings of the IEEE, 86, 2278-2324, 1998. 