
STAT3006 Assignment 3—Classification

Weighting: 30%
Instructions
• The assignment consists of three (3) problems. Each problem is worth 10 marks, and
each mark is equally weighted.
• The mathematical elements of the assignment can be completed by hand, in LaTeX (preferably), or in Word (or other typesetting software). The mathematical derivations and manipulations should be accompanied by clear explanations in English that provide the
information needed to interpret the mathematical exposition.
• Computation problems can be answered using your programming language of choice, although R is generally recommended, or Python if you are uncomfortable with R. As with
the mathematical exposition, you may choose to typeset your answers to the problems in
whatever authoring or word processing software you wish. You should also maintain a
copy of any code that you have produced.
• Computer generated plots and hand drawn graphs should be included together with the text
where problems are answered.
• The assignment will require six (6) files containing data, which you can download from the
Assignment 3 section on Blackboard. These files are: p2_1ts.csv, p2_1cl.csv, p2_2ts.csv,
p3_1x.csv, p3_1y.csv, and data_bank_authentification.txt.
• Submission files should include the following (whichever applies to you):
– Scans of handwritten mathematical exposition.
– Typeset mathematical exposition, outputted as a pdf file.
– Typeset answers to computational problems, outputted as a pdf file.
– Program code/scripts that you wish to submit, outputted as a txt file.
• All submission files should be labeled with your name and student number and
archived together in a zip file and submitted at the TurnItIn link on Blackboard.
We suggest naming using the convention:
FirstName_LastName_STAT3006A3_[Problem_XX/Problem_XX_Part_YY].[FileExtension].
• As per my.uq.edu.au/information-and-services/manage-my-program/student-integrity-and-conduct/academic-integrity-and-student-conduct, what you submit
should be your own work. Even where working from sources, you should endeavour to write
in your own words. You should use consistent notation throughout your assignment and
define whatever is required.
Problem 1 [10 Marks]
Let X ∈ X = [0, 1] and Y ∈ {0, 1}. Further, suppose that
πy = P (Y = y) = 1/2
for both y ∈ {0, 1}, and that the conditional distributions of [X|Y = y] are characterized by the
probability density functions (PDFs):
f (x|Y = 0) = 2 − 2x
and
f (x|Y = 1) = 2x.
Part a [2 Marks]
Consider the Bayes’ classifier for Y ∈ {0, 1}, which is
$$r^*(x) = \begin{cases} 1 & \text{if } \tau_1(x) > 1/2, \\ 0 & \text{otherwise,} \end{cases}$$
where
τ1 (x) = P (Y = 1|X = x).
Derive the explicit form of τ1 (x) in the current scenario and plot τ1 (x) as a function
of x.
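As a small illustration of the thresholding rule in the definition of r∗, the following sketch maps posterior values to class labels. The function name `bayes_classifier` and the input values are hypothetical and chosen only for illustration; they are not the posterior of the current scenario.

```python
import numpy as np

def bayes_classifier(tau1):
    """Return 1 where tau1 > 1/2 and 0 otherwise (a value of exactly 1/2 goes to class 0)."""
    tau1 = np.asarray(tau1, dtype=float)
    return (tau1 > 0.5).astype(int)

# Hypothetical posterior probabilities P(Y = 1 | X = x) at three points
labels = bayes_classifier([0.1, 0.5, 0.9])
print(labels)
```

Note that the strict inequality means a posterior of exactly 1/2 is assigned to class 0.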
Part b [2 Marks]
Define the classification loss function for a generic classifier r : X → {0, 1} as
ℓ (x, y, r (x)) = ⟦r (x) ≠ y⟧ ,
where ℓ : X × {0, 1} × {0, 1} → {0, 1} and ⟦·⟧ equals 1 if its argument is true and 0 otherwise, and consider the associated risk
L (r) = E (⟦r (X) ≠ Y ⟧).
It is known that the Bayes’ classifier is optimal in that it minimizes the classification risk, that is
L (r∗) ≤ L (r).
In the binary classification case,
$$L(r^*) = \mathrm{E}\left(\min\{\tau_1(X),\ 1 - \tau_1(X)\}\right) = \frac{1}{2} - \frac{1}{2}\,\mathrm{E}\left(\left|2\tau_1(X) - 1\right|\right).$$
Calculate L (r∗) for the current scenario.
Part c [2 Marks]
Assume now that π1 ∈ [0, 1] is unknown. Derive an expression for L (r∗) that depends
on π1.
Part d [2 Marks]
Assume again that π1 ∈ [0, 1] is unknown. Argue that we can write
$$L(r^*) = \int_{\mathbb{X}} \min\{(1 - \pi_1)\, f(x \mid Y = 0),\ \pi_1 f(x \mid Y = 1)\}\,\mathrm{d}x.$$
Then, assuming that π0 = π1 = 1/2, argue that we can further write
$$L(r^*) = \frac{1}{2} - \frac{1}{4}\int_{\mathbb{X}} \left|f(x \mid Y = 1) - f(x \mid Y = 0)\right|\,\mathrm{d}x.$$
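The equality of the two integral forms for π0 = π1 = 1/2 can be checked numerically before attempting the argument. The sketch below is only a sanity check, not a proof: it uses two hypothetical densities on [0, 1] (a uniform density and 3x², neither taken from the assignment) and a hand-written trapezoidal rule, and compares the integral of min{(1 − π1) f0, π1 f1} with 1/2 − (1/4) ∫ |f1 − f0| dx.

```python
import numpy as np

def trapezoid(values, grid):
    """Composite trapezoidal rule for values sampled on an increasing grid."""
    return float(np.sum(0.5 * (values[1:] + values[:-1]) * np.diff(grid)))

# Hypothetical class-conditional densities on [0, 1], for illustration only
def f0(x):
    return np.ones_like(x)      # uniform density

def f1(x):
    return 3.0 * x ** 2         # integrates to 1 on [0, 1]

grid = np.linspace(0.0, 1.0, 200001)
pi1 = 0.5                       # equal priors, as assumed in the second expression

# Integral of the pointwise minimum of the weighted densities
lhs = trapezoid(np.minimum((1 - pi1) * f0(grid), pi1 * f1(grid)), grid)
# One half minus one quarter of the integrated absolute difference
rhs = 0.5 - 0.25 * trapezoid(np.abs(f1(grid) - f0(grid)), grid)

print(lhs, rhs)
```

The two printed numbers should agree up to quadrature error; the same check can be repeated with any other pair of densities on [0, 1].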
Part e [2 Marks]
Consider now that π1 ∈ [0, 1] is unknown, as are f (x|Y = 0) and f (x|Y = 1). That is, we only
know that f (·|Y = y) : X → R is a density function on X = [0, 1], for each y ∈ {0, 1}, in the sense
that f (x|Y = y) ≥ 0 for all x ∈ X and that ∫_X f (x|Y = y) dx = 1.
Using the expressions from Part d, deduce the minimum and maximum values of L (r∗)
and provide conditions on π1, f (·|Y = 0) and f (·|Y = 1) that yield these values.
Problem 2 [10 Marks]
Suppose that we observe an independent and identically distributed sample of n = 300 random
pairs (Xi, Yi), for i ∈ [n], where Xi = (Xi1, . . . , Xid) is a mean-zero time series of length d = 100
and Yi ∈ {1, 2, 3} is a class label. Here, Xit is the observation of time series i ∈ [n] at time t ∈ [d]
and we may say that Xi ∈ X = R^d.
We assume that the label Yi, for i ∈ [n], is such that each class occurs in the general population
with unknown probability
πy = P (Yi = y),
for each y ∈ {1, 2, 3}, where π1 + π2 + π3 = 1. Further, we know that Xit is first-order autoregressive,
in the sense that the distribution of [Xi|Yi = y] can be characterized by the conditional
probability densities
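$$f(x_{i1} \mid Y_i = y) = \phi\left(x_{i1};\ 0,\ \sigma_y^2\right)$$
and, for each t ≥ 2,
$$f(x_{it} \mid X_{i1} = x_{i1}, X_{i2} = x_{i2}, \dots, X_{i,t-1} = x_{i,t-1}, Y_i = y) = \phi\left(x_{it};\ \beta_y x_{i,t-1},\ \sigma_y^2\right),$$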
where xi = (xi1, . . . , xid) is a realization of Xi, and for each y ∈ {1, 2, 3}, σy² ∈ (0,∞) and
βy ∈ [−1, 1]. Here,
$$\phi\left(x;\ \mu,\ \sigma^2\right) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{-\frac{1}{2}\,\frac{(x - \mu)^2}{\sigma^2}\right\}$$
is the univariate normal probability density function with mean µ ∈ R and variance σ² ∈ (0,∞).
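To make the generative description concrete, the following sketch simulates a single series from the first-order autoregressive model: the first observation is drawn from a normal distribution with mean 0 and variance σy², and each later observation is drawn from a normal distribution with mean βy times the previous observation and variance σy². The function name `simulate_ar1`, the seed, and the parameter values are hypothetical and are not estimates from the assignment data.

```python
import numpy as np

def simulate_ar1(beta, sigma2, d, rng):
    """Draw one series with X_1 ~ N(0, sigma2) and X_t | X_{t-1} ~ N(beta * X_{t-1}, sigma2)."""
    sd = np.sqrt(sigma2)
    x = np.empty(d)
    x[0] = rng.normal(0.0, sd)
    for t in range(1, d):
        # The conditional mean depends only on the previous observation
        x[t] = rng.normal(beta * x[t - 1], sd)
    return x

rng = np.random.default_rng(0)   # arbitrary seed, for reproducibility only
series = simulate_ar1(beta=0.5, sigma2=1.0, d=100, rng=rng)   # hypothetical parameter values
print(series.shape)
```

Simulating a few series for different values of βy and σy² is one way to build intuition for how the three classes might differ.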
Part a [2 Marks]
Let (X, Y ) arise from the same population distribution as (X1, Y1). Using the information above,
derive expressions for the a posteriori probabilities
τy (x; θ) = P (Y = y|X = x),
for each y ∈ {1, 2, 3}, as functions of the parameter vector
$$\theta = \left(\pi_1, \pi_2, \pi_3, \beta_1, \beta_2, \beta_3, \sigma_1^2, \sigma_2^2, \sigma_3^2\right).$$
Further, use the forms of the a posteriori probabilities to produce an explicit form of
the Bayes classifier (i.e., a form that is written in terms of the parameters θ).
Part b [1 Mark]
Using the information above, construct the likelihood function
$$L(\theta; Z_n) = \prod_{i=1}^{n} f(z_i; \theta)$$
based on the random sample Zn = (Z1, . . . , Zn), where Zi = (Xi, Yi) (for i ∈ [n]), and write
the log-likelihood function log L (θ;Zn). Here, f (zi; θ) is the joint density of Zi, deduced
from the problem description, and θ is defined in Part a.
Part c [2 Marks]
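Using the form of the log-likelihood function from the problem above, derive closed-form expressions of the maximum likelihood estimator
$$\hat{\theta} = \underset{\theta \in \left\{(\pi_1, \pi_2, \pi_3)\,:\,\pi_y \ge 0,\ \sum_{y=1}^{3} \pi_y = 1\right\} \times [-1, 1]^3 \times (0, \infty)^3}{\arg\max}\ \log L(\theta; Z_n).$$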
Part d [1 Mark]
The data set p2_1ts.csv¹ contains a realization xn = (x1, . . . , xn) of the n = 300 time series
Xn = (X1, . . . , Xn), and the data set p2_1cl.csv contains a realization yn = (y1, . . . , yn) of the
associated n = 300 class labels Yn = (Y1, . . . , Yn). Using the notion of the mth-order autocovariance
of a time series X = (X1, . . . , Xd),
$$\rho_m = \mathrm{E}\left\{\left[X_t - \mathrm{E}(X_t)\right]\left[X_{t+m} - \mathrm{E}(X_{t+m})\right]\right\}$$
for m ≥ 0, and appropriate sample estimators, attempt to visualize these data in a manner
that demonstrates the differences between the three class-specific distributions.
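One possible sample estimator of the mth-order autocovariance is sketched below. It centres the series at its sample mean and divides by the series length d; both choices are assumptions made for this sketch (dividing by d − m, or using the known mean of zero, are equally defensible), and the function name `sample_autocovariance` is hypothetical.

```python
import numpy as np

def sample_autocovariance(x, m):
    """Sample m-th order autocovariance of a 1-D series, with 1/d normalisation."""
    x = np.asarray(x, dtype=float)
    d = x.size
    if not 0 <= m < d:
        raise ValueError("lag m must satisfy 0 <= m < len(x)")
    centred = x - x.mean()
    # Products of observations that are m time points apart
    return float(np.sum(centred[: d - m] * centred[m:]) / d)

# Tiny worked example on a hypothetical series
example = [1.0, 2.0, 3.0, 4.0]
print(sample_autocovariance(example, 0))   # 1.25
print(sample_autocovariance(example, 1))   # 0.3125
```

Applying such an estimator to each row of the time series data, for a few small lags, gives a low-dimensional summary that can be plotted and coloured by class label.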
Part e [2 Marks]
For the data sets from Part d, using the maximum likelihood estimator from Part c, derive the
expressions of the estimate τy (x; θˆ) of τy (x; θ), for each y ∈ {1, 2, 3}. Furthermore, provide an explicit form of the estimated Bayes’ classifier (i.e., a classifier r (x; θˆ), dependent
on θˆ). Finally, use the estimated Bayes’ classifier to compute the so-called in-sample
empirical risk:
$$\bar{L}_n\left(r(\cdot; \hat{\theta})\right) = \frac{1}{n} \sum_{i=1}^{n} \llbracket r(X_i; \hat{\theta}) \ne Y_i \rrbracket,$$
where the averaging is over the same sample Zn that is used to compute θˆ.

¹ Each row of the CSV file is a time series and each column is a time point.
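The in-sample empirical risk is simply the proportion of observations whose predicted label differs from the observed label. A minimal sketch, with a hypothetical function name and made-up labels, is:

```python
import numpy as np

def empirical_risk(predicted, observed):
    """Proportion of observations whose predicted label differs from the observed label."""
    predicted = np.asarray(predicted)
    observed = np.asarray(observed)
    if predicted.shape != observed.shape:
        raise ValueError("predicted and observed must have the same shape")
    return float(np.mean(predicted != observed))

# Hypothetical labels: one of the four predictions is wrong
print(empirical_risk([1, 2, 3, 3], [1, 2, 2, 3]))   # 0.25
```

Because the same sample is used for fitting and for evaluation, this quantity tends to be an optimistic estimate of the true classification risk.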
Part f [2 Marks]
The data set p2_2ts.csv² contains a realization x′n′ = (x′1, . . . , x′n′) of n′ = 20 partially observed time
series X′i = (Xi1, . . . , Xi50), where X′i contains the first 50 time points of a fully observed time series
X″i = (Xi1, . . . , Xi100), for each i ∈ [n′]. Under the assumption that X″i has the same distribution
as X1, as described at the start of the problem, argue that you can use the maximum likelihood
estimates from Part e to produce a Bayes’ classifier for the partially observed time
series X′i, and produce classifications for each of the n′ = 20 time series.
Problem 3 [10 Marks]
Let Zn = (Z1, . . . , Zn) be an independent and identically distributed sample of n pairs Zi = (Xi, Yi)
of features Xi ∈ X = R^d and labels Yi ∈ {−1, 1}, where i ∈ [n]. Further, let
ρ (x; θ) = α + β⊤x
be a linear classification rule and let
rρ (x; θ) = sign (ρ (x; θ))
be the classifier based on ρ (·; θ) : X → R. Here θ = (α, β⊤)⊤ ∈ R^(d+1) is a parameter vector and
$$\operatorname{sign}(r) = \begin{cases} -1 & \text{if } r \le 0, \\ 1 & \text{otherwise.} \end{cases}$$
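The linear rule and the resulting classifier can be sketched as follows. The sketch assumes the sign convention that returns −1 when its argument is at most 0 and +1 otherwise, matching the label set {−1, 1}; this differs from `numpy.sign`, which returns 0 at 0. The function names and the parameter values are hypothetical.

```python
import numpy as np

def sign(r):
    """Assumed convention: -1 if r <= 0, +1 otherwise (np.sign would return 0 at r = 0)."""
    return np.where(np.asarray(r, dtype=float) <= 0, -1, 1)

def rho(x, alpha, beta):
    """Linear rule alpha + beta^T x, evaluated for each row of x."""
    x = np.atleast_2d(np.asarray(x, dtype=float))
    return alpha + x @ np.asarray(beta, dtype=float)

def classify(x, alpha, beta):
    """Classifier obtained by taking the sign of the linear rule."""
    return sign(rho(x, alpha, beta))

# Hypothetical parameters and feature vectors in two dimensions
points = [[0.0, 0.0], [1.0, 0.0], [2.0, 2.0]]
print(classify(points, alpha=-1.0, beta=[1.0, 1.0]))
```

Points that lie exactly on the decision boundary, where the linear rule equals 0, are assigned to class −1 under this convention.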
Consider the least-squares loss function
ℓ (x, y, ρ (x)) = [1 − yρ (x)]²
and define the estimator
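$$\hat{\theta} = \underset{\theta \in \mathbb{R}^{d+1}}{\arg\min}\ \left\{\bar{L}_n\left(\rho(\cdot; \theta)\right) + \lambda \|\beta\|_2^2\right\},$$
where λ > 0 is a fixed penalty constant and
$$\bar{L}_n\left(\rho(\cdot; \theta)\right) = \frac{1}{n} \sum_{i=1}^{n} \ell\left(X_i, Y_i, \rho(X_i; \theta)\right)$$
is the empirical risk. We say that the classifier
$$r_\rho(x; \hat{\theta}) = \operatorname{sign}\left(\rho(x; \hat{\theta})\right)$$
is the so-called linear least-squares support vector machine.

² Again, each row of the CSV file is a time series and each column is a time point.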
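The penalized objective that defines the estimator can be evaluated directly for any candidate parameter vector, which is useful for checking a closed-form solution numerically. The sketch below assumes θ is stored with the intercept α first, followed by β, and that only β is penalized; the function name and the toy data are hypothetical.

```python
import numpy as np

def penalized_objective(theta, X, y, lam):
    """Mean of [1 - y_i (alpha + beta^T x_i)]^2 plus lam * ||beta||_2^2, with theta = (alpha, beta)."""
    theta = np.asarray(theta, dtype=float)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    alpha, beta = theta[0], theta[1:]
    margins = y * (alpha + X @ beta)           # y_i * rho(x_i; theta)
    risk = np.mean((1.0 - margins) ** 2)       # empirical least-squares risk
    return float(risk + lam * np.sum(beta ** 2))

# Hypothetical toy data: two points in two dimensions with labels in {-1, 1}
X_toy = [[1.0, 0.0], [0.0, 1.0]]
y_toy = [1.0, -1.0]
print(penalized_objective([0.0, 1.0, -1.0], X_toy, y_toy, lam=0.5))   # 1.0
```

In the toy example both margins equal 1, so the empirical risk is 0 and the value 1.0 comes entirely from the penalty term.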
Part a [2 Marks]
Using the information from the problem description, for any fixed λ > 0, provide a closed-form
expression for the estimator θˆ.
Part b [2 Marks]
A realization of a random sample of n = 1000 observations Zi = (Xi, Yi), for i ∈ [n], is contained in
the files p3_1x.csv and p3_1y.csv. Here the feature data Xn = (X1, . . . , Xn) are contained in
p3_1x.csv³ and the label data Yn = (Y1, . . . , Yn) are contained in p3_1y.csv⁴.
For λ = 1, using the estimator from Part a, provide an explicit form of the linear
least-squares support vector machine classifier based on the provided data and plot
the decision boundary. Explore whether different values of λ > 0 change the decision
boundary and propose some strategy to choose the value using Zn.
Part c [2 Marks]
A realization zn = (z1, . . . , zn) of a random sample of n = 1372 observations Zi = (Xi, Yi), for
i ∈ [n], is contained in the file data_bank_authentification.txt. The data set consists of
features extracted from genuine and forged banknote-like documents that were digitized into grayscale images.
Features of the image are then extracted to form the feature vector (i.e. xi) of dimension d = 4,
which are stored in the first four columns of the data set. The features are the variance (variance),
skewness (skewness) and kurtosis (kurtosis) of a wavelet transformation of the image, and the
entropy (entropy) of the image. All of the features can be considered real-valued. The final
column of the data set contains the class label, where a label of zero indicates a genuine banknote
and a label of 1 indicates a forgery⁵.

³ Each row of the CSV file is a feature vector of dimension 2.
⁴ Note that the label data are not in the appropriate form for use within the large-margin framework.
⁵ You will have to transform the label data to the appropriate form for use within the large-margin framework.
Implement a linear least-squares support vector machine classifier and provide an
explicit form of the decision boundary. You should use the convention that each realization can
be written as
xi = (variancei, skewnessi, kurtosisi, entropyi)
= (xi1, xi2, xi3, xi4)
and that yi = −1 indicates a genuine banknote.
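The label transformation needed for the large-margin framework can be sketched as follows, assuming the convention that a genuine banknote (label 0 in the file) maps to −1 and a forgery (label 1) maps to +1. The function name `to_margin_labels` is hypothetical.

```python
import numpy as np

def to_margin_labels(labels01):
    """Map labels in {0, 1} to {-1, +1}: 0 (genuine) becomes -1 and 1 (forgery) becomes +1."""
    labels01 = np.asarray(labels01)
    if not np.all(np.isin(labels01, (0, 1))):
        raise ValueError("labels must all be 0 or 1")
    return np.where(labels01 == 0, -1, 1)

# Hypothetical label column
print(to_margin_labels([0, 1, 1, 0]))
```

The same transformation, written as 2y − 1, applies to any label file coded in {0, 1}.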
Part d [1 Mark]
Let r (x; θˆ) denote the classifier from Part c, and compute an estimate of the in-sample
empirical classification risk
$$\bar{L}_n\left(r(\cdot; \hat{\theta})\right) = \frac{1}{n} \sum_{i=1}^{n} \llbracket r(X_i; \hat{\theta}) \ne Y_i \rrbracket.$$
Then, visualize the data in a manner that displays the realizations that are misclassified
by r (·; θˆ), in the sense that xi is misclassified if r (xi; θˆ) ≠ yi.
Part e [1 Mark]
Upon inspection of the plot (or plots) from Part d, discuss why a linear classifier is insufficient for the task of distinguishing between banknotes using the available data.
Suggest some modifications to the least-squares support vector machine construction
from the problem description that would alleviate any perceived inadequacies of the
linear classifier.
Part f [2 Marks]
Implement your suggested modifications from Part e and compare the performance of
your suggested classifier via visualization of the data and estimation of the in-sample
empirical classification risk.
 