COMP7703 - Machine Learning 
Take Home Exam 
(Worth 10% of total marks for the course) 
Marcus Gallagher 
May 2020 
1 Instructions 
Complete all questions. Submit your answers in Blackboard using the test item 
in the Assessment Section. You can use whatever resources you wish to help 
you complete the questions (e.g. MATLAB/Python, books, web), but you are 
strongly encouraged to complete the exam individually rather than discussing 
or working with other students. It is up to you to decide how much time you 
spend on this task, but the intention is that it will take no more than 
approximately 4 hours to complete. Give your answers correct to 5 significant digits. 
2 Questions 
Questions 1-5 use the sonarmini.csv dataset (available on the course 
Blackboard site). 
1. Calculate the absolute difference between the biased (Maximum 
Likelihood Estimator) and unbiased estimators of the sample variance of the 
second column of the dataset. 
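The two estimators differ only in their divisor: the maximum-likelihood estimate divides the sum of squared deviations by n, the unbiased estimate by n − 1. A minimal sketch using Python's standard `statistics` module follows; the list `values` is made-up placeholder data, not the contents of sonarmini.csv.

```python
import statistics

# Hypothetical placeholder values; NOT taken from sonarmini.csv.
values = [0.02, 0.05, 0.03, 0.08, 0.04]

biased = statistics.pvariance(values)   # divides by n (maximum likelihood)
unbiased = statistics.variance(values)  # divides by n - 1
abs_diff = abs(biased - unbiased)

# The ".5g" format rounds to 5 significant digits.
print(f"{abs_diff:.5g}")
```

For any sample the two estimates satisfy biased = unbiased × (n − 1)/n, which gives a quick sanity check on the result.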
2. Using the Manhattan distance, which data points are the 3 nearest-neighbours 
of the 5th point (i.e. row 5) in the dataset? (Use only the first two columns 
for this question; the third column is a class label). 
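A nearest-neighbour search under the Manhattan (L1) distance can be sketched as below. The helper names and the `points` list are illustrative assumptions; the coordinates are invented, and rows are assumed to be numbered from 1 as in the question.

```python
def manhattan(p, q):
    """Sum of absolute coordinate differences (L1 distance)."""
    return sum(abs(a - b) for a, b in zip(p, q))

def nearest_neighbours(points, query_row, k):
    """Return the 1-based row numbers of the k points closest to query_row."""
    query = points[query_row - 1]
    others = [
        (manhattan(query, p), row)
        for row, p in enumerate(points, start=1)
        if row != query_row  # a point is not its own neighbour
    ]
    return [row for _, row in sorted(others)[:k]]

# Hypothetical two-column data; NOT taken from sonarmini.csv.
points = [(0.0, 0.0), (1.0, 1.0), (0.2, 0.1), (5.0, 5.0), (0.1, 0.1), (0.3, 0.3)]
neighbours = nearest_neighbours(points, query_row=5, k=3)
print(neighbours)
```

Sorting (distance, row) pairs breaks exact distance ties by row number, which is one possible convention among several.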
3. Let $x_i$ be the $i$th feature/column in the dataset. Consider the classification 
rule: “Output 1 if $x_2 < 0.05$, else output 0”. How many points in the 
dataset are classified correctly with this rule? 
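Counting correct classifications for a threshold rule of this kind might look like the following sketch. The function name and the `rows` data are hypothetical, and the class label is assumed to be stored as 0 or 1 in the third column.

```python
def count_correct(rows, threshold=0.05):
    """Count rows (x1, x2, label) where 'x2 < threshold -> 1, else 0' equals label."""
    return sum(1 for _, x2, label in rows if int(x2 < threshold) == label)

# Hypothetical rows of (x1, x2, label); NOT taken from sonarmini.csv.
rows = [(0.02, 0.03, 1), (0.04, 0.10, 0), (0.01, 0.02, 0), (0.03, 0.07, 1)]
n_correct = count_correct(rows)
print(n_correct)
```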
4. Consider fitting a 2D histogram to columns 1 and 2 of the dataset: 
• three bins of equal width in each dimension (i.e. 9 bins) 
• bins span the range [0, 0.15] for $x_1$ and [0, 0.24] for $x_2$. 
Which bin has the greatest height? 
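One way to build such a histogram without extra libraries is sketched below. It assumes half-open bins with the upper edge of each range counted in the last bin, and takes "height" to mean the raw count (all bins have the same area, so the ranking is the same for a normalised histogram). The function names and the `points` data are illustrative only.

```python
def bin_index(value, low, high, n_bins):
    """Map a value in [low, high] to a bin 0..n_bins-1; the top edge joins the last bin."""
    if not low <= value <= high:
        return None  # outside the histogram range
    width = (high - low) / n_bins
    return min(int((value - low) / width), n_bins - 1)

def histogram2d(points, x_range, y_range, n_bins=3):
    """Return an n_bins x n_bins table of counts indexed as counts[x_bin][y_bin]."""
    counts = [[0] * n_bins for _ in range(n_bins)]
    for x, y in points:
        i = bin_index(x, x_range[0], x_range[1], n_bins)
        j = bin_index(y, y_range[0], y_range[1], n_bins)
        if i is not None and j is not None:
            counts[i][j] += 1
    return counts

# Hypothetical (x1, x2) pairs; NOT taken from sonarmini.csv.
points = [(0.01, 0.02), (0.02, 0.05), (0.07, 0.10), (0.12, 0.20), (0.03, 0.01)]
counts = histogram2d(points, (0.0, 0.15), (0.0, 0.24))
tallest = max(
    ((i, j) for i in range(3) for j in range(3)),
    key=lambda ij: counts[ij[0]][ij[1]],
)
print(counts, tallest)
```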
5. Consider constructing a dendrogram of rows 6-10 of this data, using 
single-link clustering. Ignore the third column and use Euclidean distance. On 
the first iteration, points/rows/groups 6 and 9 are merged into a group. 
On the second iteration, which two groups are merged? Give the row 
numbers of all the points in the newly merged group as your answer. 
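Single-link clustering repeatedly merges the two groups whose closest members are nearest to each other. The sketch below is a small, unoptimised illustration that requires Python 3.8+ for `math.dist`; the labels "a" to "e" and their coordinates are invented and do not correspond to rows 6-10 of the dataset.

```python
import math

def single_link_merges(points):
    """Yield the newly merged group (sorted labels) after each agglomeration step.

    points maps a label (for example a row number) to its coordinates.
    """
    clusters = [frozenset([label]) for label in points]

    def linkage(a, b):
        # Single link: distance between the closest pair across the two groups.
        return min(math.dist(points[p], points[q]) for p in a for q in b)

    while len(clusters) > 1:
        a, b = min(
            ((a, b) for i, a in enumerate(clusters) for b in clusters[i + 1:]),
            key=lambda pair: linkage(pair[0], pair[1]),
        )
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
        yield sorted(a | b)

# Hypothetical labelled points; NOT taken from sonarmini.csv.
points = {"a": (0.0, 0.0), "b": (1.0, 0.0), "c": (5.0, 5.0), "d": (0.1, 0.0), "e": (5.0, 6.0)}
merges = list(single_link_merges(points))
print(merges)
```

Each yielded list corresponds to one merge, i.e. one horizontal join in the dendrogram, in the order the joins occur.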
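6. Consider a Gaussian mixture model 
\[
p(x \mid \theta) =
\pi_1\,\mathcal{N}\!\left((1, 2)',\ \begin{pmatrix} 2 & 0 \\ 0 & 0.5 \end{pmatrix}\right)
+ \pi_2\,\mathcal{N}\!\left((-3, -5)',\ \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}\right)
+ \pi_3\,\mathcal{N}\!\left((0, 1)',\ \begin{pmatrix} 0.6 & 0.5 \\ 0.5 & 1.6 \end{pmatrix}\right)
\]
(the numeric values of the mixing weights $\pi_1, \pi_2, \pi_3$ are not legible in this copy). 
Calculate the probability density of the model at the point $x' = (1.2, 1.2)$. 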
7. Find the Mahalanobis distance¹ from the point $x' = (1, 1)$ to the Gaussian 
\[
\mathcal{N}\!\left((0, 1)',\ \begin{pmatrix} 0.6 & 0.5 \\ 0.5 & 1.6 \end{pmatrix}\right)
\]
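For a 2×2 covariance matrix the inverse has a closed form, so the distance (with the square root, as in the footnote's definition) can be evaluated directly. The sketch below is one possible way to do this; `mahalanobis_2d` is an illustrative helper name, not a library function.

```python
import math

def mahalanobis_2d(x, mu, cov):
    """Mahalanobis distance (with the square root) for a 2x2 covariance matrix."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    # Closed-form inverse of a 2x2 matrix.
    inv = [[d / det, -b / det], [-c / det, a / det]]
    dx, dy = x[0] - mu[0], x[1] - mu[1]
    squared = (dx * (inv[0][0] * dx + inv[0][1] * dy)
               + dy * (inv[1][0] * dx + inv[1][1] * dy))
    return math.sqrt(squared)

# Point, mean and covariance as stated in the question.
distance = mahalanobis_2d((1.0, 1.0), (0.0, 1.0), [[0.6, 0.5], [0.5, 1.6]])
print(f"{distance:.5g}")
```

Because the point differs from the mean only in the first coordinate, the squared distance reduces to the top-left entry of the inverse covariance, 1.6 / 0.71.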
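8. The volume of the hyperellipsoid corresponding to a Mahalanobis distance 
$r$ is given by: 
\[
V = V_d \, |\Sigma|^{1/2} \, r^d
\]
where $V_d$ is the volume of a $d$-dimensional unit hypersphere: 
\[
V_d =
\begin{cases}
\pi^{d/2} / (d/2)!, & d \text{ even} \\
2^d \, \pi^{(d-1)/2} \left(\frac{d-1}{2}\right)! / d!, & d \text{ odd.}
\end{cases}
\]
Calculate this volume with $r = 6$ and the Gaussian: 
\[
\mathcal{N}\!\left((0, 0, 0, 0)',\
\begin{pmatrix}
0.6 & 0.5 & 0.5 & 0.5 \\
0.5 & 1.6 & 0.5 & 0.5 \\
0.5 & 0.5 & 0.6 & 0.5 \\
0.5 & 0.5 & 0.5 & 1.6
\end{pmatrix}\right)
\]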
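The volume formula needs only the unit-hypersphere volume and the determinant of the covariance matrix. The following sketch evaluates both with the standard library; the helper names are assumptions, and the determinant routine is a plain Gaussian elimination with partial pivoting rather than a library call.

```python
import math

def unit_ball_volume(d):
    """Volume of the d-dimensional unit hypersphere, using the even/odd formula."""
    if d % 2 == 0:
        return math.pi ** (d // 2) / math.factorial(d // 2)
    k = (d - 1) // 2
    return 2 ** d * math.pi ** k * math.factorial(k) / math.factorial(d)

def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    m = [row[:] for row in matrix]  # work on a copy
    n = len(m)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[pivot][col] == 0:
            return 0.0
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det  # a row swap flips the sign
        det *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return det

def hyperellipsoid_volume(cov, r):
    """V = V_d * |cov|^(1/2) * r^d for Mahalanobis radius r."""
    d = len(cov)
    return unit_ball_volume(d) * math.sqrt(determinant(cov)) * r ** d

# Covariance matrix as stated in the question.
cov = [
    [0.6, 0.5, 0.5, 0.5],
    [0.5, 1.6, 0.5, 0.5],
    [0.5, 0.5, 0.6, 0.5],
    [0.5, 0.5, 0.5, 1.6],
]
volume = hyperellipsoid_volume(cov, r=6)
print(f"{volume:.5g}")
```

As a check on `unit_ball_volume`, d = 2 gives π (area of the unit disc) and d = 3 gives 4π/3.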
 
 
9. It is often useful to be able to compute things “on-line” (i.e. recursively) 
with respect to a dataset (meaning, e.g. that the entire dataset does not 
need to be stored in memory). The sample mean of a scalar variable, x 
can be calculated in this way using: 
\[
\hat{\mu}_{n+1} = \hat{\mu}_n + \frac{1}{n+1}\,\left(x_{n+1} - \hat{\mu}_n\right)
\]
If we have previously observed 10 data points and the sample mean is 5.0, 
what would the next observation need to be to change the sample mean 
to equal 6.0? 
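The recursive update can be rearranged to find the observation that produces a desired new mean: x_{n+1} = µ̂_n + (n + 1)(µ̂_{n+1} − µ̂_n). A short sketch of both directions follows; the function names are illustrative.

```python
def update_mean(mean, n, x_new):
    """Return the mean of n + 1 observations given the mean of the first n."""
    return mean + (x_new - mean) / (n + 1)

def observation_for_target(mean, n, target):
    """Solve the update rule for x_new so that the new mean equals target."""
    return mean + (n + 1) * (target - mean)

# Values from the question: 10 points seen so far, mean 5.0, target mean 6.0.
x_next = observation_for_target(mean=5.0, n=10, target=6.0)
check = update_mean(5.0, 10, x_next)  # feeding x_next back in should recover the target
print(x_next, check)
```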
¹ The Mahalanobis distance is defined as $D = \sqrt{(x - \mu)' S^{-1} (x - \mu)}$ 
(see, for example, Wikipedia or the [DHS] book). The Alpaydin book, however, 
gives the impression that it is defined without the square root. Wikipedia 
calls the version without the square root the “generalized squared 
interpoint distance” and provides a citation. 
10. Consider carrying out gradient descent on the function $f(x) = x^4$. 
Starting from an initial position $x = 2$ with a step size $\eta = 0.1$, calculate the 
value of the search position after three updates.
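Plain gradient descent applies the update x ← x − η f′(x) at each step; for f(x) = x⁴ the derivative is f′(x) = 4x³. A minimal sketch, assuming a fixed step size and no stopping criterion, is shown below; `gradient_descent` is an illustrative helper, not a library function.

```python
def gradient_descent(grad, x0, step_size, n_updates):
    """Return the trajectory [x0, x1, ..., x_n] of fixed-step gradient descent."""
    xs = [x0]
    for _ in range(n_updates):
        xs.append(xs[-1] - step_size * grad(xs[-1]))
    return xs

# f(x) = x**4 has derivative f'(x) = 4 * x**3.
trajectory = gradient_descent(lambda x: 4 * x ** 3, x0=2.0, step_size=0.1, n_updates=3)
print([f"{x:.5g}" for x in trajectory])
```

The first update overshoots the minimum at 0 (from 2 to −1.2), which shows how sensitive a fixed step size is to the steep gradient of x⁴ away from the origin.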