COMPSCI5100
MLAI4DS Mock Exam Paper
This examination paper is worth a total of 60 marks
1. Linear regression, with models of the form t_n = w^T x_n + ε_n, is a common technique for learning real-valued functions from data.
(a) The squared and absolute losses are defined as follows:

L_sq = sum_n (t_n - w^T x_n)^2,    L_abs = sum_n |t_n - w^T x_n|
Describe, with a diagram if you like, why outliers have a larger effect when optimizing the parameters with the squared loss than with the absolute loss. [6 marks]
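A quick numerical sketch of this effect (the residuals below are made up; the last one is an outlier):

```python
# The squared loss grows quadratically with the residual, so a single
# outlier can dominate the total loss, while the absolute loss grows
# only linearly in the residual.
residuals = [0.5, -0.3, 0.2, 10.0]  # last point is an outlier

squared = [r ** 2 for r in residuals]
absolute = [abs(r) for r in residuals]

# The outlier's share of the total loss under each definition:
share_sq = squared[-1] / sum(squared)
share_abs = absolute[-1] / sum(absolute)
print(f"outlier share, squared loss:  {share_sq:.2f}")
print(f"outlier share, absolute loss: {share_abs:.2f}")
```

Under the squared loss the single outlier accounts for almost all of the total loss, so the fitted parameters are pulled towards it much more strongly.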
(b) Which of the following statements is true?
A) Parameter estimation with the squared loss is not analytically tractable.
B) The squared loss is equivalent to assuming normally distributed noise.
C) The absolute loss is a popular choice for regularization.
D) The squared loss is a popular choice for regularization. [2 marks]
(c) Discuss why the value of the squared loss on the training data cannot be used to choose the model complexity. [3 marks]
(d) For the particular model , I optimize the parameters and end up with . What does the model predict for a test point at x_new = 3? [2 marks]
(e) The radial basis function (RBF)

φ_k(x_n) = exp( -(1/S) * sum_d (x_{n,d} - μ_{d,k})^2 )

is a popular choice for converting the original features, x_{n,d}, into a new set of K features prior to training. Assume the value of S is given. Describe a procedure for determining the center parameters μ_{d,k} and the number of features K. [4 marks]
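One common procedure, sketched below on assumed toy data, is to place the K centers on randomly chosen training points (or on K-means centroids) and then compute the RBF features:

```python
import numpy as np

# Illustrative sketch: N = 20 points with D = 2 original features.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 2))
K, S = 5, 1.0  # number of new features, and the given width S

# Pick K training points at random to act as the centers mu (K x D).
centres = X[rng.choice(len(X), size=K, replace=False)]

# phi[n, k] = exp(-(1/S) * sum_d (x_{n,d} - mu_{d,k})^2)
sq_dist = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
phi = np.exp(-sq_dist / S)
print(phi.shape)  # one row of K new features per data point
```

K itself is typically chosen by evaluating models with different K on held-out validation data, since centers placed on training points make K a complexity parameter.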
(f) With respect to the functions they can fit, describe, with a graph, the difference between the RBF model and the basic linear model w^T x_n. [3 marks]
2. Classification
(a) Using a classification algorithm of your choice, describe what is meant by:
(i) Generalisation [2 marks]
(ii) Over-fitting [2 marks]
(b) A classification algorithm has been used to make predictions on a test set, resulting in the following confusion matrix:
Compute the following quantities (expressing them as fractions is fine):
(i) Accuracy [2 marks]
(ii) Sensitivity [2 marks]
(iii) Specificity [2 marks]
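These quantities can be read directly off a confusion matrix; the counts below are made up for illustration, not those from the matrix in the paper:

```python
# Hypothetical confusion-matrix counts for a binary classifier.
TP, FN = 40, 10   # true positives, false negatives
FP, TN = 5, 45    # false positives, true negatives

accuracy = (TP + TN) / (TP + TN + FP + FN)   # fraction correct overall
sensitivity = TP / (TP + FN)                 # true positive rate
specificity = TN / (TN + FP)                 # true negative rate
print(accuracy, sensitivity, specificity)
```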
(c) Explain why it is not possible to compute the AUC from a confusion matrix. [4 marks]
(d) Two binary classifiers are used to make predictions for the same set of six test points. These predictions are given below, along with the true labels. Compute the area under the ROC curve (AUC) for each classifier. [4 marks]
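The AUC for one classifier can be computed with the pairwise rule: it equals the probability that a randomly chosen positive scores higher than a randomly chosen negative (ties counting one half). The scores and labels below are hypothetical, since the paper's table is not reproduced here:

```python
# Hypothetical classifier scores and true labels for six test points.
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
labels = [1, 1, 0, 1, 0, 0]

pos = [s for s, y in zip(scores, labels) if y == 1]
neg = [s for s, y in zip(scores, labels) if y == 0]

# Average, over all positive/negative pairs, of "positive scores higher".
auc = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg) / (len(pos) * len(neg))
print(auc)  # 8 of the 9 pairs are correctly ordered -> 8/9
```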
(e) Explain how the SVM can be extended via the kernel trick to perform non-linear classification. [2 marks]
3. Unsupervised learning
(a) Provide pseudocode for K-means (assume that the number of clusters is provided). [5 marks]
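For reference, the K-means loop (Lloyd's algorithm) can be sketched in Python as follows; this is an illustrative sketch on synthetic data, not a model answer to the pseudocode question:

```python
import numpy as np

def kmeans(X, K, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    # Initialise centres on K randomly chosen data points.
    centres = X[rng.choice(len(X), size=K, replace=False)]
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centre.
        d = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        z = d.argmin(axis=1)
        # Update step: each centre moves to the mean of its points
        # (empty clusters keep their old centre).
        new = np.array([X[z == k].mean(axis=0) if (z == k).any() else centres[k]
                        for k in range(K)])
        if np.allclose(new, centres):
            break  # converged: assignments no longer change
        centres = new
    return centres, z

# Two well-separated synthetic clusters, ten points each.
X = np.vstack([np.random.default_rng(1).normal(0, 0.1, (10, 2)),
               np.random.default_rng(2).normal(5, 0.1, (10, 2))])
centres, z = kmeans(X, K=2)
```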
(b) Is the total Euclidean distance between data points and their cluster centers a good criterion for selecting the number of clusters in K-means? Why or why not? [2 marks]
(c) Gaussian mixture models can be fitted to data using the expectation maximization (EM) algorithm. The EM algorithm has two steps: the E-step and the M-step. Describe what quantities are being estimated in each step for Gaussian mixture models. [4 marks]
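An illustrative one-dimensional sketch of the two steps for a two-component mixture (toy data; pi, mu, var denote the standard mixing weights, means and variances):

```python
import math

def normal_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# Toy data: two obvious groups, near 0 and near 5.
data = [0.1, -0.2, 0.0, 4.9, 5.1, 5.0]
pi, mu, var = [0.5, 0.5], [0.0, 1.0], [1.0, 1.0]

for _ in range(50):
    # E-step: posterior responsibility of each component for each point.
    r = []
    for x in data:
        w = [pi[k] * normal_pdf(x, mu[k], var[k]) for k in range(2)]
        s = sum(w)
        r.append([wk / s for wk in w])
    # M-step: re-estimate mixing weights, means and variances using
    # the responsibility-weighted data.
    for k in range(2):
        Nk = sum(r[n][k] for n in range(len(data)))
        pi[k] = Nk / len(data)
        mu[k] = sum(r[n][k] * data[n] for n in range(len(data))) / Nk
        var[k] = (sum(r[n][k] * (data[n] - mu[k]) ** 2
                      for n in range(len(data))) / Nk) + 1e-6
print(mu)
```

The E-step computes soft (probabilistic) assignments given the current parameters; the M-step updates the parameters given those assignments.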
(d) Describe three key differences between K-means and Gaussian mixture models. [4 marks]
(e) K-means often converges to a locally optimal solution. Describe a simple procedure for overcoming the local optimality of K-means. [3 marks]
(f) Describe two situations (with justification) where you might choose a mixture model over K-means. [2 marks]