COMPSCI5100
MLAI4DS Mock Exam Paper
This examination paper is worth a total of 60 marks
1. Linear regression, with models of the form t_n = w^T x_n + ε_n, is a common technique for learning real-valued functions from data.
(a) The squared and absolute losses are defined as follows:

L_sq = sum_n (t_n - w^T x_n)^2,    L_abs = sum_n |t_n - w^T x_n|
Describe, with a diagram if you like, why outliers have a larger effect when optimizing the parameters with the squared loss than with the absolute loss. [6 marks]
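A quick numerical sketch of this effect (the residuals below are made up; the last one is an outlier):

```python
# The squared loss grows quadratically with the residual, so a single
# outlier can dominate the total loss, while the absolute loss grows
# only linearly in the residual.
residuals = [0.5, -0.3, 0.2, 10.0]  # last point is an outlier

squared = [r ** 2 for r in residuals]
absolute = [abs(r) for r in residuals]

# The outlier's share of the total loss under each definition:
share_sq = squared[-1] / sum(squared)
share_abs = absolute[-1] / sum(absolute)
print(f"outlier share, squared loss:  {share_sq:.2f}")
print(f"outlier share, absolute loss: {share_abs:.2f}")
```

Under the squared loss the single outlier accounts for almost all of the total loss, so the fitted parameters are pulled towards it much more strongly.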
(b) Which of the following statements is true?
A) Parameter estimation with the squared loss is not analytically tractable.
B) The squared loss is equivalent to assuming normally distributed noise.
C) The absolute loss is a popular choice for regularization.
D) The squared loss is a popular choice for regularization. [2 marks]
(c) Discuss why the value of the squared loss on the training data cannot be used to choose the model complexity. [3 marks]
(d) For the particular model , I optimize the parameters and end up with . What does the model predict for a test point at x_new = 3? [2 marks]
(e) The radial basis function (RBF)

φ_k(x_n) = exp( -(1/S) * sum_d (x_{n,d} - μ_{d,k})^2 )

is a popular choice for converting the original features, x_{n,d}, into a new set of K features prior to training. Assume the value of S is given. Describe a procedure for determining the center parameters μ_{d,k} and the number of features K. [4 marks]
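One common procedure, sketched below on assumed toy data, is to place the K centers on randomly chosen training points (or on K-means centroids) and then compute the RBF features:

```python
import numpy as np

# Illustrative sketch: N = 20 points with D = 2 original features.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 2))
K, S = 5, 1.0  # number of new features, and the given width S

# Pick K training points at random to act as the centers mu (K x D).
centres = X[rng.choice(len(X), size=K, replace=False)]

# phi[n, k] = exp(-(1/S) * sum_d (x_{n,d} - mu_{d,k})^2)
sq_dist = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
phi = np.exp(-sq_dist / S)
print(phi.shape)  # one row of K new features per data point
```

K itself is typically chosen by evaluating models with different K on held-out validation data, since centers placed on training points make K a complexity parameter.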
(f) With respect to the functions they can fit, describe, with a graph, the difference between the RBF model and the basic linear model w^T x_n. [3 marks]
2. Classification
(a) Using a classification algorithm of your choice, describe what is meant by:
(i) Generalisation [2 marks]
(ii) Over-fitting [2 marks]
(b) A classification algorithm has been used to make predictions on a test set, resulting in the following confusion matrix:
Compute the following quantities (expressing them as fractions is fine):
(i) Accuracy [2 marks]
(ii) Sensitivity [2 marks]
(iii) Specificity [2 marks]
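These quantities can be read directly off a confusion matrix; the counts below are made up for illustration, not those from the matrix in the paper:

```python
# Hypothetical confusion-matrix counts for a binary classifier.
TP, FN = 40, 10   # true positives, false negatives
FP, TN = 5, 45    # false positives, true negatives

accuracy = (TP + TN) / (TP + TN + FP + FN)   # fraction correct overall
sensitivity = TP / (TP + FN)                 # true positive rate
specificity = TN / (TN + FP)                 # true negative rate
print(accuracy, sensitivity, specificity)
```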
(c) Explain why it is not possible to compute the AUC from a confusion matrix. [4 marks]
(d) Two binary classifiers are used to make predictions for the same set of six test points. These predictions are given below, along with the true labels. Compute the area under the ROC curve (AUC) for each classifier. [4 marks]
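The AUC for one classifier can be computed with the pairwise rule: it equals the probability that a randomly chosen positive scores higher than a randomly chosen negative (ties counting one half). The scores and labels below are hypothetical, since the paper's table is not reproduced here:

```python
# Hypothetical classifier scores and true labels for six test points.
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
labels = [1, 1, 0, 1, 0, 0]

pos = [s for s, y in zip(scores, labels) if y == 1]
neg = [s for s, y in zip(scores, labels) if y == 0]

# Average, over all positive/negative pairs, of "positive scores higher".
auc = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg) / (len(pos) * len(neg))
print(auc)  # 8 of the 9 pairs are correctly ordered -> 8/9
```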
(e) Explain how the SVM can be extended via the kernel trick to perform non-linear classification. [2 marks]
3. Unsupervised learning
(a) Provide pseudocode for K-means (assume that the number of clusters is provided). [5 marks]
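For reference, the K-means loop (Lloyd's algorithm) can be sketched in Python as follows; this is an illustrative sketch on synthetic data, not a model answer to the pseudocode question:

```python
import numpy as np

def kmeans(X, K, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    # Initialise centres on K randomly chosen data points.
    centres = X[rng.choice(len(X), size=K, replace=False)]
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centre.
        d = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        z = d.argmin(axis=1)
        # Update step: each centre moves to the mean of its points
        # (empty clusters keep their old centre).
        new = np.array([X[z == k].mean(axis=0) if (z == k).any() else centres[k]
                        for k in range(K)])
        if np.allclose(new, centres):
            break  # converged: assignments no longer change
        centres = new
    return centres, z

# Two well-separated synthetic clusters, ten points each.
X = np.vstack([np.random.default_rng(1).normal(0, 0.1, (10, 2)),
               np.random.default_rng(2).normal(5, 0.1, (10, 2))])
centres, z = kmeans(X, K=2)
```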
(b) Is the total Euclidean distance between data points and their cluster centers a good criterion for selecting the number of clusters in K-means? Why or why not? [2 marks]
(c) Gaussian mixture models can be fitted to data using the expectation maximization (EM) algorithm. The EM algorithm has two steps: the E-step and the M-step. Describe what quantities are being estimated in each step for Gaussian mixture models. [4 marks]
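An illustrative one-dimensional sketch of the two steps for a two-component mixture (toy data; pi, mu, var denote the standard mixing weights, means and variances):

```python
import math

def normal_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# Toy data: two obvious groups, near 0 and near 5.
data = [0.1, -0.2, 0.0, 4.9, 5.1, 5.0]
pi, mu, var = [0.5, 0.5], [0.0, 1.0], [1.0, 1.0]

for _ in range(50):
    # E-step: posterior responsibility of each component for each point.
    r = []
    for x in data:
        w = [pi[k] * normal_pdf(x, mu[k], var[k]) for k in range(2)]
        s = sum(w)
        r.append([wk / s for wk in w])
    # M-step: re-estimate mixing weights, means and variances using
    # the responsibility-weighted data.
    for k in range(2):
        Nk = sum(r[n][k] for n in range(len(data)))
        pi[k] = Nk / len(data)
        mu[k] = sum(r[n][k] * data[n] for n in range(len(data))) / Nk
        var[k] = (sum(r[n][k] * (data[n] - mu[k]) ** 2
                      for n in range(len(data))) / Nk) + 1e-6
print(mu)
```

The E-step computes soft (probabilistic) assignments given the current parameters; the M-step updates the parameters given those assignments.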
(d) Describe three key differences between K-means and Gaussian mixture models. [4 marks]
(e) K-means often converges to a locally optimal solution. Describe a simple procedure for overcoming the local optimality of K-means. [3 marks]
(f) Describe two situations (with justification) where you might choose a mixture model over K-means. [2 marks]