Assignment 1 - Foundations of Machine Learning CSCI3151 - Dalhousie University
Q1 (30%)
Gradient descent - Linear regression
In this question we are going to learn first hand how gradient descent works in the context of a
concrete house pricing model. Note that we are not making use of a validation strategy here by
using a test set or cross-fold validation—which is something that would be otherwise generally
recommended. This exercise focuses on the inner workings of gradient descent applied to the
quadratic cost function that was learned in the Linear regression class. It’s important that the
code provided is well documented and that no other external package is used (pandas, numpy
are ok).
a) Using the house price prediction dataset (described here and downloadable from Brightspace
as housePriceKaggleTrain) and two features, i.e. 1stFlrSF and 2ndFlrSF, we want to fit a linear
regression model to predict SalePrice. We want to fit this model by applying a hand made
gradient descent algorithm. We will use the basic cost function learned in the lecture.
The model should be as follows:
p: sale price (SalePrice)
t: first floor square feet (1stFlrSF)
s: second floor square feet (2ndFlrSF)
x xp = t 1 + s 2 + b
Your function should be able to return the updated weights x1, x2 and b after every iteration of
the gradient descent algorithm.
Your function should be defined as follows:
def LRGradDesc(data, target, init_x1, init_x2, init_b, learning_rate_w,
learning_rate_b, max_iter):
Note that there is a slight variation with the approach proposed in the lecture as here we use
two different learning rates (one learning rate for the weight coefficients and another one for the
bias).
And it should print lines as indicated below:
Iteration 0: init_x1, init_x2, init_b, initial_cost
Iteration 1: [x1 after first iteration], [x2 after first iteration], [b after first iteration], [cost after first iteration]
Iteration 2: [x1 after second iteration], [x2 after second iteration], [b after second iteration], [cost after second iteration]
…
Iteration max_iter: [x1 after max_iter iteration], [x2 after max_iter iteration], [b after max_iter iteration], [cost after max_iter iteration]
Note that you may want to print every 100 or every 1000 iterations if max_iter is a fairly large
number (but you shouldn’t have more iterations than the indicated in max_iter).
b) Compare this model with a solution computed in closed form or by using a machine learning
library to compute the linear regression model.
c) Discuss how the choice of learning_rate affects the fitting of the model.
HINT: The learning rates for the weight coefficients should be about 1000 to 100000 times
smaller than the one for the bias. In part c) you are encouraged to play with different learning
rates.
Q2 (20%)
Classification using two standard techniques
In this question you will experiment with two traditional classification models on the pima indians
data set. The models are: decision trees, and SVM. The objective of this question is to
demonstrate your ability to compare different machine learning models, and derive a conclusion
if possible.
a) For the SVM classifier, use the default parameters, and 5-fold cross validation, and report the
overall accuracy and confusion matrix, as well as accuracy and confusion matrix for each fold.
What is the standard deviation of accuracy over the folds?
b) for the Decision tree classifier, experiment with different numbers of max depths of the tree
(in the range from 2 to 8). Use the default values for the remaining parameters. Evaluate each
parameter selection using 5-fold cross validation, and report the overall accuracy and the
confusion matrix. Only report the interesting parameter settings.
c) Summarize your findings from (a) and (b). What can you conclude from these two methods?
Q3 (20%)
Polynomial regression
This question aims at applying polynomial regression on a large data set and deciding the set of
hyperparameters that best applies to this scenario.
a) Use the California Housing dataset from scikit-learn and apply a polynomial regression with
the full set of features (no interactions). Experiment with polynomials of different degrees.
Compare their performance with each other using a fixed train and validation partition (top 80%
for training and the remaining 20% for validation). Discuss the interpretation of the results.
Use visualizations of appropriate quantities to make sense of the results and support your
interpretation.
Q4 (30%)
k-Nearest Neighbors
In this question you will develop a kNN algorithm to be used for the automatic recognition of
handwritten digits. You are allowed to use libraries for data manipulation and math operations
(e.g. numpy, pandas) but no machine learning library to compute k-nearest neighbors (e.g.
scikit-learn is not ok). If in doubt, please ask.
a) Write a k-NN algorithm that receives as input labeled training data and a set of unlabeled
instances. The algorithm returns the most likely label (i.e. a digit) to each unlabeled instance.
The algorithm should implement the Minkowski distance. The value for k (number of neighbors)
and the parameter p are hyperparameters of the method that will be also passed as input.
Your function should be defined as follows:
def kNN( train_x, train_y, test_x, k, p):
The code should return a list or array of predictions for all the instances in test_x.
You will test your code using a reduced version of the MNIST database. The original MNIST
dataset is a database of 60,000 handwritten digits for training and 10,000 for testing where each
digit is represented by a 784-dimensional vector. The reduced version of this dataset can be
found on Brightspace as reducedMNIST.zip.
b) Report the classification error and confusion matrix you obtain for different combinations of p
and k. Remember that you must only use the “test_labels.csv” file to compare your predictions
to. In addition, critically discuss your results, as you vary p and k.
c) What strategy would you follow to choose the parameters (k and p) for your final model, if you
did not have the labels for the test set?
Submitting the assignment
1. Your assignment as a single .ipynb file including your answers should be submitted before
the deadline on Brightspace.
Use markdown syntax to format your answers and to clearly indicate what question.
2. You can submit multiple editions of your assignment. Only the last one will be marked. It is
recommended to upload a complete submission, even if you are still improving it, so that you
have something into the system if your computer fails for whatever reason.
3. IMPORTANT: PLEASE NAME YOUR PYTHON NOTEBOOK FILE AS:
--Assignment-N.ipynb, for example
Soto-Axel-Assignment-1.ipynb
A penalty applies if the format is not correct.
4. The markers will enter your marks and their overall feedback on Brightspace, and they will
upload your Python notebook file with comments on specific cells, as a new markdown cell
below the cell being commented on.
Marking the assignment
Criteria and weights. Each criterion is marked by a letter grade. Overall mark is the
weighted average of the grade of each criterion.
0.2 Clarity: All steps are clearly described. The origin of all code used is clearly. Markdown is
used effectively to format the answer to make it easier to read and grasp the main points. Links
have been added to all online resources used (markdown syntax is: [AnchorText](URL) ).
0.2 Justification: Parameter choices or processes are well justified.
0.2 Results: The results are complete. The results are presented in a manner that is easy to
understand. The answer is selective in the amount and diversity of the experimental results
presented.
Only key results that support the insights are presented. There is no need to present every
single experiment you carried out. Only the interesting results are presented, where the
behaviour of the ML model varies.
0.4 Insights: The insights obtained from the experimental results are clearly explained. The
insights are connected with the concepts discussed in the lectures.
The insights can also include statistical considerations (separate training-test data,
cross-validation, variance).Preliminary investigation of the statistical properties of the attributes
(e.g. histogram, mean, standard deviation) is included.