COMP5318/COMP4318 Machine Learning and Data Mining
s1 2025
Week 2 Tutorial exercises
K-Nearest Neighbor. Rule-based classifiers: PRISM
Welcome to your first COMP5318/COMP4318 tutorial! Please note the following:
• For most of the weeks there will be 2 documents with tutorial exercises:
1) theoretical (like this one), involving paper-based exercises and calculations testing your understanding of the algorithms
2) practical, using Python and its machine learning and neural network libraries (available in 2 formats: ipynb (Jupyter Notebook) and pdf - see Canvas)
• Theoretical: We will do some of these exercises at the lecture (usually the first exercise). The rest should be done in your own time. Make sure that you do all theoretical exercises, as they are similar in style to the exam questions.
• Practical: This will be the main focus of the tutorial. Sometimes it may not be possible to finish all Python exercises during the tutorial; please finish them at home, as this part is important for your assignments. We have prepared very detailed notes for the practical part, and we hope you will find them useful.
• The solutions for both types of exercises will be provided on Thursday evening, after the last tutorial.
Exercise 1. Nearest Neighbor (to do in class)
The dataset below consists of 4 examples described with 3 numeric features (a1, a2 and a3); the class has 2 values: yes and no.
What will be the prediction of 1-Nearest Neighbor (1-NN) and 3-Nearest Neighbor (3-NN) with Euclidean distance for the following new example: a1=2, a2=4, a3=2?
Assume that all attributes are measured on the same scale - no need for normalization.
| # | a1 | a2 | a3 | class |
|---|----|----|----|-------|
| 1 | 1  | 3  | 1  | yes   |
| 2 | 3  | 5  | 2  | yes   |
| 3 | 3  | 2  | 2  | no    |
| 4 | 5  | 2  | 3  | no    |
Exercise adapted from M. Kubat, Introduction to Machine Learning, Springer, 2017
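After doing the calculation on paper, a short Python sketch (not the official solution) can be used to check your answer. It sorts the training examples above by Euclidean distance to the query and takes a majority vote among the k nearest:

```python
import math
from collections import Counter

# Training examples from the table above: ((a1, a2, a3), class)
train = [((1, 3, 1), "yes"), ((3, 5, 2), "yes"),
         ((3, 2, 2), "no"),  ((5, 2, 3), "no")]
query = (2, 4, 2)  # the new example

def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

# Rank the training examples by distance to the query
ranked = sorted(train, key=lambda ex: euclidean(ex[0], query))

for k in (1, 3):
    votes = Counter(label for _, label in ranked[:k])
    print(f"{k}-NN prediction:", votes.most_common(1)[0][0])
```

Note that for 1-NN only the single closest example matters, while for 3-NN the majority class among the three closest examples is returned.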
Exercise 2. Nearest neighbor with nominal features (to do in your own time)
Consider the iPhone dataset given below. There are 4 nominal attributes (age, income, student, and credit_rating) and the class is buys_iPhone with 2 values: yes and no.
What would be the prediction of 1-NN and 3-NN for the following new example:
age<=30, income=medium, student=yes, credit_rating=fair
If there are ties, make a random selection.
Tip: As the examples are described with nominal attributes, use the following rule when calculating the distance:
difference = 1 between 2 values that are not the same
difference = 0 between 2 values that are the same
e.g. D(1, new) = sqrt(0+1+1+0) = sqrt(2)
| # | age     | income | student | credit_rating | buys_iPhone |
|---|---------|--------|---------|---------------|-------------|
| 1 | <=30    | high   | no      | fair          | no          |
| 2 | <=30    | high   | no      | excellent     | no          |
| 3 | [31,40] | high   | no      | fair          | yes         |
| 4 | >40     | medium | no      | fair          | yes         |
| 5 | >40     | low    | yes     | excellent     | no          |
| 6 | [31,40] | low    | yes     | excellent     | yes         |
| 7 | <=30    | medium | no      | fair          | no          |
| 8 | [31,40] | medium | no      | excellent     | yes         |
| 9 | >40     | medium | no      | excellent     | no          |
Dataset adapted from J. Han and M. Kamber, Data Mining: Concepts and Techniques, Morgan Kaufmann.
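The same k-NN sketch adapts to nominal attributes by counting mismatches, following the 0/1 difference rule from the tip above. (The square root can be skipped when ranking neighbours, since sqrt is monotonic and the mismatch counts order the examples identically.) A quick way to check your paper-based answer:

```python
from collections import Counter

# Training examples from the table above: ((age, income, student, credit_rating), buys_iPhone)
train = [
    (("<=30",    "high",   "no",  "fair"),      "no"),
    (("<=30",    "high",   "no",  "excellent"), "no"),
    (("[31,40]", "high",   "no",  "fair"),      "yes"),
    ((">40",     "medium", "no",  "fair"),      "yes"),
    ((">40",     "low",    "yes", "excellent"), "no"),
    (("[31,40]", "low",    "yes", "excellent"), "yes"),
    (("<=30",    "medium", "no",  "fair"),      "no"),
    (("[31,40]", "medium", "no",  "excellent"), "yes"),
    ((">40",     "medium", "no",  "excellent"), "no"),
]
query = ("<=30", "medium", "yes", "fair")  # the new example

def mismatches(x, y):
    # difference = 1 for unequal values, 0 for equal values
    return sum(a != b for a, b in zip(x, y))

ranked = sorted(train, key=lambda ex: mismatches(ex[0], query))

for k in (1, 3):
    votes = Counter(label for _, label in ranked[:k])
    print(f"{k}-NN prediction:", votes.most_common(1)[0][0])
```

Note that Python's sort is stable, so tied examples keep their table order here; the exercise allows breaking such ties randomly instead.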
Exercise 3. PRISM (to do in your own time)
Given the training data in the table below, generate the PRISM rules for class=no. In case of ties, make a random selection.
Weather data with nominal attributes:
| #   | outlook  | temperature | humidity | windy | play |
|-----|----------|-------------|----------|-------|------|
| 1.  | sunny    | hot         | high     | false | no   |
| 2.  | sunny    | hot         | high     | true  | no   |
| 3.  | overcast | hot         | high     | false | yes  |
| 4.  | rainy    | mild        | high     | false | yes  |
| 5.  | rainy    | cool        | normal   | false | yes  |
| 6.  | rainy    | cool        | normal   | true  | no   |
| 7.  | overcast | cool        | normal   | true  | yes  |
| 8.  | sunny    | cool        | high     | false | no   |
| 9.  | sunny    | mild        | normal   | false | yes  |
| 10. | rainy    | cool        | normal   | false | yes  |
| 11. | sunny    | mild        | normal   | true  | yes  |
| 12. | overcast | mild        | high     | true  | yes  |
| 13. | overcast | hot         | normal   | false | yes  |
| 14. | rainy    | mild        | high     | true  | no   |
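For checking your hand-derived rules, here is a hypothetical Python sketch of PRISM. It greedily grows each rule by adding the attribute-value test with the best accuracy p/t, assuming ties are broken by higher coverage p and then by first occurrence (the exercise allows random selection instead, which may give a different but equally valid rule set):

```python
ATTRS = ["outlook", "temperature", "humidity", "windy"]
# Weather data copied from the table above
rows = [
    ("sunny", "hot", "high", "false", "no"),
    ("sunny", "hot", "high", "true", "no"),
    ("overcast", "hot", "high", "false", "yes"),
    ("rainy", "mild", "high", "false", "yes"),
    ("rainy", "cool", "normal", "false", "yes"),
    ("rainy", "cool", "normal", "true", "no"),
    ("overcast", "cool", "normal", "true", "yes"),
    ("sunny", "cool", "high", "false", "no"),
    ("sunny", "mild", "normal", "false", "yes"),
    ("rainy", "cool", "normal", "false", "yes"),
    ("sunny", "mild", "normal", "true", "yes"),
    ("overcast", "mild", "high", "true", "yes"),
    ("overcast", "hot", "normal", "false", "yes"),
    ("rainy", "mild", "high", "true", "no"),
]
data = [dict(zip(ATTRS + ["play"], r)) for r in rows]

def prism(instances, target):
    rules, pool = [], list(instances)
    while any(r["play"] == target for r in pool):
        rule, covered = {}, list(pool)
        # Grow the rule: keep adding the test with the best accuracy p/t
        # until the rule covers only the target class (or attributes run out)
        while any(r["play"] != target for r in covered) and len(rule) < len(ATTRS):
            best = None
            for a in ATTRS:
                if a in rule:
                    continue
                for v in dict.fromkeys(r[a] for r in covered):
                    subset = [r for r in covered if r[a] == v]
                    p = sum(r["play"] == target for r in subset)
                    score = (p / len(subset), p)  # accuracy first, coverage breaks ties
                    if best is None or score > best[0]:
                        best = (score, a, v)
            _, a, v = best
            rule[a] = v
            covered = [r for r in covered if r[a] == v]
        rules.append(rule)
        # Remove the instances covered by the finished rule and repeat
        pool = [r for r in pool if not all(r[a] == v for a, v in rule.items())]
    return rules

for rule in prism(data, "no"):
    print("IF", " AND ".join(f"{a}={v}" for a, v in rule.items()), "THEN play=no")
```

With this tie-breaking the outer loop produces one rule per iteration, removing the covered examples each time, until no example of class no remains.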