
Machine Learning for Big Data: Exercises for Lecture 2
(k-NN, Linear and Logistic Regression)
Question 1 (requires some calculation, which is probably best done by implementing a small
program). The following represents a 2-dimensional data set with two numerical features and a
binary class label.
1. Do you recognise any issues with the data set? If so, apply some preprocessing before proceeding
to the next part of the question.
2. Now consider the unlabelled point
$x = (x_1, x_2) = (4.41, 25.0)$,
which we would like to classify. Perform a k-nearest-neighbour search for k = 1, 3, 5 and use
majority vote to determine whether x should get label −1 or +1. If you like, you can also experiment
with different distance functions.
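The data table for Question 1 is not reproduced in this text, so the following Python sketch only shows the computations and leaves the actual values as placeholders. One common issue in data sets like this is that the features live on very different scales (compare x1 = 4.41 with x2 = 25.0); the sketch assumes that is the relevant issue and z-score standardizes both features using the training means and standard deviations before running k-NN. Other issues (duplicates, outliers, missing values) would need their own handling. The helper names `standardize`, `euclidean`, `manhattan` and `knn_classify` are hypothetical.

```python
import math
from collections import Counter


def standardize(train, query):
    """Z-score every feature with the training mean and (population) standard deviation."""
    n_features = len(train[0])
    means = [sum(p[j] for p in train) / len(train) for j in range(n_features)]
    # "or 1.0" guards against dividing by zero for a constant feature
    stds = [
        math.sqrt(sum((p[j] - means[j]) ** 2 for p in train) / len(train)) or 1.0
        for j in range(n_features)
    ]

    def scale(p):
        return tuple((p[j] - means[j]) / stds[j] for j in range(n_features))

    return [scale(p) for p in train], scale(query)


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def knn_classify(points, labels, query, k, distance=euclidean):
    """Majority vote among the k training points closest to query (labels are -1 or +1)."""
    ranked = sorted(range(len(points)), key=lambda i: distance(points[i], query))
    votes = Counter(labels[i] for i in ranked[:k])
    return max(votes, key=votes.get)


# Hypothetical usage (fill in the Question 1 table):
# points = [(x1, x2), ...]     # training features
# labels = [-1, +1, ...]       # matching class labels
# train_s, query_s = standardize(points, (4.41, 25.0))
# for k in (1, 3, 5):
#     print(k, knn_classify(train_s, labels, query_s, k))
```

Passing `distance=manhattan` to `knn_classify` is one way to try an alternative distance function. With two classes and an odd k, the majority vote cannot end in a tie.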
Question 2. Consider the following data set, which records several phone calls from clients to a company.
1. Fit the data to a linear regression model and compute the $R^2$ value.
2. Consider a client who calls the company for 90 minutes. Can you estimate the number of
purchased items?
3. Somebody tells you that a recent client purchased 3 items, but you do not know the length of the
call. Can you estimate the length of the call?
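The Question 2 table is also missing from this text, so the sketch below only shows the computations, assuming a single explanatory variable (call length in minutes) and a single response (number of purchased items). It fits ordinary least squares in closed form, computes $R^2 = 1 - SS_{res}/SS_{tot}$, predicts at a given call length, and contrasts two ways of going from a number of items back to a call length: inverting the fitted line versus fitting a separate regression of call length on items. All function names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = b0 + b1 * x; returns (b0, b1)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx
    return mean_y - b1 * mean_x, b1


def r_squared(xs, ys, b0, b1):
    """Coefficient of determination R^2 = 1 - SS_res / SS_tot."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


def predict(b0, b1, x):
    """Value of the fitted line at x."""
    return b0 + b1 * x


def call_length_estimates(minutes, items, purchased):
    """Two estimates of call length for a known number of purchased items."""
    b0, b1 = fit_line(minutes, items)
    inverted = (purchased - b0) / b1      # solve items = b0 + b1 * minutes for minutes
    c0, c1 = fit_line(items, minutes)     # regress minutes on items instead
    reverse = predict(c0, c1, purchased)
    return inverted, reverse


# Hypothetical usage (fill in the Question 2 table):
# minutes = [...]; items = [...]
# b0, b1 = fit_line(minutes, items)
# print(r_squared(minutes, items, b0, b1), predict(b0, b1, 90))
# print(call_length_estimates(minutes, items, 3))
```

The two call-length estimates generally differ: for simple linear regression the product of the two fitted slopes equals $R^2$, so the two lines coincide only when $R^2 = 1$. A prediction at 90 minutes is also only as trustworthy as the assumption that 90 minutes lies within the range of observed call lengths.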
Question 3. A medical study examined 300 people with high blood pressure and 200 people with low
blood pressure. During the study period, 30 people in the low-blood-pressure group and 100 in the
high-blood-pressure group suffered from cardiovascular disease (heart disease).
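The counts in the question already determine the observed disease rate and odds in each group. The arithmetic below uses only those numbers (300 high and 200 low blood pressure patients, with 100 and 30 cases respectively) and sketches the quantities a logistic regression on this data would be expected to recover.

```latex
\begin{aligned}
\hat P(\text{CVD} \mid \text{high}) &= \tfrac{100}{300} = \tfrac{1}{3} \approx 0.333,
  &\quad \text{odds}_{\text{high}} &= \tfrac{1/3}{2/3} = 0.5,\\
\hat P(\text{CVD} \mid \text{low}) &= \tfrac{30}{200} = 0.15,
  &\quad \text{odds}_{\text{low}} &= \tfrac{0.15}{0.85} = \tfrac{3}{17} \approx 0.176,\\
\text{odds ratio} &= \tfrac{0.5}{3/17} = \tfrac{17}{6} \approx 2.83,
  &\quad \ln\tfrac{17}{6} &\approx 1.04.
\end{aligned}
```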
1. Sketch what the data set looks like (i.e., its explanatory variables and outcomes).
2. Apply logistic regression.
3. Now consider the classification problem. How would you classify a person with high blood pressure?
How could you try to refine your prediction?
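A minimal Python sketch of parts 2 and 3, assuming the data set is encoded as one row per patient with a single binary explanatory variable x (1 = high blood pressure, 0 = low) and a binary outcome y (1 = cardiovascular disease). It fits logistic regression by Newton-Raphson, one of several standard fitting methods, chosen here only because it is short to write by hand. Because the only predictor is binary, the fitted probabilities should reproduce the group rates 1/3 and 0.15, i.e. $\beta_0 = \ln(3/17) \approx -1.73$ and $\beta_1 = \ln(17/6) \approx 1.04$. The function names and the iteration count are illustrative assumptions.

```python
import math


def build_dataset():
    """Rows (x, y): x = 1 for high blood pressure, 0 for low; y = 1 if cardiovascular disease."""
    high = [(1, 1)] * 100 + [(1, 0)] * 200   # 300 high-blood-pressure patients, 100 with CVD
    low = [(0, 1)] * 30 + [(0, 0)] * 170     # 200 low-blood-pressure patients, 30 with CVD
    return high + low


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(data, iterations=25):
    """Newton-Raphson on the log-likelihood of P(y = 1 | x) = sigmoid(b0 + b1 * x)."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in data:
            p = sigmoid(b0 + b1 * x)
            w = p * (1.0 - p)
            g0 += y - p              # gradient of the log-likelihood
            g1 += (y - p) * x
            h00 += w                 # entries of X^T W X (the negated Hessian)
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: beta += (X^T W X)^{-1} * gradient, written out for the 2x2 case
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


def classify(b0, b1, x, threshold=0.5):
    """Predict 1 (disease) when the fitted probability reaches the threshold."""
    return 1 if sigmoid(b0 + b1 * x) >= threshold else 0


b0, b1 = fit_logistic(build_dataset())
print("b0 =", b0, "b1 =", b1)
print("P(CVD | high) =", sigmoid(b0 + b1), "class:", classify(b0, b1, 1))
```

With the usual 0.5 threshold, a person with high blood pressure gets a fitted probability of about 0.33 and is therefore classified as not developing the disease, even though the observed rate in the high group is more than twice that of the low group (1/3 versus 0.15). Lowering the threshold (0.2 is used in the checks purely as an arbitrary illustrative value) or adding further explanatory variables are two possible ways to refine the prediction.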