
School of Mathematics

MATH5714: Linear Regression, Robustness and Smoothing

Practical: 2019

There have been a number of “worked examples” using R in the module. All of these are on Minerva, and some of them may contain useful commands for this practical. If you have specific questions on R, please feel free to email me (or come to my office) for assistance. Alternatively, you may want to ask me questions during the practical session, which will take place on Monday 2nd December. If you do not have any specific questions, you do not need to attend this session.

You should write up your practical using Word (or LaTeX), with all graphical and R output correctly incorporated. The total length should not exceed 12 pages (but it could be shorter).

NOTES:

A. You must hand in your solutions to my pigeon-hole (NOT Minerva) by 2pm (GMT) on Thursday, 12th December.

B. In accordance with policies of the School of Maths: For every period of 24 hours or part thereof that your assessment is overdue, you will lose 5% of the total marks available for the assessment.

C. If you have special mitigating circumstances that lead you to ask for an extension, you should make your request in the School of Maths Taught Student Office.

D. Within reason you may talk to your friends about this piece of work, but you should not send R code (or output) to each other, and your report must be only your own work.

This practical is deliberately open-ended, with little guidance on how to proceed.

Q1 The Databank of the World Bank¹ collects data (“indicators”) every year on each country of the world in order to examine trends, relationships, effects of policies, development, etc. Two of the variables (area and population) are given for 2010 in the data frame, which can be read in by the R command (watch out for the ~ if you copy and paste):

dd <- read.table("http://www1.maths.leeds.ac.uk/~charles/math3714/area-populaton.txt", header = TRUE)

(i) Using appropriate transformations of the data, find a linear model which can describe the relationship between population (response) and area (explanatory).

(ii) Using appropriate diagnostics, confirm that your model is acceptable.

Guidance: In your answer, you only need to describe your final model, and ONE other model which you have examined but deemed less appropriate.

(iii) Using your model, obtain a 95% confidence interval for the mean (expected) population for a country with an area of 250,000 km².
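
As an illustration of the workflow in Q1 (not a model answer), the following minimal sketch fits a log-log model and computes a 95% confidence interval at an area of 250,000. Everything about the data here is an assumption: the data frame `sim` is simulated, its coefficients are made up, and the choice of a log transformation for both variables is only one candidate; the real data may call for something different.

```r
# Simulated stand-in for the area/population data (NOT the real file).
set.seed(1)
n <- 200
area <- exp(rnorm(n, mean = 11, sd = 2))               # hypothetical areas
population <- exp(3 + 0.8 * log(area) + rnorm(n))      # hypothetical populations
sim <- data.frame(area = area, population = population)

# Candidate model: log(population) = b0 + b1 * log(area) + error
fit <- lm(log(population) ~ log(area), data = sim)

# Numerical diagnostics: standardised residuals and a normality test
rs <- rstandard(fit)
sw <- shapiro.test(rs)

# Graphical diagnostics (residuals vs fitted, normal Q-Q, etc.)
if (interactive()) {
  op <- par(mfrow = c(2, 2))
  plot(fit)
  par(op)
}

# 95% confidence interval for the mean of log(population) at area = 250000
new <- data.frame(area = 250000)
ci_log <- predict(fit, newdata = new, interval = "confidence", level = 0.95)

# Back-transformed interval; under normal errors on the log scale this
# refers to the median (geometric mean) population, not the arithmetic mean
ci_orig <- exp(ci_log)
```

Note that `interval = "confidence"` gives an interval for the mean response, whereas `interval = "prediction"` would give the wider interval for a single new country. How to report the back-transformed interval is a judgement that should be stated in the write-up.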

Q2 In this question we are going to consider many more variables in the database. Because there are so many missing entries, a set of variables and countries was selected such that there are no missing values. The file is in the same location as before, but now with the file name worlddata-indicators.txt. Note that we now have only 149 countries.

We will take the response variable to be CO2 emissions per capita (CO2), which is column 15 of the data frame after reading it in to R.

¹ You may want to check the meaning of the variables on the World Bank website: https://databank.worldbank.org/home.aspx

(i) With due consideration to:

– transformations,

– interactions,

– model selection,

– model checking,

– variable selection,

– etc.

obtain a model which is able to predict CO2 using the other variables.

(ii) Justify your choice by comparing at least two “competing” models. The comparison should take note of at least (a) model selection criteria, (b) diagnostics, and (c) interpretability.

(iii) Interpret the parameters in your preferred model.

Guidance: Remember, there is probably no ONE correct answer. The important thing is that you justify your approach.
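
One possible way to generate and compare “competing” models is sketched below, using stepwise selection with `step()` and comparing the results by AIC, BIC and adjusted R². This is a sketch under stated assumptions, not a recommended answer: the data frame `sim`, the predictor names `x1` to `x4`, and the log-transformed response are all hypothetical stand-ins for the real indicators.

```r
# Simulated stand-in for the indicator data (NOT the real file).
set.seed(2)
n <- 149
sim <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n), x4 = rnorm(n))
# Hypothetical positive response depending on x1 and x2 only
sim$y <- exp(0.5 + 0.9 * sim$x1 - 0.6 * sim$x2 + rnorm(n, sd = 0.5))

full <- lm(log(y) ~ x1 + x2 + x3 + x4, data = sim)
null <- lm(log(y) ~ 1, data = sim)

# Competing model 1: backward elimination using AIC (the default penalty k = 2)
back <- step(full, direction = "backward", trace = 0)

# Competing model 2: forward selection using BIC (penalty k = log(n))
fwd <- step(null,
            scope = list(lower = formula(null), upper = formula(full)),
            direction = "forward", k = log(n), trace = 0)

# Side-by-side comparison of selection criteria
models <- list(full = full, backward_AIC = back, forward_BIC = fwd)
comparison <- data.frame(
  model  = names(models),
  n_coef = sapply(models, function(m) length(coef(m))),
  AIC    = sapply(models, AIC),
  BIC    = sapply(models, BIC),
  adj_r2 = sapply(models, function(m) summary(m)$adj.r.squared),
  row.names = NULL
)
print(comparison)
```

Selection criteria alone do not settle the choice; the diagnostics and the interpretability of each candidate still need to be examined separately.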

Q3 In this last question we are going to fit nonparametric regression models to inflation data. The data frame is inflation.txt (same place as previously) and consists of 3 columns. The first column is the country code, column 2 is to be treated as the explanatory variable (Inflation, GDP deflator (annual %)) and column 3 as the response (Inflation, consumer prices (annual %)).

You may find the code used in lectures to be useful for this question.

(i) Using the data $(x_i, y_i)$ in the data frame, create a scatter plot of the data and add nonparametric regression lines which show the fitted value $\hat{m}(x)$ for x in the range (−5, 50). Plot one graph which shows the Nadaraya-Watson estimate for smoothing parameters h = 1, 2, 5, 10, and a separate graph which shows the local linear estimates for the same four values of h. Comment on these graphs.

(ii) For each of the 8 estimates computed in part (i), find the predicted value $\hat{m}(x)$ when x = −4.2. Arrange these values in a suitable 2 × 4 table.

(iii) Using leave-one-out cross-validation, find the “optimal” choice of h in the range (0.7, 2.7) for the NW estimate, and (1.7, 3.0) for the LL estimate. In the same plot, draw the lines corresponding to the cross-validation functions as a function of h.

(iv) Replot the data, and draw on the fitted nonparametric regression lines corresponding to the optimal values of h. Comment on the fits.

Predict $\hat{m}(x)$ for x = −4.2 using the corresponding optimal values of h for the NW and LL estimates respectively. Which of these predictions do you think will be better, and why?
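
The estimators and the cross-validation step in Q3 can be sketched as follows. This is not the lecture code: the function names `kern`, `nw`, `ll` and `loocv` are hypothetical, the data are simulated, and the Gaussian kernel with h used as its standard deviation is an assumption (a different kernel or scaling of h would change the numbers).

```r
# Gaussian kernel; treating h as the kernel standard deviation is an assumption
kern <- function(u) dnorm(u)

# Nadaraya-Watson estimate: kernel-weighted average of y at each point of x0
nw <- function(x0, x, y, h) {
  sapply(x0, function(z) {
    w <- kern((x - z) / h)
    sum(w * y) / sum(w)
  })
}

# Local linear estimate: intercept of a kernel-weighted least squares line
# fitted in the centred variable d = x - z
ll <- function(x0, x, y, h) {
  sapply(x0, function(z) {
    w <- kern((x - z) / h)
    d <- x - z
    s1 <- sum(w * d)
    s2 <- sum(w * d^2)
    a <- w * (s2 - d * s1)
    sum(a * y) / sum(a)
  })
}

# Leave-one-out cross-validation score for bandwidth h and estimator est
loocv <- function(h, x, y, est) {
  pred <- sapply(seq_along(x), function(i) est(x[i], x[-i], y[-i], h))
  mean((y - pred)^2)
}

# Simulated stand-in for the inflation data (NOT the real file)
set.seed(3)
x <- runif(150, min = -5, max = 50)
y <- 2 + 0.8 * x + 3 * sin(x / 5) + rnorm(150, sd = 2)

# Grid search over the bandwidth ranges given in part (iii)
hs_nw <- seq(0.7, 2.7, by = 0.1)
hs_ll <- seq(1.7, 3.0, by = 0.1)
cv_nw <- sapply(hs_nw, loocv, x = x, y = y, est = nw)
cv_ll <- sapply(hs_ll, loocv, x = x, y = y, est = ll)
h_nw <- hs_nw[which.min(cv_nw)]
h_ll <- hs_ll[which.min(cv_ll)]

# Predictions at x = -4.2 with the selected bandwidths
pred <- c(NW = nw(-4.2, x, y, h_nw), LL = ll(-4.2, x, y, h_ll))

# Scatter plot with both fitted curves over the range (-5, 50)
if (interactive()) {
  grid <- seq(-5, 50, length.out = 200)
  plot(x, y, pch = 16, col = "grey60")
  lines(grid, nw(grid, x, y, h_nw), col = "blue")
  lines(grid, ll(grid, x, y, h_ll), col = "red", lty = 2)
  legend("topleft", legend = c("NW", "LL"), col = c("blue", "red"), lty = 1:2)
}
```

A grid step of 0.1 is used only to keep the example short; a finer grid, or `optimize()` applied to `loocv`, would locate the minimum more precisely. If the selected h sits at the edge of a search range, that is worth noting in the discussion.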