
The University of Nottingham

SCHOOL OF MATHEMATICAL SCIENCES

SPRING SEMESTER 2023-2024

MATH1033 - STATISTICS

Your neat, clearly legible solutions should be submitted electronically via the MATH1033 Moodle page by 18:00 on Wednesday 8th May 2024. Since this work is assessed, your submission must be entirely your own work (see the University’s policy on Academic Misconduct). Submissions made more than one week after the deadline will receive a mark of zero. Please try to make your submission by the deadline.

General points about the coursework

1. Please use R Markdown to produce your report.

2. An R Markdown template file to get you started is available to download from Moodle. Do make use of it, as well as reading the Hints and tips section below carefully.

3. Please submit your report as a self-contained HTML file (i.e. as produced by R Markdown) or as a PDF.

4. If you have any queries about the coursework, please ask me by email. Please limit this to requests for clarification: don’t ask for any part of the solution, and don’t post any of your own.

Your task

The data file scottishData.csv contains a sample of the “Indicator” data that were used to compute the 2020 Scottish Index of Multiple Deprivation (SIMD), a tool used by government bodies to support policy-making. If you are interested, you can see the SIMD and find out more about it here: https://simd.scot

Once you have downloaded the csv file and set the RStudio working directory to wherever you put the file, you can load the data with dat <- read.csv("scottishData.csv"). The file contains data for a sample of 400 “data zones” within Scotland. Data zones are small geographical areas in Scotland, of which there are 6,976 in total, each typically containing a population of between 500 and 1000 people. Of the 400 observations in the data file, 100 are from Glasgow City, 100 are from City of Edinburgh, and 200 are from elsewhere in Scotland. Glasgow and Edinburgh are the two largest cities in Scotland by population.

Table 1 shows a description of the different variables within the data set.

Your report should have the following section headings: Summary, Introduction, Methods, Results, Conclusions.

For detailed guidance, read page 4 of the notes carefully, together with the “How will the report be marked?” section below.

The Results section of your report should include a subsection for each of points 1-3 below. The bullet points indicate what should be included in each subsection, along with suitable brief commentary.

MATH1033 Turn Over


1. A comparison of employment rate between Glasgow and Edinburgh.

• A single plot with side-by-side boxplots of the Employment_rate variable for each of Glasgow and Edinburgh.

• A histogram of the Employment_rate variable, with an accompanying normal QQ plot, for each of Glasgow and Edinburgh.

• Sample means and variances of the Employment_rate variable for the data zones in each of Glasgow and Edinburgh.

• A test of whether there is a difference in the variability of Employment_rate scores between Glasgow and Edinburgh.

• A test of whether there is a difference in the means of Employment_rate scores between Glasgow and Edinburgh.

2. An investigation into how Employment_rate and other variables are associated.

• A matrix of pairwise scatterplots for the following variables: Employment_rate, Attainment, Attendance, ALCOHOL and Broadband. Also present the pairwise correlation coefficients between these variables.

• A regression of Employment_rate on Attendance, including a scatterplot showing the line of best fit.

3. A further investigation into an aspect of your choosing.

• It’s up to you what you choose here. Possible options include: an analysis similar to 1 above, but involving the data zones outside Glasgow and Edinburgh; checking whether your findings from the investigations in 2 above are similar when you take into account whether the data zones are from Glasgow, Edinburgh or elsewhere; and investigating the other variables in the data set besides those in 1 and 2.

• Note that some variables will be very strongly correlated, but for a fairly obvious/boring reason: for example, “rate” variables (see Table 1) are just “count” variables divided by population size, and data zones are designed to have similar population sizes.

• Think freely and creatively about what is interesting to investigate, especially how you could make good use of the methods that you are learning in the module.

Please include as an appendix the R code used to produce the results in your report, but don’t include R code or unformatted text/numerical output in the main part of the report itself.

Hints and tips:

1. Use the template .Rmd file provided on Moodle as your starting point.

2. Read “How will the report be marked?” below carefully. Then re-read it just before you submit, to make sure you have everything in place.

3. You may find the subset command useful. Some examples:

• glasgow <- subset(dat, Council_area == "Glasgow City") creates a new data frame, glasgow, containing data only for Glasgow.

• subset(dat, (Council_area != "City of Edinburgh" & Council_area != "Glasgow City")) finds the data zones that are in neither Edinburgh nor Glasgow.

4. The command names(dat) will tell you the names of the variables (columns) in dat.

5. dat[, c(16, 17, 18)] will pick out just the 16th, 17th and 18th columns (for example).


6. The pairs() function produces a matrix of pairwise scatterplots; cor() computes pairwise correlation coefficients.

7. Make sure that figures have clear titles, axis labels, etc.
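The hints above can be tried out together on a small made-up data frame before working with scottishData.csv. The sketch below is illustrative only: the data frame and its numeric columns x, y and z are hypothetical stand-ins (invented values, not the SIMD data), and it uses only base R functions (subset, names, pairs, cor).

```r
# Toy data frame with invented values (NOT the SIMD data).
# x, y and z are hypothetical stand-ins for numeric indicator columns.
dat <- data.frame(
  Council_area = c("Glasgow City", "Glasgow City", "City of Edinburgh",
                   "City of Edinburgh", "Fife", "Highland"),
  x = c(1, 2, 3, 4, 5, 6),
  y = c(2, 4, 5, 8, 10, 12),
  z = c(6, 5, 4, 3, 2, 1)
)

# Hint 3: keep only the Glasgow rows
glasgow <- subset(dat, Council_area == "Glasgow City")

# Hint 3: keep the rows that are in neither Edinburgh nor Glasgow
elsewhere <- subset(dat, Council_area != "City of Edinburgh" &
                      Council_area != "Glasgow City")

# Hint 4: list the column names
print(names(dat))

# Hint 5: select columns by position (note the square brackets)
num <- dat[, c(2, 3, 4)]

# Hint 6: matrix of pairwise scatterplots, with a title (hint 7)
pairs(num, main = "Pairwise scatterplots (toy data)")

# Hint 6: pairwise correlation coefficients, rounded for display
print(round(cor(num), 2))
```

With the real data, the same calls would be applied to dat as loaded with read.csv, choosing the columns of interest by name or by position.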


How will the report be marked?

The marking criteria and approximate mark allocation are as follows:

Summary [4 marks] - have you explained (in non-technical language) (a) the aim of the analysis; (b) (very briefly) the methods you have used; and (c) the key findings?

Introduction [5] - have you (a) explained the context and discussed the aim in a bit more detail; (b) given some relevant background information; (c) described the available data; and (d) explained why the study is useful/important?

Methods [3] - have you described the statistical techniques you have used (in at least enough detail that a fellow statistician can understand what you have done)?

Results [14, of which 7 are for the investigation of your choosing in point 3 above] - have you presented suitable graphical/numerical summaries, tests and results, interspersed with text giving explanation?

Conclusions [4] - have you (a) recapped your key findings, (b) discussed any limitations, and (c) suggested possible further extensions of the work?

Presentation [10] - overall, does the report flow nicely, is the writing clear, and is the presentation tidy (figures/tables well labelled and captioned)? Has Markdown been used well?


Table 1: A description of the different variables. “Standardised ratio” is such that a value of 100 is the Scotland average for a population with the same age and sex profile.
