Applications of Data Science and Statistical Modelling

Assignment 4

29/11/2019

The dataset SubstationRPD.RData contains the real power delivered (kW) in each 10-minute period of every
day during June and July, for 410 substations in the southwest of Wales, UK. The aim of this assignment is
to understand how the power demand changes throughout the day, identify any weekly/monthly patterns if
present, and use this information to fit a GAM which allows us to predict future demands. Note that in order
to fit a GAM you'll need to have the mgcv package installed.

1. [3 marks] Produce summaries of the dataset SubstationRPD.RData and produce histograms showing
the distributions of real power delivered for the 410 substations. Comment on the distributions of
real power delivered, and on any variation in those distributions between substations. (E.g. you
could choose specific 10-minute intervals, say the 10-minute window after midnight, and plot the
distribution of the power demand across the substations, or look at average daily demands, maximum
daily demands, ...)
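A minimal sketch of the kind of summaries and histograms Question 1 asks for. The handout does not give the object name or column layout inside SubstationRPD.RData, so the data frame `rpd`, its columns `Substation`, `Date` and `P1` to `P144`, and the simulated values are all assumptions made purely for illustration; `P1` is taken to be the 10-minute window after midnight.

```r
# Hypothetical stand-in for SubstationRPD.RData (layout is an assumption):
# one row per substation per day, with 144 columns P1..P144 of 10-minute demand.
set.seed(1)
n_sub <- 10
dates <- seq(as.Date("2012-06-01"), as.Date("2012-07-31"), by = "day")
id <- data.frame(Substation = rep(seq_len(n_sub), times = length(dates)),
                 Date = rep(dates, each = n_sub))
size <- rlnorm(n_sub, meanlog = 4, sdlog = 0.8)       # substations differ in scale
shape <- 1 + 0.4 * sin(2 * pi * (1:144 - 48) / 144)   # smooth daily profile
kw <- outer(size[id$Substation], shape) + rnorm(nrow(id) * 144, sd = 2)
colnames(kw) <- paste0("P", 1:144)
rpd <- cbind(id, kw)
period_cols <- colnames(kw)

# Numerical summaries for a few periods, to keep the output short.
summary(rpd[, c("P1", "P72", "P144")])

# Per substation-day summaries: average and maximum daily demand.
daily_mean <- rowMeans(rpd[, period_cols])
daily_max <- apply(rpd[, period_cols], 1, max)

# Histograms of three possible views of the demand distribution.
par(mfrow = c(1, 3))
hist(rpd$P1, breaks = 30, main = "Window after midnight", xlab = "Demand (kW)")
hist(daily_mean, breaks = 30, main = "Average daily demand", xlab = "Demand (kW)")
hist(daily_max, breaks = 30, main = "Maximum daily demand", xlab = "Demand (kW)")
par(mfrow = c(1, 1))

# Variation between substations: mean and spread of the daily averages.
sub_mean <- tapply(daily_mean, rpd$Substation, mean)
sub_sd <- tapply(daily_mean, rpd$Substation, sd)
print(round(cbind(mean = sub_mean, sd = sub_sd), 1))
```

With the real data, the simulation lines would be replaced by a `load()` call and the column names adjusted to whatever the file actually contains.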

2. [3 marks] For each substation, calculate the average demand for each 10-minute period (that is, you
should average over the days) and then plot these on the same plot, using a different colour for each
substation. Add a thick, black line showing the overall mean demand across all of the substations.
Comment on the variability in patterns between substations. Does the overall mean seem a reasonable
summary of all the data? (Hint: Since we are plotting 410 separate curves, you might want to suppress
the legend, which can be done using the ggplot option 'theme(legend.position = "none")'.)

[Figure: example plot, panel "All days". x-axis: Time (00:00 to 23:50); y-axis: Average Daily Demand (0 to 400).]
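The averaging and plotting in Question 2 could be sketched as follows. This uses base graphics (`matplot`) rather than ggplot so that it runs without extra packages; the data frame `rpd` and its column names are hypothetical stand-ins, since the handout does not describe the layout of the real file.

```r
# Hypothetical stand-in for SubstationRPD.RData (layout is an assumption):
# one row per substation per day, with 144 columns P1..P144 of 10-minute demand.
set.seed(1)
n_sub <- 10
dates <- seq(as.Date("2012-06-01"), as.Date("2012-07-31"), by = "day")
id <- data.frame(Substation = rep(seq_len(n_sub), times = length(dates)),
                 Date = rep(dates, each = n_sub))
size <- rlnorm(n_sub, meanlog = 4, sdlog = 0.8)
shape <- 1 + 0.4 * sin(2 * pi * (1:144 - 48) / 144)
kw <- outer(size[id$Substation], shape) + rnorm(nrow(id) * 144, sd = 2)
colnames(kw) <- paste0("P", 1:144)
rpd <- cbind(id, kw)
period_cols <- colnames(kw)

# Average over days: one row per substation, one column per 10-minute period.
sub_avg <- aggregate(rpd[, period_cols],
                     by = list(Substation = rpd$Substation), FUN = mean)
profiles <- as.matrix(sub_avg[, period_cols])

# Overall mean profile. Every substation has the same number of days here,
# so the mean of the substation means equals the mean over all rows.
overall <- colMeans(profiles)

# One coloured curve per substation, no legend, thick black overall mean.
matplot(1:144, t(profiles), type = "l", lty = 1, col = rainbow(nrow(profiles)),
        xaxt = "n", xlab = "Time", ylab = "Average Daily Demand")
axis(1, at = c(seq(1, 121, by = 24), 144),
     labels = c("00:00", "04:00", "08:00", "12:00", "16:00", "20:00", "23:50"))
lines(1:144, overall, lwd = 4, col = "black")
```

The axis labels assume that period 1 starts at 00:00 and period 144 starts at 23:50.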

3. [3 marks] Split your plot in Question 2 into four separate plots representing: 1) All days, 2) Weekdays,
3) Saturdays and 4) Sundays. Are there any differences in patterns between days? (Hint: You might
find the 'weekdays' function useful.)
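One possible way to build the four panels in Question 3 is sketched below. It classifies days with `as.POSIXlt(...)$wday`, which returns numbers and so does not depend on the locale; the `weekdays` function from the hint returns day names in the session's language and would work equally well. The data frame `rpd`, its columns, and the simulated weekend dip are assumptions for illustration only.

```r
# Hypothetical stand-in for SubstationRPD.RData (layout is an assumption),
# with demand on Saturdays and Sundays scaled down by 15% for illustration.
set.seed(1)
n_sub <- 10
dates <- seq(as.Date("2012-06-01"), as.Date("2012-07-31"), by = "day")
id <- data.frame(Substation = rep(seq_len(n_sub), times = length(dates)),
                 Date = rep(dates, each = n_sub))
size <- rlnorm(n_sub, meanlog = 4, sdlog = 0.8)
shape <- 1 + 0.4 * sin(2 * pi * (1:144 - 48) / 144)
weekend <- as.POSIXlt(id$Date)$wday %in% c(0, 6)
kw <- outer(size[id$Substation] * ifelse(weekend, 0.85, 1), shape) +
  rnorm(nrow(id) * 144, sd = 2)
colnames(kw) <- paste0("P", 1:144)
rpd <- cbind(id, kw)
period_cols <- colnames(kw)

# Classify each row: wday is 0 for Sunday and 6 for Saturday.
wday <- as.POSIXlt(rpd$Date)$wday
day_type <- ifelse(wday == 6, "Saturdays", ifelse(wday == 0, "Sundays", "Weekdays"))
keep <- list("All days" = rep(TRUE, nrow(rpd)),
             "Weekdays" = day_type == "Weekdays",
             "Saturdays" = day_type == "Saturdays",
             "Sundays" = day_type == "Sundays")

# Mean demand per substation (rows) and 10-minute period (columns).
sub_profiles <- function(d) {
  avg <- aggregate(d[, period_cols], by = list(d$Substation), FUN = mean)
  as.matrix(avg[, period_cols])
}

# One panel per day type, each with its own thick black overall mean.
par(mfrow = c(2, 2))
overall <- list()
for (g in names(keep)) {
  prof <- sub_profiles(rpd[keep[[g]], ])
  overall[[g]] <- colMeans(prof)
  matplot(1:144, t(prof), type = "l", lty = 1, col = rainbow(nrow(prof)),
          xlab = "10-minute period", ylab = "Average Daily Demand", main = g)
  lines(1:144, overall[[g]], lwd = 4, col = "black")
}
par(mfrow = c(1, 1))
```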

Now that we understand how the demand changes throughout the day, and have identified some seasonal
patterns, the next step is to fit a GAM to our data:

4. [2 marks] First, reformat the SubstationRPD.RData dataset so that each row holds the demand
averaged over all substations. That is, each row corresponds to one day, and each column contains
the average demand (across all substations) for the corresponding 10-minute period.
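The reshaping in Question 4 amounts to a grouped mean over substations. A minimal sketch using base R's `aggregate`, on a simulated stand-in whose object name and column layout (`rpd`, `Substation`, `Date`, `P1` to `P144`) are assumptions rather than the real file's structure:

```r
# Hypothetical stand-in for SubstationRPD.RData (layout is an assumption):
# one row per substation per day, with 144 columns P1..P144 of 10-minute demand.
set.seed(1)
n_sub <- 10
dates <- seq(as.Date("2012-06-01"), as.Date("2012-07-31"), by = "day")
id <- data.frame(Substation = rep(seq_len(n_sub), times = length(dates)),
                 Date = rep(dates, each = n_sub))
size <- rlnorm(n_sub, meanlog = 4, sdlog = 0.8)
shape <- 1 + 0.4 * sin(2 * pi * (1:144 - 48) / 144)
kw <- outer(size[id$Substation], shape) + rnorm(nrow(id) * 144, sd = 2)
colnames(kw) <- paste0("P", 1:144)
rpd <- cbind(id, kw)
period_cols <- colnames(kw)

# Average across substations within each day:
# one row per Date, one column per 10-minute period.
daily <- aggregate(rpd[, period_cols], by = list(Date = rpd$Date), FUN = mean)

dim(daily)             # number of days, then 1 + 144 columns
daily[1:3, 1:5]        # first few days and periods
```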


5. [10 marks] Add a column with the day of the month, and another one with the month of the year. Note
that you can access these using the following R code:

    as.numeric(substr(Date,9,10)) # day
    as.numeric(substr(Date,6,7)) # month

Next, collapse the data so that the previously calculated mean power demands are in a single column, instead
of separate columns (one per 10-minute period). By this point you should have a dataset similar to the following:

    # A tibble: 6 x 6
    # Groups:   Date, weekdays [1]
      Date       weekdays minute.int  mean   day month
    1 2012-06-01 Friday            1  56.7     1     6
    2 2012-06-01 Friday            2  57.0     1     6
    3 2012-06-01 Friday            3  56.6     1     6
    4 2012-06-01 Friday            4  55.7     1     6
    5 2012-06-01 Friday            5  55.5     1     6
    6 2012-06-01 Friday            6  54.9     1     6
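The collapse from one column per period to a single `mean` column can be done in base R without extra packages, as in the sketch below. The wide data frame `daily` (a `Date` column plus `P1` to `P144`) is a simulated, hypothetical stand-in for the output of Question 4; the column names of the long result follow the example table.

```r
# Hypothetical stand-in for the Question 4 output (layout is an assumption):
# one row per day, with 144 columns P1..P144 of demand averaged over substations.
set.seed(1)
dates <- seq(as.Date("2012-06-01"), as.Date("2012-07-31"), by = "day")
shape <- 55 + 15 * sin(2 * pi * (1:144 - 48) / 144)
vals <- outer(rep(1, length(dates)), shape) + rnorm(length(dates) * 144)
colnames(vals) <- paste0("P", 1:144)
daily <- data.frame(Date = dates, vals)

# Collapse to long format: 144 consecutive rows per day.
# t() puts periods in rows, so as.vector() runs through day 1's periods
# first, then day 2's, matching the Date and minute.int columns.
long <- data.frame(
  Date = rep(daily$Date, each = 144),
  minute.int = rep(1:144, times = nrow(daily)),
  mean = as.vector(t(as.matrix(daily[, -1])))
)

# Calendar columns, using the substr() approach from the handout.
long$weekdays <- weekdays(long$Date)   # day names depend on the locale
long$day <- as.numeric(substr(long$Date, 9, 10))
long$month <- as.numeric(substr(long$Date, 6, 7))

head(long)
```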

Fit and plot a GAM which accounts for the underlying seasonal pattern in demands. You should decide which
seasonal patterns are appropriate to include: daily (use the minute.int column in the above dataset), weekly
(use the day column in the above dataset), monthly (use the month column in the above dataset). Comment
on the fit of the model. What are the (effective) degrees of freedom, and what does this tell us about the
complexity of the model that has been fit?
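A hedged sketch of fitting and inspecting such a GAM with mgcv. The model below is only one candidate, not a recommended answer: it uses a cyclic smooth for the within-day pattern and an ordinary smooth of `day`. The data are simulated with a purely daily pattern, and the basis sizes `k`, the cyclic end points and the choice of REML are all illustrative assumptions.

```r
library(mgcv)

# Simulated long-format data with the columns shown in the example table
# (values are made up: a sinusoidal daily pattern plus noise).
set.seed(1)
dates <- seq(as.Date("2012-06-01"), as.Date("2012-07-31"), by = "day")
long <- data.frame(Date = rep(dates, each = 144),
                   minute.int = rep(1:144, times = length(dates)))
long$day <- as.numeric(substr(long$Date, 9, 10))
long$month <- as.numeric(substr(long$Date, 6, 7))
long$mean <- 55 + 15 * sin(2 * pi * (long$minute.int - 48) / 144) +
  rnorm(nrow(long))

# bs = "cc" is a cyclic cubic spline, so the fitted curve joins up across
# midnight; the two knots give the end points of the cycle.
# month takes only two values here, so it could only enter as a parametric
# term, not as a smooth; it is left out of this sketch.
fit <- gam(mean ~ s(minute.int, bs = "cc", k = 20) + s(day, k = 10),
           data = long, knots = list(minute.int = c(0.5, 144.5)),
           method = "REML")

summary(fit)       # per-smooth effective degrees of freedom, adjusted R-squared
sum(fit$edf)       # total effective degrees of freedom of the model
plot(fit, pages = 1, shade = TRUE)   # estimated smooths with confidence bands
```

An effective degrees of freedom close to 1 for a smooth indicates a nearly linear term, while a value close to `k - 1` suggests the basis may be too small; `gam.check(fit)` reports on the latter.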

6. [4 marks] Choose an appropriate model, and use it to predict the demand for the 21st to the 28th of July.
Take the daily average demand, and produce a plot showing these mean predictions against time. You
can use the following code to create a new dataset for the prediction. Note that depending on how you
named the columns of your dataset you might have to modify the column names in the following code:

    # One row per (day, 10-minute period): each of the 8 days gets all 144 periods.
    new.data <- data.frame(minute.int = rep(1:144, times = 8),
                           day = rep(21:28, each = 144),
                           month = 7)
    new.data$Date <- rep(seq(as.Date("2012-07-21"),
                             as.Date("2012-07-28"), by = "days"), each = 144)
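The prediction step could look like the sketch below: fit a model, call `predict` on the new dataset, average the predictions within each day, and plot the daily means against the date. The training data and the model are simulated, illustrative assumptions. The prediction grid is built with `each = 144` so that every one of the 8 days is paired with all 144 periods.

```r
library(mgcv)

# Simulated long-format training data (values are made up for illustration).
set.seed(1)
dates <- seq(as.Date("2012-06-01"), as.Date("2012-07-31"), by = "day")
train <- data.frame(Date = rep(dates, each = 144),
                    minute.int = rep(1:144, times = length(dates)))
train$day <- as.numeric(substr(train$Date, 9, 10))
train$month <- as.numeric(substr(train$Date, 6, 7))
train$mean <- 55 + 15 * sin(2 * pi * (train$minute.int - 48) / 144) +
  rnorm(nrow(train))

# One candidate model (an assumption, not the handout's prescribed model).
fit <- gam(mean ~ s(minute.int, bs = "cc", k = 20) + s(day, k = 10),
           data = train, knots = list(minute.int = c(0.5, 144.5)))

# Prediction grid for 21 to 28 July: 8 days x 144 periods = 1152 rows.
new.data <- data.frame(minute.int = rep(1:144, times = 8),
                       day = rep(21:28, each = 144),
                       month = 7)
new.data$Date <- rep(seq(as.Date("2012-07-21"), as.Date("2012-07-28"),
                         by = "days"), each = 144)

# Predict each 10-minute period, then average within each day.
new.data$pred <- as.numeric(predict(fit, newdata = new.data))
daily_pred <- aggregate(pred ~ Date, data = new.data, FUN = mean)

plot(daily_pred$Date, daily_pred$pred, type = "b",
     xlab = "Date", ylab = "Predicted mean daily demand (kW)")
```

Passing `se.fit = TRUE` to `predict` also returns standard errors, which could be used to add uncertainty bands for the individual 10-minute predictions.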

All the exercises should be solved using R. A PDF document with your answers, (commented)
R code and its outputs/plots should be submitted via ELE by noon (12pm), 18th December.
Note that late submissions will be penalised.
