
Applications of Data Science and Statistical Modelling

Assignment 4

29/11/2019

The dataset SubstationRPD.RData contains the real power delivered (kW) in each 10-minute period of every day during June and July, for 410 substations in the southwest of Wales, UK. The aim of this assignment is to understand how power demand changes throughout the day, to identify any weekly or monthly patterns, and to use this information to fit a GAM that allows us to predict future demand. Note that in order to fit a GAM you'll need to have the mgcv package installed.

1. [3 marks] Produce summaries of the dataset SubstationRPD.RData and produce histograms showing the distributions of real power delivered for the 410 substations. Comment on the distributions of real power delivered and on how they vary between substations. (E.g. you could choose a specific 10-minute interval, say the 10-minute window after midnight, and plot the distribution of the power demand across the substations, or look at average daily demands, maximum daily demands, and so on.)
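A minimal sketch of the kind of summaries and histograms Question 1 asks for. The layout of SubstationRPD.RData is not described here, so the code assumes one row per substation per day, with columns Station and Date followed by 144 readings named t1 to t144 (t1 being the window after midnight). These names and the synthetic data are illustrative assumptions, not the real dataset.

```r
set.seed(1)
# Synthetic stand-in for the real data (assumed layout, hypothetical names):
# one row per station-day, columns Date, Station, then readings t1..t144.
grid  <- expand.grid(Date = seq(as.Date("2012-06-01"), as.Date("2012-06-14"), by = "day"),
                     Station = 1:5)
shape <- 50 + 30 * sin(2 * pi * (1:144) / 144 - pi / 2)
vals  <- t(sapply(grid$Station, function(s) s * shape + rnorm(144, sd = 5)))
colnames(vals) <- paste0("t", 1:144)
rpd   <- cbind(grid, vals)
time_cols <- colnames(vals)

summary(rpd$t1)                                 # demand in the first 10-minute window
daily_mean <- rowMeans(rpd[, time_cols])        # average demand per station-day
daily_max  <- apply(rpd[, time_cols], 1, max)   # peak demand per station-day

# Histograms across substations for three different summaries
par(mfrow = c(1, 3))
hist(rpd$t1, main = "First 10-minute window", xlab = "Real power delivered (kW)")
hist(tapply(daily_mean, rpd$Station, mean), main = "Mean daily demand", xlab = "kW")
hist(tapply(daily_max, rpd$Station, mean), main = "Mean daily peak", xlab = "kW")
```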

2. [3 marks] For each substation, calculate the average demand for each 10-minute period (that is, average over the days) and then plot these on the same plot, using a different colour for each substation. Add a thick black line showing the overall mean demand across all of the substations. Comment on the variability in patterns between substations. Does the overall mean seem a reasonable summary of all the data? (Hint: since we are plotting 410 separate curves, you might want to suppress the legend, which can be done using the ggplot option 'theme(legend.position = "none")'.)

[Figure: Average Daily Demand (0 to 400) against Time (00:00 to 23:50). Panel: All days.]
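The averaging in Question 2 might be sketched as follows, using base graphics rather than ggplot to keep the example dependency-free. As an assumption, the data are taken to have one row per substation per day with columns Station, Date and t1 to t144. These names and the synthetic values are hypothetical stand-ins for the real dataset.

```r
set.seed(1)
# Synthetic stand-in (assumed layout, hypothetical column names)
grid  <- expand.grid(Date = seq(as.Date("2012-06-01"), as.Date("2012-06-14"), by = "day"),
                     Station = 1:5)
shape <- 50 + 30 * sin(2 * pi * (1:144) / 144 - pi / 2)
vals  <- t(sapply(grid$Station, function(s) s * shape + rnorm(144, sd = 5)))
colnames(vals) <- paste0("t", 1:144)
rpd   <- cbind(grid, vals)
time_cols <- colnames(vals)

# Average over days within each station: one row per station, 144 columns
station_means <- aggregate(rpd[, time_cols], by = list(Station = rpd$Station), FUN = mean)
profiles <- t(as.matrix(station_means[, time_cols]))   # 144 x number of stations
overall  <- rowMeans(profiles)                         # mean across stations

# One coloured curve per station, no legend, overall mean as a thick black line
matplot(1:144, profiles, type = "l", lty = 1, col = rainbow(ncol(profiles)),
        xlab = "10-minute interval", ylab = "Average daily demand (kW)")
lines(1:144, overall, lwd = 4, col = "black")
```

Because every station here has the same number of days, the mean of the station profiles equals the mean over all station-days. With unbalanced data the two would differ, and which one counts as the "overall mean" would need to be stated.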

3. [3 marks] Split your plot from Question 2 into four separate plots representing: 1) All days, 2) Weekdays, 3) Saturdays and 4) Sundays. Are there any differences in patterns between days? (Hint: you might find the 'weekdays' function useful.)
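One possible way to make the day-type split is sketched below on synthetic data with an assumed layout (Station, Date, t1 to t144; all names hypothetical). The hinted weekdays() function returns day names in the current locale, so this sketch classifies days with as.POSIXlt()$wday instead (0 = Sunday, 6 = Saturday), which does not depend on the locale. Comparing weekdays(Date) against "Saturday" and "Sunday" would be equivalent in an English locale.

```r
set.seed(1)
# Synthetic stand-in (assumed layout, hypothetical column names)
grid  <- expand.grid(Date = seq(as.Date("2012-06-01"), as.Date("2012-06-14"), by = "day"),
                     Station = 1:5)
shape <- 50 + 30 * sin(2 * pi * (1:144) / 144 - pi / 2)
vals  <- t(sapply(grid$Station, function(s) s * shape + rnorm(144, sd = 5)))
colnames(vals) <- paste0("t", 1:144)
rpd   <- cbind(grid, vals)
time_cols <- colnames(vals)

# Classify each date as Weekday, Saturday or Sunday
wday <- as.POSIXlt(rpd$Date)$wday
rpd$day.type <- ifelse(wday == 6, "Saturday", ifelse(wday == 0, "Sunday", "Weekday"))

# Draw per-station average profiles plus their mean for a subset of rows
plot_profiles <- function(d, title) {
  m    <- aggregate(d[, time_cols], by = list(Station = d$Station), FUN = mean)
  prof <- t(as.matrix(m[, time_cols]))
  matplot(1:144, prof, type = "l", lty = 1, col = rainbow(ncol(prof)),
          main = title, xlab = "10-minute interval", ylab = "Average demand (kW)")
  lines(1:144, rowMeans(prof), lwd = 4, col = "black")
  invisible(prof)
}

par(mfrow = c(2, 2))
plot_profiles(rpd, "All days")
for (k in c("Weekday", "Saturday", "Sunday")) {
  plot_profiles(rpd[rpd$day.type == k, ], k)
}
```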

Now that we understand how the demand changes throughout the day, and have identified some seasonal patterns, the next step is to fit a GAM to our data:

4. [2 marks] First, reformat the SubstationRPD.RData dataset so that each row contains the demand averaged across all substations. That is, each row corresponds to one day, and each column holds the average demand (across all substations) for the corresponding 10-minute period.
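The reshaping in Question 4 amounts to grouping by date and averaging each 10-minute column over the substations. A minimal sketch on synthetic data follows, again assuming one row per substation per day with hypothetical columns Station, Date and t1 to t144.

```r
set.seed(1)
# Synthetic stand-in (assumed layout, hypothetical column names)
grid  <- expand.grid(Date = seq(as.Date("2012-06-01"), as.Date("2012-06-14"), by = "day"),
                     Station = 1:5)
shape <- 50 + 30 * sin(2 * pi * (1:144) / 144 - pi / 2)
vals  <- t(sapply(grid$Station, function(s) s * shape + rnorm(144, sd = 5)))
colnames(vals) <- paste0("t", 1:144)
rpd   <- cbind(grid, vals)
time_cols <- colnames(vals)

# One row per day; each of the 144 columns is the mean over all stations
daily <- aggregate(rpd[, time_cols], by = list(Date = rpd$Date), FUN = mean)
dim(daily)   # 14 days, 1 Date column + 144 interval columns
```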


5. [10 marks] Add a column with the day of the month, and another with the month of the year. Note that you can access these using the following R code:

as.numeric(substr(Date,9,10)) # day
as.numeric(substr(Date,6,7)) # month

Next, collapse the data so that the previously calculated mean power demands are in a single column, instead of separate columns. By this point you should have a dataset similar to the following:

# A tibble: 6 x 6
# Groups:   Date, weekdays [1]
  Date       weekdays minute.int  mean   day month
1 2012-06-01 Friday            1  56.7     1     6
2 2012-06-01 Friday            2  57.0     1     6
3 2012-06-01 Friday            3  56.6     1     6
4 2012-06-01 Friday            4  55.7     1     6
5 2012-06-01 Friday            5  55.5     1     6
6 2012-06-01 Friday            6  54.9     1     6
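A hedged sketch of the collapse into long format, written in base R. It starts from a small synthetic table of daily means with assumed column names Date and t1 to t144 (hypothetical values, not the real data) and builds the six columns shown above. A tidyverse pipeline would be an equally reasonable route.

```r
set.seed(1)
# Synthetic daily means: one row per day, columns Date, t1..t144 (assumed names)
dates <- seq(as.Date("2012-06-01"), as.Date("2012-06-03"), by = "day")
vals  <- matrix(round(rnorm(3 * 144, mean = 55, sd = 2), 1), nrow = 3,
                dimnames = list(NULL, paste0("t", 1:144)))
daily <- data.frame(Date = dates, vals)
time_cols <- colnames(vals)

# Stack the 144 interval columns into one column, day by day
long <- data.frame(
  Date       = rep(daily$Date, each = 144),
  minute.int = rep(1:144, times = nrow(daily)),
  mean       = as.vector(t(as.matrix(daily[, time_cols])))
)
long$weekdays <- weekdays(long$Date)                 # day names follow the locale
long$day      <- as.numeric(substr(long$Date, 9, 10))
long$month    <- as.numeric(substr(long$Date, 6, 7))
long <- long[, c("Date", "weekdays", "minute.int", "mean", "day", "month")]
head(long)
```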

Fit and plot a GAM which accounts for the underlying seasonal pattern in demands. You should decide which seasonal patterns are appropriate to include: daily (use the minute.int column in the above dataset), weekly (use the day column in the above dataset), monthly (use the month column in the above dataset). Comment on the fit of the model. What are the (effective) degrees of freedom, and what does this tell us about the complexity of the model that has been fit?
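For orientation, one possible GAM specification with mgcv is sketched below on synthetic data that mimics the long format above. The choice of terms, the cyclic basis for minute.int and the values of k are illustrative assumptions, not a recommended answer. The month column is left out of this sketch because it takes only two values (6 and 7), which is too few for a smooth term.

```r
library(mgcv)
set.seed(1)
# Synthetic long-format data (hypothetical values): 1 June to 20 July 2012
long <- expand.grid(minute.int = 1:144,
                    Date = seq(as.Date("2012-06-01"), as.Date("2012-07-20"), by = "day"))
long$day   <- as.numeric(substr(long$Date, 9, 10))
long$month <- as.numeric(substr(long$Date, 6, 7))
long$mean  <- 60 + 20 * sin(2 * pi * long$minute.int / 144 - pi / 2) +
              rnorm(nrow(long), sd = 2)

# Cyclic smooth over the day, so the curve joins up across midnight,
# plus a smooth in day of month
fit <- gam(mean ~ s(minute.int, bs = "cc", k = 20) + s(day, k = 5),
           data = long, method = "REML",
           knots = list(minute.int = c(0, 144)))

summary(fit)                  # reports the edf of each smooth term
total_edf <- sum(fit$edf)     # total effective degrees of freedom
plot(fit, pages = 1)          # estimated smooths with confidence bands
```

An edf close to 1 for a term indicates a nearly linear effect, while an edf approaching k - 1 suggests the basis dimension may be limiting the fit. gam.check(fit) gives a diagnostic for the latter.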

6. [4 marks] Choose an appropriate model with which to predict the demand for the 21st to the 28th of July. Take the daily average demand, and produce a plot showing these mean predictions against time. You can use the following code to create a new dataset for the prediction. Note that, depending on how you named the columns of your dataset, you might have to modify the column names in the following code:

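# One row per 10-minute interval (1..144) for each day from 21 to 28 July 2012.
# 'each = 144' keeps all 144 intervals together within each day, so that every
# (minute.int, day) combination appears exactly once.
new.data <- data.frame(minute.int = rep(1:144, times = 8),
                       day        = rep(21:28, each = 144),
                       month      = 7)
new.data$Date <- rep(seq(as.Date("2012-07-21"),
                         as.Date("2012-07-28"), by = "days"), each = 144)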
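The prediction step might then look like the following sketch. The model and training data are synthetic placeholders (hypothetical values and an illustrative model formula), and the prediction grid is assumed to hold one row per 10-minute interval for each of the eight days.

```r
library(mgcv)
set.seed(1)
# Placeholder training data and model (synthetic, for illustration only)
long <- expand.grid(minute.int = 1:144,
                    Date = seq(as.Date("2012-06-01"), as.Date("2012-07-20"), by = "day"))
long$day  <- as.numeric(substr(long$Date, 9, 10))
long$mean <- 60 + 20 * sin(2 * pi * long$minute.int / 144 - pi / 2) +
             rnorm(nrow(long), sd = 2)
fit <- gam(mean ~ s(minute.int, bs = "cc", k = 20) + s(day, k = 5),
           data = long, method = "REML", knots = list(minute.int = c(0, 144)))

# Prediction grid: every 10-minute interval on each day, 21 to 28 July
pred.data <- data.frame(minute.int = rep(1:144, times = 8),
                        day        = rep(21:28, each = 144),
                        month      = 7)
pred.data$Date <- rep(seq(as.Date("2012-07-21"), as.Date("2012-07-28"), by = "days"),
                      each = 144)

# Predict each interval, then average within each day
pred.data$pred <- as.numeric(predict(fit, newdata = pred.data))
daily.pred <- aggregate(pred ~ Date, data = pred.data, FUN = mean)

plot(daily.pred$Date, daily.pred$pred, type = "b",
     xlab = "Date", ylab = "Predicted mean daily demand (kW)")
```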

All the exercises should be solved using R. A pdf document with your answers, (commented) R code and its outputs/plots should be submitted via ELE by noon (12pm) on 18th December. Note that late submissions will be penalised.
