
STAT 385 Homework Assignment 05

Due by 12:00 PM 11/16/2019

HW 5 Problems

Below you will find problems for you to complete as an individual. It is fine to discuss the homework problems with classmates, but cheating is prohibited and will be harshly penalized if detected.

1. Using the ggplot function and tidyverse functionality, do the following visualizations:

recreate your improved visualization in problem 2a of HW04

add a new visually appealing layer to the plot that helps clarify the plot and separately include a short description beneath the plot, such as “Fig. 1 shows…”

recreate your improved visualization in problem 4a of HW04

add a new visually appealing layer to the plot that helps clarify the plot and separately include a short description beneath the plot, such as “Fig. 2 shows…”

2. Successfully import the US Natality Data (for year 2015). The necessary data links are in the Datasets file on Prof. Kinson’s course website (see here). One link is a single csv file 1.9 GB in size. If your computer cannot handle a file of that size, use the partitioned version, which splits the same US Natality Data into 20 csv files. Here’s a User Guide for this data that may help with understanding it. It might also be helpful for this problem or later problems.

Bonus (worth 5 additional points, i.e. your max HW 05 score could be 15 out of 10): do problem 2 using the parallel programming ideas (particularly foreach) discussed in class. Do not use any functions or packages other than those discussed in the notes on parallel programming.
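As a rough illustration of the foreach idea mentioned in the bonus, the sketch below reads several csv partitions on separate workers and row-binds the pieces. It is only a sketch under assumptions: the doParallel backend may or may not be one of the packages covered in the course notes, and the folder and file names (`natality_parts`, `part_01.csv`, ...) are hypothetical stand-ins written to a temporary directory, not the real Natality partitions.

```r
library(foreach)
library(doParallel)  # assumed backend; check it against the course notes

# Hypothetical stand-in for the partitioned data: three tiny csv files
part_dir <- file.path(tempdir(), "natality_parts")
dir.create(part_dir, showWarnings = FALSE)
for (i in 1:3) {
  write.csv(data.frame(part = i, id = 1:5),
            file.path(part_dir, sprintf("part_%02d.csv", i)),
            row.names = FALSE)
}
files <- list.files(part_dir, pattern = "\\.csv$", full.names = TRUE)

# Start an explicit number of workers and register them with foreach
cl <- makeCluster(2)
registerDoParallel(cl)

# Each worker reads one file; .combine = rbind stacks the results in order
natality <- foreach(f = files, .combine = rbind) %dopar% {
  utils::read.csv(f)
}

stopCluster(cl)
dim(natality)
```

For the real files, `files` would instead point at the downloaded partitions, and a faster reader could replace `read.csv` if the notes allow it.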

3. Using the ggplot function and tidyverse functionality, recreate or reimagine the following visualizations using the appropriate data. Be sure to use the visual design considerations from Knaflic’s Storytelling with Data.

The image below uses the US Natality Data. Also, explain the image with Markdown syntax.

The image below uses the US Natality Data. Also, explain the image with Markdown syntax (do not include the explanation within the visualization).

The image below uses the Chicago Food Inspections Data (link here). Also, explain the image with Markdown syntax (do not include the explanation within the visualization).

The image below uses the Chicago Food Inspections Data. Also, explain the image with Markdown syntax (do not include the explanation within the visualization).

4. Do the following:

Redo problem 3 in HW03 using parallel programming. Does parallel computing perform the tasks in parts c and e faster than the method that you used in HW03? Show your work, including the runtimes for the un-parallelized and parallelized versions.

5. Problem in parallel coding

a. Install the conformal.glm R package, which can be found at https://github.com/DEck13/conformal.glm.

Run the following code:

library(devtools)
install_github(repo = "DEck13/conformal.glm", subdir="conformal.glm")
library(HDInterval)
library(MASS)
library(parallel)
library(conformal.glm)

set.seed(13)
n <- 250

# generate predictors
x <- runif(n)

# set regression coefficient vector
beta <- c(3, 5)

# generate responses from a linear regression model
y <- rnorm(n, mean = cbind(1, x) %*% beta, sd = 3)

# store predictors and responses as a dataframe
dat <- data.frame(y = y, x = x)

# fit linear regression model
model <- lm(y ~ x, data = dat)

# obtain OLS estimator of beta
betahat <- model$coefficients

# convert predictors into a matrix
Xk <- as.matrix(x, nrow = n)

# extract internal model information, this is necessary for the assignment
call <- model$call
formula <- call$formula
family <- "gaussian"
link <- "identity"
newdata.formula <- as.matrix(model.frame(formula, as.data.frame(dat))[, -1])

# This function takes on a new (x,y) data point and reports a
# value corresponding to how similar this new data point is
# with the data that we generated, higher numbers are better.
# The goal is to use this function to get a range of new y
# values that agrees with our generated data at each x value in
# our generated data set.
density_score <- function(ynew, xnew){
  rank(phatxy(ynew = ynew, xnew = xnew, Yk = y, Xk = Xk, xnew.modmat = xnew,
    data = dat, formula = formula, family = family, link = link))[n+1]
}

# We try this out on the first x value in our generated data set.
# In order to do this we write two line searches
xnew <- x[1]

# start line searches at the predicted response value
# corresponding to xnew
ystart <- ylwr <- yupr <- as.numeric(c(1,xnew) %*% betahat)
score <- density_score(ynew = ystart, xnew = xnew)

# line search 1: line search that estimates the largest y
# value corresponding to the first x value that agrees with
# our generated data
while(score > 13){
  yupr <- yupr + 0.01
  score <- density_score(ynew = yupr, xnew = xnew)
}

# line search 2: line search that estimates the smallest y
# value corresponding to the first x value that agrees with
# our generated data
score <- density_score(ynew = ystart, xnew = xnew)
while(score > 13){
  ylwr <- ylwr - 0.01
  score <- density_score(ynew = ylwr, xnew = xnew)
}
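Both line searches share one pattern: start at a fitted value, step in a fixed direction, and stop as soon as the score is no longer above the cutoff of 13. The sketch below shows that pattern in isolation. It is an illustration only: `toy_score` and `line_search` are hypothetical names, and `toy_score` is a made-up stand-in for `density_score` so that the code runs without conformal.glm.

```r
# Hypothetical stand-in for density_score(): highest at `center`,
# decreasing linearly as y moves away from it
toy_score <- function(y, center = 5) 20 - 10 * abs(y - center)

# Walk from `start` in increments of `step` (positive = upward,
# negative = downward) until the score is no longer above `threshold`
line_search <- function(start, step, threshold = 13, score_fn = toy_score) {
  y <- start
  score <- score_fn(y)
  while (score > threshold) {
    y <- y + step
    score <- score_fn(y)
  }
  y
}

yupr <- line_search(start = 5, step = 0.01)   # upper boundary
ylwr <- line_search(start = 5, step = -0.01)  # lower boundary
c(lwr = ylwr, upr = yupr)
```

The step size sets the resolution of the boundary, so a smaller step gives a finer estimate at the cost of more score evaluations.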

b. Write a function which runs the two line searches in part a for the jth generated predictor value.

c. Use parallel programming to run the function you wrote in part b. Save the output and record the time that it took to perform these calculations. Note: it is not advised to use detectCores as an argument in defining the number of workers you want. It’s much better to specify the number of workers explicitly.

d. Redo the calculation in part c using lapply and record the time it took to run this job. Which method is faster?

e. Using ggplot, plot the original data and depict lines of the lower and upper boundaries that you computed from part c.
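The timing comparison between a parallel run and an lapply run can be sketched with `system.time` and the parallel package loaded above. This is a minimal sketch, not the assignment's solution: `slow_task` is a hypothetical placeholder that just sleeps and returns two numbers, and the worker count of 2 is an arbitrary explicit choice.

```r
library(parallel)

# Hypothetical placeholder for a per-index job that returns two bounds
slow_task <- function(j) {
  Sys.sleep(0.05)
  c(lwr = j - 1, upr = j + 1)
}

idx <- 1:8

# Serial version: elapsed time of a plain lapply
serial_time <- system.time(serial_out <- lapply(idx, slow_task))

# Parallel version: explicit number of workers instead of detectCores()
cl <- makeCluster(2)
parallel_time <- system.time(parallel_out <- parLapply(cl, idx, slow_task))
stopCluster(cl)

# One row per index, columns lwr and upr
bounds <- do.call(rbind, parallel_out)

serial_time["elapsed"]
parallel_time["elapsed"]
```

Starting workers and shipping data to them has a fixed cost, so for very small jobs the serial version can come out ahead; timing both is the only reliable way to tell.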
