2019/7/16 NYU Classes : Advanced Test, Analysis, & Exp, Section 004 : Assignments
Instructions
You are analyzing which factors drive wine ratings. You are given the data set wine_data_clean_v2.csv.
The data contains 129,971 observations with the following variables:
country - country the wine is from
points - points given to rate the wine (0%-100% scale); this is essentially the wine rating
price - price of a bottle of wine in USD ($)
province - province the wine is from
variety - type of wine
First, make sure the following packages are installed so that you can load these libraries:
library(readr)
library(data.table)
library(broom)
options(max.print = 100000) # useful when the regression output is very large
Try reading the file using the base function read.csv().
Notice that some of the level names are not readable.
Use the read_csv function from the readr package to read the file:
wine_df2<-read_csv("C:/Users/Yevgeniy/Desktop/NYU Courses/Fall 2018/Advanced Test and Experimental Design/Assignment 2/wine_data_clean_v2.csv")
(You can also do this by first building your folder location with file.path(), as we did in class.)
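As a minimal sketch of the file.path() approach, the example below builds a path from a folder and a file name and reads a small CSV from it. The folder (a temporary directory), the file name wine_sample.csv and the two-row data frame are all made up for illustration; in the assignment you would point data_dir at your own folder and pass the resulting path to read_csv() instead of read.csv().

```r
# Hypothetical folder: a temporary directory stands in for your assignment folder.
data_dir <- tempdir()

# file.path() joins the pieces with the correct separator.
csv_path <- file.path(data_dir, "wine_sample.csv")  # hypothetical file name

# Write a tiny made-up file so the example can be run end to end.
toy <- data.frame(country = c("France", "Italy"), points = c(90, 88),
                  stringsAsFactors = FALSE)
write.csv(toy, csv_path, row.names = FALSE)

# Read it back from the constructed path.
wine_sample <- read.csv(csv_path, stringsAsFactors = FALSE)
wine_sample
```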
Now there is a problem: the variables that are supposed to be factors are stored as characters.
When you add a character variable to your regression model, write as.factor(variable_name) so that R treats
the levels of the variable as factors (indicators/dummies).
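To see what as.factor() does inside a regression formula, here is a small sketch on made-up data (the toy data frame and its values are assumptions for illustration, not the assignment data). Each level of the character variable except the first becomes its own indicator coefficient.

```r
# Toy data (made up for illustration); not the assignment data set.
toy <- data.frame(
  points  = c(88, 90, 85, 92, 87, 91),
  price   = c(15, 40, 12, 60, 20, 45),
  country = c("France", "Italy", "US", "France", "US", "Italy"),
  stringsAsFactors = FALSE
)

# country is a character column; as.factor() makes lm() build indicator columns.
toy_fit <- lm(points ~ price + as.factor(country), data = toy)

# One coefficient per non-reference level; the first level (France) is the baseline.
names(coef(toy_fit))
```

The baseline level has no coefficient of its own, so each indicator coefficient is read as a difference from that baseline.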
1) Write down the mathematical model for testing the impact of several drivers on the outcome (1 point) (Note: this
is not the same as R code; it should be a math formula.)
2) Write down the R model for testing the impact of several drivers on the outcome (1 point)
3) Run the model in R, and assign it to a variable. Store the model estimates in a data frame by using the tidy
command from the "broom" package.
sum_lm<-as.data.table(tidy(model_name))
Notice how now all your regression estimates are stored as a data frame (data table in this case since we coerced
it). This allows us to output the coefficients relevant to us in an easier way. Since we do not really care about the
intercept we can generate a new data frame without the intercept:
no_icept<-sum_lm[!(term=="(Intercept)")]
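The sketch below runs the same tidy() and data.table steps on simulated data, so you can see which columns the resulting table has before working with the real model. The toy data, the object name toy_fit and the simulated effect sizes are assumptions for illustration only.

```r
library(data.table)
library(broom)

# Toy data (simulated for illustration); not the assignment data set.
set.seed(1)
toy <- data.frame(
  price   = runif(60, 10, 100),
  country = rep(c("France", "Italy", "US"), each = 20),
  stringsAsFactors = FALSE
)
toy$points <- 80 + 0.1 * toy$price + ifelse(toy$country == "Italy", 2, 0) + rnorm(60)

toy_fit <- lm(points ~ price + as.factor(country), data = toy)

# tidy() returns one row per term: term, estimate, std.error, statistic, p.value.
sum_lm <- as.data.table(tidy(toy_fit))
no_icept <- sum_lm[!(term == "(Intercept)")]

# Rows can then be filtered on those columns, e.g. by p-value and sign.
no_icept[p.value < 0.05 & estimate > 0]
```

Because the estimates are ordinary columns, the usual data.table row filters and ordering work on them directly.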
a) Which variables are statistically significant at the 0.05 significance level and have a positive coefficient
estimate (>0)? What does it mean that these variables have a positive and significant coefficient? (5 points)
[Hint: Your output should be a data set, you can use the data.table package or the plyr package]
b) Which variable is statistically significant at the 0.05 significance level and has the highest impact on wine rating
points based on the coefficient estimate? Interpret your result. (5 points)
[Hint: You can use data.table package or the dplyr package]
c) Plot the residual vs. fitted graph. Which assumptions can you visually inspect from this graph? Do you think the
linear regression model assumptions of zero conditional mean and homoscedasticity are satisfied? (2 points)
d) How much variation in wine rating (points) is explained by your independent variables? (1 point)
[Hint: You can use summary(your_reg_model_object)]
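For parts c) and d), the following sketch shows one way to draw a residual vs. fitted plot and to pull R-squared out of the summary object. It uses simulated data and a hypothetical model object toy_fit, so the picture and the number say nothing about the wine data.

```r
# Toy data (simulated for illustration); not the assignment data set.
set.seed(2)
toy <- data.frame(price = runif(100, 10, 100))
toy$points <- 80 + 0.1 * toy$price + rnorm(100)
toy_fit <- lm(points ~ price, data = toy)

# Residuals vs. fitted values: a horizontal band around 0 with even spread
# is what zero conditional mean and homoscedasticity would look like.
plot(fitted(toy_fit), resid(toy_fit),
     xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)

# Share of variation in the outcome explained by the regressors.
r2 <- summary(toy_fit)$r.squared
r2
```

Base R also draws a similar plot with plot(toy_fit, which = 1).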
e) After thinking some more about your experiment you realize that the wine taster's unique tastes and preferences
may be biasing the results of your model. You decide to use a fixed-effects model in order to control for
unobservable factors due to the wine tasters themselves. To use the fixed-effects approach you can add a
dummy/indicator variable for each taster, this is done automatically in R by including the taster_name variable as a
factor in your regression:
+as.factor(taster_name)
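As an illustration of the fixed-effects idea, the sketch below simulates ratings in which each taster shifts the score by a fixed amount, then fits the model with and without the taster indicators. The taster names, the shift values and the object names base_fit and fe_fit are all hypothetical.

```r
# Toy data (simulated for illustration); not the assignment data set.
set.seed(3)
n <- 200
toy <- data.frame(
  price       = runif(n, 10, 100),
  taster_name = sample(c("Taster A", "Taster B", "Taster C"), n, replace = TRUE),
  stringsAsFactors = FALSE
)

# Each hypothetical taster adds a fixed shift to the rating.
taster_shift <- c("Taster A" = 0, "Taster B" = 3, "Taster C" = -2)
toy$points <- 80 + 0.1 * toy$price +
  unname(taster_shift[toy$taster_name]) + rnorm(n)

# Same model without and with taster fixed effects.
base_fit <- lm(points ~ price, data = toy)
fe_fit   <- lm(points ~ price + as.factor(taster_name), data = toy)

# R-squared of each model, side by side.
c(base = summary(base_fit)$r.squared,
  fixed_effects = summary(fe_fit)$r.squared)
```

Adding regressors to a linear model never lowers R-squared, so the size of the increase, not its direction, is what tells you whether the taster indicators matter.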
i) Can you identify which tasters had a negative and significant impact on the average wine rating at the 0.05
significance level? Identify the variable with the highest impact on average wine rating as you did in the previous
model. Is it the same as the one you had before, or different? (3 points)
ii) Which model has more predictive power in terms of R²? Interpret your findings: are the wine taster fixed effects
contributing any predictive power to your model? (2 points)
Additional resources for assignment
File attachment wine_data_clean_v2.csv ( 6 MB; Jul 16, 2019 10:27 am )