Model Statistic
|
Full Model
|
Reduced Model
|
|
β
|
p-value
|
β
|
p-value
|
Constant
|
|
|
|
|
Elevation
|
|
|
|
|
Latitude
|
|
|
|
|
Longitude
|
|
|
|
|
Log(Elevation + 1)
|
|
|
|
|
|
|
|
|
|
Model R2
|
|
|
Adjusted R2
|
|
|
Model F statistic
|
|
|
Model p-value
|
|
|
Q5. Why did you pick the final model you did?
3. Multi-Model Inference
3.1. Deer in Forests
In hardwood-conifer forests of North America, white-tailed deer have been found to drive changes in understory structure and species composition, cause species composition change of overstory trees and reduce stand timber value by slowing the recruitment of saplings to canopy positions. During winter in these forests, white-tailed deer generally shelter in mature conifer swamps, venturing out to browse in nearby stands, including northern hardwood stands. This behaviour is a response to the trade-off between conserving heat and energy in the shelter beneath the closed canopies of the (evergreen) conifer stands versus negotiating deeper snow and colder temperatures in the more open (deciduous) mixed hardwood stands to find adequate forage
To better understand influence on deer density in mixed hardwood-conifer forests, Millington et al. (2010) collected data on forest structure and estimates of deer density in 51 forest stand locations across the Upper Peninsula of Michigan. In this part of the practical we will use some of these data to explore models representing hypotheses about the influences of forest structure on deer density.
2.2 The data set
The data for this part of the practical are available in deer51.csv. The variables in the data set are:
· Site – id number for the site the data in this row are for
· DD – deer density (deer km-2) at the centre of the site
· DistLC – distance to nearest lowland conifer stand from the centre of the site (km)
· DBH – mean diameter-at-breast-height of trees at this site (cm), an indicator of tree size
· BA – mean basal area of trees at this site (m2 ha-1), an indicator of tree density
· SnowD – mean estimated snow depth for winter at this site (cm)
· LCProp – the proportion of lowland conifer tree at this site
Load this data set in R and familiarise yourself with the data:
- Plot histograms or box-plots of the individual variables to check data distribution and normality
- Check correlations between variables using a matrix scatterplot
In this multi-model inference practical we will use a linear regression.
Fit a simple linear regression to predict DD from DistLC and examine diagnostic plots (e.g. histograms, QQ plots and scatter plots of residuals) to establish is residuals for this model are normally distributed (see instructions for Practical 4 if you need a reminder how to do this).
Q6. Is this a useful model? i.e. Does it explain more variation than by chance? Does it meet the assumptions of simple linear regression?
From your results for this initial simple linear regression, you should be able to see that the residuals are not normally distributed (e.g. see the Normal Q-Q plot in the diagnostic plots), violating the assumptions of the model. To remedy this we will try transforming the deer density data.
Plot a histogram of the DD variable. The distribution of these data indicates that the data are log-normally distributed. Therefore, we will examine if a log transform. of these data will enable them to meet the assumptions of ordinary least squares regression. First, let’s create a new variable and append it to the original data set. For example:
logDD <- log(deer.data$DD, 10)
deer.data <- cbind(deer.data, logDD)
Check that new variable has been added to the existing DataFrame. and how the values in the new variable relate to the DD variable. Also, check what the distribution of the new variable looks like by plotting a histogram for it.
Now fit a simple linear regression to predict logDD from DistLC and examine diagnostic plots to establish of residuals for this model are normally distributed.
Q7. Is this a useful model? i.e. Does it explain more variation than by chance? Does it meet the assumptions of simple linear regression?
Hopefully you can see that the residuals of the regression for logDD suggest this transformed dependent variable is more likely to meet the assumptions of linear regression. We will use logDD for the remainder of the analyses.
3.2. Fitting simple linear regression models
In the next section we will be completing the MMI_Summary.xlsx Excel file with results of testing the utility of individual simple linear regression models. These simple linear regression models represent single-process hypotheses; the most basic models/hypotheses we can make. In the worksheet each row will provide details of the model, with fitted parameters in the columns B to G (some of these cells will remain empty where variables are not present in a model) and other information in the remaining column (review the comments in column headers of those other columns).
In R, fit simple linear regressions to predict logDD for each of the other variables (except DD of course).
As you fit the models complete the Excel worksheet with information from R in Columns A to M. Put the parameter estimates for the Intercept and the predictor variables in the appropriate columns.
- Values for columns A to J come from the output when fitting a linear regression model.
- The value for n is the total number of data points used to fit the regression. You can check this value in the original data or use: length(model$fitted.values)where ‘model’ is the name of your model in R.
- K is the number of parameters in the model (you might use the count() formula in Excel for columns B:G).
To obtain an AIC value in R you can use the AIC function built into R. For example:
AIC(model)
where ‘model’ is the name of your model in R.
Save your data! The figure below shows how the template should look with the information for the first model.
2.4 Fitting multivariate linear regression models
Now we have an idea about the variance explained by the individual predictor variables (via p values and r2) we will fit multivariate linear regressions to represent multi-process hypotheses. We will only fit multivariate models for the predictor variables which in simple linear regression models had p < 0.05 [Think back to week 4: what is the null hypothesis of the test this p value refers to]. If you fit models for all combinations of these models you will fit four multivariate models.
Fit these models now and add the information for them to the model comparison worksheet. Save your data!
2.5 Measures of model performance
Now we have fit what we think are the most appropriate models we will compare them using measures of model performance. The measures will be r2, Adjusted r2, AIC and AICc (see Eq. 1 and lecture slides for reasoning). You should already have for all models for the first three of these in columns I, J and M (if not, complete the information for all the models you have fit so far). In column N you now need to add a formula to calculate AICc using Eq. 1 as a guide:
Eq. (1)
Where:
n is the number of data points used to fit the regression model
K is the number of parameters in the regression model
Next, calculate deltaAICc for the models using Eq. 2 and enter values in column O [Hint: you may need to sort your data before calculating these values]:
Eq. (2)
You are now ready to compare your models (which represent hypotheses) and answer the following questions:
Q8. Which model is ‘best’ according to AICc?
Q9. According to AICc (and the rules of thumb offered by Millington and Perry 2011, p.452) which models in the set you have fit have strong support?
Q10. Which model is ‘best’ according to r2 (i.e. most variance explained)? Is this the same models as for the ‘best’ according to Adj r2 and AICc?
Q11. What is your ecological interpretation of the model comparison you have done?
If you want a further challenge calculate the AIC weights for the models using Eq 4. in Millington and Perry (2011).
References
Millington, J.D.A. and Perry, G.L.W. (2011) Multi-model inference in biogeography Geography Compass 5(7) 448–530
Millington, J.D.A., Walters, M.B., Matonis, M.S. and Liu, J. (2010) Effects of local and regional landscape characteristics on wildlife distribution across managed forests Forest Ecology and Management 259 1102–1110