
Part A
A statistical analysis of the impact of drought on algal growth in experimental
mesocosm channels.
Introduction and Data Collection:
Algal biofilms play an important trophic role in riverine systems (Ledger and Hildrew, 1998).
The composition and biomass of biofilms are dependent on light availability, nutrient supply
and grazing pressure (Caramujo, et al. 2008). Biofilms are therefore at risk from the direct
impacts of drought and top-down effects from grazers.
The DriStream experimental facility in Fobdown, Hampshire, consists of 21 channels
supplied with water from a borehole. The channels are 10m long and 50cm wide, with an
equal amount and size of substrate distributed in a pool riffle sequence that is consistent
across all channels. Channels were colonised with matching macroinvertebrate and
macrophyte populations. The macroinvertebrate assemblage is based on the nearby chalk
stream river, where the invertebrates were originally collected from. Once populations had
stabilised, channel depths were altered. In total there are three replicates of seven different
depths (0, 2, 5, 7, 10, 25 and 35 cm).
Biofilms were grown on terracotta tiles with an area of 100 cm². The biofilm was
washed off using a toothbrush and distilled water before being stored for laboratory analysis.
Four measurements of both grazed and ungrazed tiles were taken from each channel.
Grazed tiles were placed on the substrate in the pool sections and ungrazed tiles were
located alongside, suspended in the water column in order to minimise the impact of grazing
benthic invertebrates. Samples were brought back to the laboratory where the Ash Free Dry
Mass (AFDM) and the mass of Chlorophyll A (CHLA) were tested. The data used in this
analysis was collected in November, 2014.
Data Preparation:
The response variables consisted of: CHLA and AFDM. Explanatory variables included:
Depth, Grazed/Ungrazed, Flow and Plant Volume. Average response values for grazed and
ungrazed tiles were used for each channel in order to reduce the impact of variability in the
explanatory variables (depth). The data were organised in a long format (Appendix 2). RStudio
was used to conduct all statistical analyses (R Core Team, 2013). The following
commands were used to apply the attributes that the data needed for the techniques
used later:
Algae <- read.csv(file.choose()) # load data file
str(Algae) # check structure
Algae$Depth.Num <- Algae$Depth # extra Depth column which will remain an integer
Algae$Depth <- as.factor(Algae$Depth) # convert depth from integer to factor
Algae$Graze <- as.factor(Algae$Graze) # treatment as a factor (not automatic in R >= 4.0)
Algae$Log10.Flow <- log10(Algae$Flow) # log10 flow for GAM analyses later
Algae$Log10.Plant <- log10(Algae$Plant.Volume) # log10 plant volume for use with matlines
The following code was then used to subset grazed and ungrazed data so that the
relationships within each treatment may be investigated.
Ungrazed <- subset(Algae, Graze == "Ungrazed") # Ungrazed subset
Grazed <- subset(Algae, Graze == "Grazed") # Grazed subset
Data Exploration
Boxplots
A number of box plots were created to compare AFDM and CHLA with depth in order to
determine the nature of any response. Boxplots are simple but are a highly robust way of
exploring data because the inter quartile range (IQR) helps to limit the influence of outliers.
This is particularly useful for environmental data where distributions are commonly nonlinear
(Massart, et al. 2005).
1) Comparison of grazed and Ungrazed AFDM
op <- par(mfrow = c(1, 2))
boxplot(AFDM ~ Graze, data = Algae, ylab = "Ash Free Dry Mass") # hard to interpret
# due to the distribution; needs transforming
boxplot(log10(AFDM+0.001) ~ Graze, data = Algae, ylab = "log10 Ash Free Dry Mass")
# inspect transformed Ash Free Dry Mass (AFDM) difference between grazed and ungrazed
# tiles
par(op)
Figure 1: A comparison of AFDM from grazed and ungrazed tiles. Left: before the response (AFDM) was
transformed. Right: following the log10(AFDM + 0.001) transformation.
Due to the distribution of the AFDM data set, a comparison was only possible following the
transformation of AFDM. There is a difference between grazed and ungrazed AFDM, which
is investigated further in this report. A log10(AFDM + 0.001) transformation was used as there
are many zero values in the data set (Zuur, et al. 2010).
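The need for the offset can be seen in a minimal sketch. The AFDM values below are invented for illustration and are not taken from the data set:

```r
# Hypothetical AFDM values including zeros (not the study data)
afdm <- c(0, 0, 0.002, 0.015, 0.4)

# log10(0) is -Inf, which cannot be plotted or averaged sensibly
raw_log <- log10(afdm)

# Adding a small constant keeps every transformed value finite
offset_log <- log10(afdm + 0.001)

print(raw_log)
print(offset_log)
```

The size of the constant determines where the zeros land on the log scale, so it is worth checking that conclusions do not hinge on the particular value chosen.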
2) Change in AFDM with depth
op <- par(mfrow = c(1, 2))
boxplot(log10(AFDM+0.001) ~ Depth, data = Ungrazed, ylab = "Ash Free Dry Mass", xlab =
"Depth (cm)", main = "Ungrazed") # inspect difference in AFDM with Depth for Ungrazed
boxplot(log10(AFDM+0.001) ~ Depth, data = Grazed, ylab = "Ash Free Dry Mass", xlab =
"Depth (cm)", main = "Grazed") # inspect difference in AFDM with Depth for Grazed
par(op)
Figure 2: change in AFDM with depth for both grazed and ungrazed tiles.
Changes in grazed and ungrazed tiles differ considerably. Neither pattern appears to be
linear.
3) A comparison of CHLA with grazed and ungrazed tiles.
boxplot(CHLA ~ Graze, data = Algae, ylab = "Chlorophyll A") # inspect Chlorophyll A
# (CHLA) difference between grazed and ungrazed
Figure 3: CHLA for grazed and ungrazed tiles
Figure 3 shows that there is a greater spread for grazed CHLA than ungrazed. However,
the medians of the two treatments appear similar.
4) Change in CHLA with depth
op <- par(mfrow = c(1, 2))
boxplot(CHLA ~ Depth, data = Ungrazed, ylab = "Chlorophyll A",xlab = "Depth (cm)", main =
"Ungrazed") # inspect difference in CHLA with Depth for Ungrazed
boxplot(CHLA ~ Depth, data = Grazed, ylab = "Chlorophyll A",xlab = "Depth (cm)", main =
"Grazed") # inspect difference in CHLA with Depth for Grazed
par(op)
Figure 4: change in CHLA with depth for grazed and ungrazed tiles
Figure 4 highlights that there is a general increase in CHLA, for both grazed and ungrazed
tiles, with depth.
CoPlots
Based on the evidence from the boxplots it appeared that there was a difference between
grazed and ungrazed responses to Depth. Therefore coplots were created in order to
visualize potential interactions that may have been taking place (Zuur, et al. 2010).
1) AFDM change with grazing and Depth
coplot(log10(AFDM+0.001) ~ Depth | Graze, panel = panel.smooth, data = Algae) # coplots
# for AFDM with depth and Grazing
Figure 5: Coplot of grazed and ungrazed AFDM change with depth.
There is no marked difference between the lines in the two panels, which suggests that there
is no strong interaction taking place (Zuur, et al. 2010). Further tests will be needed to
determine this with more certainty.
2) CHLA change with grazing and Depth
coplot(CHLA ~ Depth | Graze, panel = panel.smooth, data = Algae) # coplot for CHLA with
# depth and Grazing
Figure 6: Coplot of grazed and ungrazed CHLA change with depth.
A difference in line shape suggests that an interaction may be taking place between grazing
and depth, that is, the response of CHLA to depth differs slightly between grazed and
ungrazed tiles.
3) AFDM change with grazing and Plant Volume
It was hypothesised that light penetration may have an impact on benthic algae
photosynthesis. Therefore, the change in AFDM (figure 7) and CHLA (figure 8) with plant volume is
visualised using coplots.
coplot(log10(AFDM+0.001) ~ log10(Plant.Volume)| Graze, panel = panel.smooth, data =
Algae) # coplots for AFDM with plant volume and Grazing
Figure 7: Coplot of grazed and ungrazed AFDM change with plant volume.
Figure 7 indicates that there may be interaction between grazed and ungrazed tiles with
changing plant volume. However, grazed AFDM seems to be affected by some significant
outliers which may obscure the comparison in this form.
4) CHLA change with grazing and Plant Volume
coplot(CHLA ~ log10(Plant.Volume) | Graze, panel = panel.smooth, data = Algae) # coplot
# for CHLA with plant volume and Grazing - very similar to Depth
Figure 8: Coplot of grazed and ungrazed CHLA change with plant volume.
Figure 8 indicates that an interaction may be taking place. However, the slope patterns are similar to
figure 6, suggesting that depth and plant volume are collinear.
Scatter Matrices
Two scatter matrices were created at this stage (grazed and Ungrazed) which included all
variables. This was to help visualise the change in response variables with explanatory
variables, within the grazed or ungrazed treatments. The use of smoothers in the scatter
matrices could help to identify possible non-linear patterns (Logan, 2010).
1) Grazed Scatter Matrix
library(car) # scatterplotMatrix() and leveneTest() are provided by the car package
scatterplotMatrix(~ Depth + log10(Flow) + log10(Plant.Volume) + log10(AFDM+0.001) +
CHLA , data = Grazed, diag = "boxplot", main = "Grazed") # scatter matrix to view
# responses in grazed tiles
Figure 9: Scatterplot Matrix for the Grazed tiles
Figure 9 shows that depth, flow and plant volume are highly collinear. Following
transformation, the AFDM is still not normally distributed. CHLA appears to increase with
Depth, although at this stage it is hard to determine if this relationship is linear or not. CHLA
and AFDM are inversely related.
2) Ungrazed scatter matrix
scatterplotMatrix(~ Depth + log10(Flow) + log10(Plant.Volume) + log10(AFDM+0.001) +
CHLA, data = Ungrazed, diag = "boxplot",main = "Ungrazed") # scatter matrix to view
# responses in ungrazed tiles
Figure 10: Scatterplot Matrix for the ungrazed tiles
Figure 10 shows that ungrazed CHLA increases with Depth, although the shape of both the
linear regression and smoother are considerably different to the grazed treatment. The
distribution of CHLA data is less normal than in the grazed treatment, whereas the
distribution of AFDM is more normal than the grazed treatment. The response of AFDM to
depth appears to be nonlinear.
GG Plots
The interaction between grazed and ungrazed treatments for both CHLA and log10 AFDM
were plotted using the plyr (Wickham, 2011) and ggplot2 packages (Wickham, 2009). Using
ggplot2 allows you to visualize the change in the mean of the response whilst also
considering the variance of the data.
1) Mean grazed/ungrazed CHLA against Depth
library(plyr)
AlgaeSum <- ddply(Algae, c("Depth.Num", "Graze"), summarise,
N = length(CHLA),
mean = mean(CHLA),
sd = sd(CHLA),
se = sd / sqrt(N) ) # parameters for CHLA ggplot
AlgaeSum # check the file....
require(ggplot2)
pd <- position_dodge(.3)
ggplot(AlgaeSum, aes(x = Depth.Num, y = mean, colour = Graze, group = Graze)) +
geom_errorbar(aes(ymin=mean-se, ymax=mean+se), width=.2, size = 0.25, colour =
"Black",
position = pd) + geom_line(position = pd) + geom_point(position = pd, size = 2.5)
# plot for CHLA
Figure 11: Grazed and ungrazed CHLA against depth (cm)
There is clearly an interaction between grazing and depth for CHLA. At shallower depths, grazed
tiles seem to have higher CHLA mass than ungrazed tiles. At greater depths, ungrazed CHLA
is higher than grazed. The sparse measurements between 10 and 35 cm reduce the certainty of
predictions at larger depths.
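The summary table behind a plot of this kind can also be built without plyr. The following is a minimal base-R sketch using a small simulated data frame; the column names mirror those in the report, but the values are invented and the object names `sim` and `cell_stats` are hypothetical:

```r
set.seed(1)
# Simulated stand-in for the Algae data: 2 depths x 2 treatments x 3 channels
sim <- data.frame(
  Depth.Num = rep(c(0, 5), each = 6),
  Graze     = rep(rep(c("Grazed", "Ungrazed"), each = 3), times = 2),
  CHLA      = runif(12, 0, 0.2)
)

# Mean, standard deviation and count for each Depth x Graze cell
cell_stats <- aggregate(CHLA ~ Depth.Num + Graze, data = sim,
                        FUN = function(x) c(mean = mean(x), sd = sd(x), N = length(x)))
cell_stats <- do.call(data.frame, cell_stats) # flatten the matrix column

# Standard error of the mean = sd / sqrt(N)
cell_stats$se <- cell_stats$CHLA.sd / sqrt(cell_stats$CHLA.N)
print(cell_stats)
```

With only three channels per cell, as in the experiment, the standard errors are themselves uncertain, so the error bars should be read as a rough guide.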
2) Mean grazed/ungrazed log10 AFDM against Depth
AlgaeSumAFDM <- ddply(Algae, c("Depth.Num", "Graze"), summarise,
N = length(log10(AFDM+0.001)),
mean = mean(log10(AFDM+0.001)),
sd = sd(log10(AFDM+0.001)),
se = sd / sqrt(N) ) # parameters for AFDM ggplot
pd <- position_dodge(.3)
ggplot(AlgaeSumAFDM, aes(x = Depth.Num, y = mean, colour = Graze, group = Graze)) +
geom_errorbar(aes(ymin=mean-se, ymax=mean+se), width=.2, size = 0.25, colour =
"Black",
position = pd) + geom_line(position = pd) + geom_point(position = pd, size = 2.5)
# ggplot for log10AFDM
Figure 12: Grazed and ungrazed log10 AFDM against depth (cm)
Figure 12 indicates that there is limited interaction between grazed and ungrazed AFDM and
that grazed AFDM is consistently higher than ungrazed AFDM. Response of both grazed
and ungrazed AFDM is non-linear.
Testing Assumptions
Before conducting statistical analysis it is important to determine whether the data fit the
distributional assumptions of parametric tests, as these tests have greater statistical power
than non-parametric tests (Logan, 2010). In order to conduct a parametric two-sample test
(two-sample t-test) and a parametric analysis of variance (ANOVA), the data must be
normally distributed and have homogeneous variance.
1) Data distribution
The Shapiro-Wilk test was used to test whether the data were normally distributed. The null
hypothesis of the Shapiro-Wilk test is that the data are normally distributed. Therefore a p-value
below the significance level indicates that the data are not normally distributed (Fells Statsplyr, 2011).
shapiro.test(Algae$CHLA) # 0.000505
data: Algae$CHLA
W = 0.8843, p-value = 0.000505
shapiro.test(Algae$AFDM) # 2.422e-13; both p values are < 0.05, suggesting data is not
# normally distributed; need QQ plot to check
data: Algae$AFDM
W = 0.2512, p-value = 2.422e-13
CHLA and AFDM both have P-values less than 0.05; therefore we reject the null
hypothesis that the data are normally distributed. However, the use of normality tests is
disputed due to their sensitivity to sample size (R-bloggers, 2011). Therefore,
further investigation is required using QQ-plots, which plot a continuous data set against a
theoretical normal distribution (Zuur, 2009).
op <- par(mfrow = c(2, 1))
qqnorm(Algae$CHLA)
qqline(Algae$CHLA) # looks alright; will assume CHLA is normally distributed
qqnorm(Algae$AFDM)
qqline(Algae$AFDM) # not good. AFDM not normally distributed - must use nonparametric
# Wilcoxon test for AFDM
par(op)
Figure 13: QQplots of (top) CHLA and (bottom) AFDM
AFDM data is clearly not normally distributed and as such will have to be analysed using
non-parametric tests. CHLA appears to have a reasonably normal distribution and,
considering the robustness of linear regression, ANOVA and t-tests to non-normal
distributions, the CHLA data is treated as normally distributed (Zuur, et al. 2010;
Logan, 2010).
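The combination of a formal test and a visual check can be illustrated on simulated samples. This is a sketch only; the two samples below are generated for illustration and do not represent the CHLA or AFDM data:

```r
set.seed(42)
# Simulated samples (not the study data): one roughly normal, one strongly right-skewed
normal_like <- rnorm(42, mean = 0.1, sd = 0.05)
skewed      <- rexp(200, rate = 10)

# Shapiro-Wilk: the null hypothesis is that the sample comes from a normal distribution
p_normal <- shapiro.test(normal_like)$p.value
p_skewed <- shapiro.test(skewed)$p.value
cat("normal-like p =", p_normal, "\n")
cat("skewed p      =", p_skewed, "\n")

# QQ plots give the visual counterpart to the test
op <- par(mfrow = c(1, 2))
qqnorm(normal_like, main = "Normal-like"); qqline(normal_like)
qqnorm(skewed, main = "Skewed");           qqline(skewed)
par(op)
```

Points that follow the reference line support approximate normality, whereas a curved pattern, as expected for the skewed sample, indicates a departure from it.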
2) Homogeneity of Variance
As AFDM is not considered to be normally distributed, tests for variance were only carried
out for CHLA. Levene’s test was used to assess homogeneity of variance. Its null
hypothesis is that the variances are homogeneous. In order to carry out
a t-test, the variances of CHLA in the grazed and ungrazed groups must be homogeneous. ANOVA and
ANCOVA also require equal variance. However, the homogeneity of these statistical models
will be determined using residual validation plots (Zuur, et al., 2010).
leveneTest(Algae$CHLA~Algae$Graze) # p > 0.05 therefore variance is considered
# homogeneous - can use parametric t-test for CHLA (leveneTest() is from the car package)
Levene's Test for Homogeneity of Variance (center = median)
      Df F value Pr(>F)
group  1  1.0437 0.3131
      40
leveneTest(Algae$CHLA~Algae$Depth) # p > 0.05 therefore variance is considered
# homogeneous - can use ANOVA
Levene's Test for Homogeneity of Variance (center = median)
      Df F value Pr(>F)
group  6  1.6695 0.1578
      35
Both Levene’s tests revealed that variances were homogeneous. Therefore, CHLA data fits
the requirements for parametric testing.
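For readers without the car package, the median-centred form of Levene's test can be sketched in base R as a one-way ANOVA on absolute deviations from the group medians. The sketch assumes this matches what leveneTest() reports with center = median; the data and the names `grp`, `y` and `lev` are invented for illustration:

```r
set.seed(7)
# Simulated response in three groups with equal spread (hypothetical data)
grp <- factor(rep(c("A", "B", "C"), each = 10))
y   <- rnorm(30, mean = rep(c(1, 2, 3), each = 10), sd = 1)

# Absolute deviation of each observation from its own group median
abs_dev <- abs(y - ave(y, grp, FUN = median))

# One-way ANOVA on the deviations: its F test is the median-centred Levene test
lev <- anova(lm(abs_dev ~ grp))
print(lev)
lev_p <- lev[["Pr(>F)"]][1]
```

A large p-value here means there is no evidence that the spread differs between groups; it does not prove that the variances are equal.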
Two-sample tests
1) Wilcoxon rank-sum test
A Wilcoxon rank-sum (Mann-Whitney) test was used to investigate the difference between grazed
and ungrazed AFDM samples because the lack of normality in the distribution prevents the use of a t-test
(Dalgaard, 2002).
library(coin) # load package needed for U test (Hothorn, et al. 2006)
wilcox_test(AFDM ~ Graze, data = Algae, distribution = "exact") # p<0.05 (0.0004401)
# therefore we can conclude there is a difference between grazed and ungrazed AFDM data
Exact Wilcoxon Mann-Whitney Rank Sum Test
data: AFDM by Graze (Grazed, Ungrazed)
Z = 3.3969, p-value = 0.0004401
alternative hypothesis: true mu is not equal to 0
A p-value < 0.05 indicates that the null hypothesis is rejected and it is considered that there
is a significant difference between grazed and ungrazed AFDM.
2) T-test
As CHLA data fits the requirements for a parametric test (Dalgaard, 2002), a two-sample t-test
was undertaken to compare the grazed and ungrazed CHLA data.
t.test(CHLA ~ Graze, data = Algae) # p > 0.05 (0.6893) therefore the null hypothesis is
# not rejected. The mean CHLA value is not significantly different between grazed and ungrazed
# tiles
data: CHLA by Graze
t = 0.4027, df = 39.768, p-value = 0.6893
The calculated P-value is greater than 0.05 and therefore we fail to reject the null hypothesis
of no difference between grazed and ungrazed mean CHLA.
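The fractional degrees of freedom in the output (39.768 rather than 40) arise because t.test() runs Welch's version of the test by default, which does not assume equal variances. A minimal sketch of the Welch-Satterthwaite calculation follows, using simulated groups rather than the CHLA data; the names `g1`, `g2` and `welch_df` are hypothetical:

```r
set.seed(3)
# Two simulated groups (hypothetical values, not the CHLA data)
g1 <- rnorm(21, mean = 0.07, sd = 0.05)
g2 <- rnorm(21, mean = 0.06, sd = 0.06)

# Welch-Satterthwaite degrees of freedom
v1 <- var(g1) / length(g1)
v2 <- var(g2) / length(g2)
welch_df <- (v1 + v2)^2 / (v1^2 / (length(g1) - 1) + v2^2 / (length(g2) - 1))

# t.test() uses the Welch form unless var.equal = TRUE is set
tt <- t.test(g1, g2)
cat("manual df:", welch_df, " t.test df:", unname(tt$parameter), "\n")
```

Given that Levene's test found no evidence of unequal variances, passing var.equal = TRUE would give the classical pooled test; with similar group variances the two versions produce nearly identical results.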
Analysis of Variance
1) One-way ANOVA – Grazed treatment
ANOVA tests were carried out on CHLA with each control variable on grazed and ungrazed
treatments. This analysis will help to determine the influence of different control variables on
CHLA (Doncaster and Davey, 2007). For those tests that were statistically significant,
verification plots were analysed and post-hoc Tukey tests were applied.
#CHLA ~ Depth
CHLAgrazedaov <- aov(CHLA ~ Depth, data = Grazed) # anova of grazed tile CHLA
# against Depth
summary(CHLAgrazedaov) # P<0.05 (0.0393) significant relationship
TukeyHSD(CHLAgrazedaov) # post hoc test to identify which pairs of depth
# means differ significantly
op <- par(mfrow = c(2, 2))
plot(CHLAgrazedaov) # plot validation shows slightly wedge shaped residuals
par(op)
leveneTest(Grazed$CHLA~Grazed$Depth) # p > 0.05 so we can consider variance
# homogeneous.
#CHLA ~ Plant Volume
CHLAgrazedaov2 <- aov(CHLA ~ Plant.Volume, data = Grazed) # anova of grazed tile
# CHLA against plant volume
summary(CHLAgrazedaov2) # p > 0.05 (0.595) Not significant
#CHLA ~ Flow
CHLAgrazedaov3 <- aov(CHLA ~ Flow, data = Grazed) # anova of grazed tile CHLA
# against Flow
summary(CHLAgrazedaov3) # p > 0.05 (0.607) Not significant
For Grazed tiles, the only significant ANOVA test was CHLA with depth. The results are
summarised here:
            Df  Sum Sq  Mean Sq F value Pr(>F)
Depth        6 0.03620 0.006034   3.069 0.0393 *
Residuals   14 0.02753 0.001966
A post-hoc Tukey test was then applied to identify which pairs of depth means differ
significantly (Logan, 2010):
Tukey multiple comparisons of means
95% family-wise confidence level
Fit: aov(formula = CHLA ~ Depth, data = Grazed)
$Depth
              diff          lwr        upr     p adj
2-0    0.036333333 -0.087289876 0.15995654 0.9448870
5-0    0.125333333  0.001710124 0.24895654 0.0459610
7-0    0.086000000 -0.037623209 0.20962321 0.2770254
10-0   0.098000000 -0.025623209 0.22162321 0.1670851
25-0   0.023333333 -0.100289876 0.14695654 0.9937725
35-0   0.076333333 -0.047289876 0.19995654 0.3983507
5-2    0.089000000 -0.034623209 0.21262321 0.2453632
7-2    0.049666667 -0.073956543 0.17328988 0.8074029
10-2   0.061666667 -0.061956543 0.18528988 0.6248235
25-2  -0.013000000 -0.136623209 0.11062321 0.9997609
35-2   0.040000000 -0.083623209 0.16362321 0.9164273
7-5   -0.039333333 -0.162956543 0.08428988 0.9221504
10-5  -0.027333333 -0.150956543 0.09628988 0.9858612
25-5  -0.102000000 -0.225623209 0.02162321 0.1396981
35-5  -0.049000000 -0.172623209 0.07462321 0.8163400
10-7   0.012000000 -0.111623209 0.13562321 0.9998495
25-7  -0.062666667 -0.186289876 0.06095654 0.6086721
35-7  -0.009666667 -0.133289876 0.11395654 0.9999574
25-10 -0.074666667 -0.198289876 0.04895654 0.4220462
35-10 -0.021666667 -0.145289876 0.10195654 0.9958082
35-25  0.053000000 -0.070623209 0.17662321 0.7602295
This test shows that only the difference between the 5 cm and 0 cm depths is significant
(adjusted P = 0.046); no other pair of depths differs significantly.
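With 21 pairwise comparisons it is easy to misread a table of this kind, so the rows can be filtered on the adjusted p-value. The sketch below uses simulated data with a hypothetical factor `level`; it illustrates the filtering step only and does not reproduce the analysis above:

```r
set.seed(11)
# Simulated one-way layout: three levels, one of which has a shifted mean
d <- data.frame(
  level = factor(rep(c("a", "b", "c"), each = 8)),
  y     = c(rnorm(8, 0), rnorm(8, 0), rnorm(8, 3))
)
fit <- aov(y ~ level, data = d)

# TukeyHSD() returns one matrix per factor, with columns diff, lwr, upr and p adj
tuk <- TukeyHSD(fit)$level

# Keep only the pairs whose adjusted p-value is below 0.05
sig_pairs <- tuk[tuk[, "p adj"] < 0.05, , drop = FALSE]
print(sig_pairs)
```

A pair is flagged exactly when its confidence interval (lwr to upr) excludes zero, which gives a quick consistency check on the printed table.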
Validation plots were then analysed to determine if the model can be accepted.
Figure 14: Validation plots for ANOVA test between grazed CHLA and depth
The residuals in figure 14 appear to be slightly wedge shaped. However, this is
attributed to a small number of outliers. All plots are considered acceptable. A Levene’s test
was carried out to check for homogeneity of variance:
Levene's Test for Homogeneity of Variance (center = median)
      Df F value Pr(>F)
group  6  0.7662 0.6085
      14
A P-value > 0.05 indicates that the null hypothesis is not rejected; the variance for the test is
considered to be homogeneous.
2) One-way ANOVA – Ungrazed treatment
The same process was repeated for the ungrazed treatment.
#Ungrazed CHLA with Depth
CHLAUngrazedaov <- aov(CHLA ~ Depth, data = Ungrazed) # anova of ungrazed tile
# CHLA against Depth
summary(CHLAUngrazedaov) # P>0.05 (0.18) not significant
# CHLA ~ Plant Volume
CHLAUngrazedaov2 <- aov(CHLA ~ Plant.Volume, data = Ungrazed) # anova of ungrazed
# tile CHLA against plant volume.
summary(CHLAUngrazedaov2) # p <0.05 (0.0437) significant relationship
op <- par(mfrow = c(2, 2))
plot(CHLAUngrazedaov2) # plot validation shows that plant volume requires transforming
par(op)
CHLAUngrazedaov2 <- aov(CHLA ~ log10(Plant.Volume), data = Ungrazed) # run anova
# again with transformed plant volume
summary(CHLAUngrazedaov2) # p <0.05 (0.0236) significant relationship. improved
# following transformation
op <- par(mfrow = c(2, 2))
plot(CHLAUngrazedaov2) # validation plots are considerably better, apart from one major
# outlier
par(op)
Algaeno1 <- Ungrazed[-1, ] # remove influential point (first row of the ungrazed subset)
CHLAUngrazedaov2 <- aov(CHLA ~ log10(Plant.Volume), data = Algaeno1) # run anova
# again without influential point
summary(CHLAUngrazedaov2) # P<0.1 (0.0607)
op <- par(mfrow = c(2, 2))
plot(CHLAUngrazedaov2) # validation plots look good.
par(op)
# Tukey test not applied as control variable is required to be a factor.
# Ungrazed CHLA with Flow
CHLAUngrazedaov3 <- aov(CHLA ~ Flow, data = Ungrazed) # anova of ungrazed tile
# CHLA against Flow
summary(CHLAUngrazedaov3) # p > 0.05 (0.382) not significant
The only statistically significant ANOVA test for the ungrazed treatment was with plant
volume. The first summary is shown below:
             Df  Sum Sq Mean Sq F value Pr(>F)
Plant.Volume  1 0.01079 0.01079   4.668 0.0437 *
Residuals    19 0.04390 0.00231
Summary plots were then examined, which revealed that plant volume required transforming
in order to centre the residuals, fixing the heterogeneity issue.
Figure 15: Verification plots for ANOVA of ungrazed CHLA with plant volume (before data
transformation).
A log 10 transformation was used to reduce the difference in variances (Logan, 2010).
Figure 16: Verification plots for ANOVA of ungrazed CHLA with log10 plant volume (after control
variable transformation).
The log10 transformation has reduced the heterogeneity of variance and the verification plots
are considered acceptable, apart from one outlier, which records approximately
1.0 on the Cook’s distance plot and is considered highly influential (Logan, 2010). This outlier
was removed and the ANOVA was re-run, recording a P-value < 0.1 (0.06):
                    Df   Sum Sq  Mean Sq F value Pr(>F)
log10(Plant.Volume)  1 0.003307 0.003307   4.004 0.0607 .
Residuals           18 0.014866 0.000826
Validation plots were examined. Removal of the outlier noticeably reduces the remaining
heterogeneity of variance.
Figure 17: Final validation plot for ANOVA of ungrazed CHLA with plant volume (influential outlier
removed).
No Post-Hoc Tukey test was carried out for this ANOVA because a factor control variable is
required.
One-way ANOVA summary
One-way ANOVA analyses reveal that grazed CHLA is significantly related to depth
(P<0.05). Ungrazed CHLA, however, is related to plant volume only at the P<0.1 level once the
influential outlier is removed.
3) Kruskal-Wallis Test - Grazed and Ungrazed AFDM
AFDM data is not normally distributed and as a result, the Kruskal-Wallis test is used to
test the difference between the medians of the groups. This rank-based test is used as
an alternative to ANOVA (Logan, 2010).
#Grazed AFDM with Depth
AFDM.graze.KW <- kruskal.test(AFDM ~ Depth, data = Grazed) # Kruskal-Wallis test
# (KW test) of grazed AFDM against Depth
AFDM.graze.KW # P > 0.05 (0.3614) not significant
#Plant Volume
AFDM.graze.KW2 <- kruskal.test(AFDM ~ Plant.Volume, data = Grazed) # Kruskal-Wallis
# test (KW test) of AFDM against Plant volume
AFDM.graze.KW2 # P > 0.05 (0.4579) not significant
#Flow
AFDM.graze.KW3 <- kruskal.test(AFDM ~ Flow, data = Grazed) # Kruskal-Wallis test
# (KW test) of AFDM against Flow
AFDM.graze.KW3 # P > 0.05 (0.3954) not significant
#Ungrazed AFDM with Depth
AFDM.ungraze.KW <- kruskal.test(AFDM ~ Depth, data = Ungrazed) # Kruskal-Wallis
# test (KW test) of Ungrazed AFDM against Depth
AFDM.ungraze.KW # P>0.05 (0.284) not significant
# Plant Volume
AFDM.ungraze.KW2 <- kruskal.test(AFDM ~ Plant.Volume, data = Ungrazed) # Kruskal-Wallis
# test (KW test) of Ungrazed AFDM against plant volume
AFDM.ungraze.KW2 # P>0.05 (0.4579) not significant
# Flow
AFDM.ungraze.KW3 <- kruskal.test(AFDM ~ Flow, data = Ungrazed) # Kruskal-Wallis
# test (KW test) of Ungrazed AFDM against Flow
AFDM.ungraze.KW3 # P>0.05 (0.3954) not significant
Kruskal-Wallis summary:
There were no significant relationships between either grazed or ungrazed AFDM and any of the
explanatory variables.
4) Two-Way ANOVA
From the previous one-way ANOVA analysis it has been established that grazed CHLA is
related to depth and ungrazed CHLA is related to plant volume. The two-way analysis will
test the following four hypotheses (Doncaster and Davey, 2007):
1) Variation in CHLA is explained by the variance in depth and independently by the
variation in grazing (grazed/ungrazed).
# two-way anova considering grazing and Depth
Algae.aov1 <- aov(CHLA ~ Graze + Depth, data = Algae)
summary(Algae.aov1) # p > 0.05 not significant
The variance of CHLA cannot be explained by hypothesis 1.
2) Variation in CHLA is explained by the variance in plant volume and independently by the
variation in grazing (grazed/ungrazed).
# two-way anova considering grazing and plant volume
Algae.aov3 <- aov(CHLA ~ Graze + Plant.Volume, data = Algae)
summary(Algae.aov3) # p > 0.05 not significant
The variance of CHLA cannot be explained by hypothesis 2.
3) Variation in CHLA is explained by the inter-dependent effects of variance in plant volume
and grazing (grazed/ungrazed).
# two-way interaction considering grazing*plant volume
Algae.aov4 <- aov(CHLA ~ Graze*Plant.Volume, data = Algae)
summary(Algae.aov4) # Graze:Plant.Volume P = 0.0796
                   Df  Sum Sq  Mean Sq F value Pr(>F)
Graze               1 0.00048 0.000480   0.171 0.6815
Plant.Volume        1 0.00265 0.002647   0.943 0.3377
Graze:Plant.Volume  1 0.00911 0.009107   3.245 0.0796 .
Residuals          38 0.10666 0.002807
Interaction between graze and plant volume is significant only at the 10% level (P<0.1). The
validation plots were then inspected:
Figure 18: Explanatory plots for CHLA ~ graze*plant volume ANOVA.
Plots indicate that the control variable (plant volume) requires transformation due to
heterogeneity in the residuals. The ANOVA was re-run with log10 transformed plant volume.
Algae.aov4 <- aov(CHLA ~ Graze*log10(Plant.Volume), data = Algae) # run with
# transformed explanatory
summary(Algae.aov4) # interaction is no longer significant
                          Df  Sum Sq  Mean Sq F value Pr(>F)
Graze                      1 0.00048 0.000480   0.174  0.679
log10(Plant.Volume)        1 0.00893 0.008929   3.236  0.080 .
Graze:log10(Plant.Volume)  1 0.00464 0.004638   1.681  0.203
Residuals                 38 0.10485 0.002759
Interaction between Graze and Plant volume is no longer significant following the removal of
residual heterogeneity.
4) Variation in CHLA is explained by the inter-dependent effects of variance in Depth and
grazing (grazed/ungrazed).
# two-way interaction considering grazing*Depth
Algae.aov2 <- aov(CHLA ~ Graze*Depth, data = Algae)
summary(Algae.aov2) # significant relationship (P<0.1). Graze:Depth (P= 0.0584)
            Df  Sum Sq  Mean Sq F value Pr(>F)
Graze        1 0.00048 0.000480   0.229 0.6360
Depth        6 0.03019 0.005032   2.399 0.0536 .
Graze:Depth  6 0.02950 0.004917   2.344 0.0584 .
Residuals   28 0.05872 0.002097
Interaction between grazing and depth is significant at the 10% level (P<0.1). Validation plots
were used to assess the reliability of the analysis:
Figure 19: Explanatory plots for CHLA ~ graze*Depth ANOVA.
Validation plots indicate that there is some heterogeneity in the residuals; however, this is
influenced by three outliers. This ANOVA test is accepted. Further investigation with an
ANCOVA analysis is undertaken in this report.
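The difference between the "independent effects" and "inter-dependent effects" hypotheses can also be framed as a comparison of two nested models. The sketch below uses simulated data in which the depth effect is deliberately made to differ between treatments; all values and the names `sim`, `additive` and `with_int` are hypothetical:

```r
set.seed(5)
# Simulated data: the response depends on depth differently in each treatment
sim <- expand.grid(rep   = 1:3,
                   Depth = factor(c(0, 5, 10)),
                   Graze = factor(c("Grazed", "Ungrazed")))
slope <- ifelse(sim$Graze == "Grazed", 0.01, 0.03)
sim$CHLA <- 0.05 + slope * as.numeric(as.character(sim$Depth)) +
  rnorm(nrow(sim), sd = 0.02)

additive <- lm(CHLA ~ Graze + Depth, data = sim) # independent effects only
with_int <- lm(CHLA ~ Graze * Depth, data = sim) # adds the Graze:Depth interaction

# F test for the extra interaction terms
cmp <- anova(additive, with_int)
print(cmp)
```

In a balanced design such as this one, the F test from the model comparison corresponds to the interaction row of the two-way ANOVA table.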
Linear Models
1) One-way linear models
Based on the one-way ANOVA analyses we know which one-way linear models will be
significant (grazed CHLA with Depth and ungrazed CHLA with plant volume). Linear models
were created for these variables to visualise the relationship.
Ungrazed CHLA with Plant Volume:
CHLA.ungrazed.lm <- lm(CHLA ~ Log10.Plant, data = Algaeno1) # lm of ungrazed tile CHLA
# against log10 plant volume
summary(CHLA.ungrazed.lm ) # P<0.1 (0.0607) significant relationship
op <- par(mfrow = c(2, 2))
plot(CHLA.ungrazed.lm) # Plot validation same as anova - looks alright one major outlier
par(op)
Verification plots are shown in figure 17. Following verification, the model was plotted with
confidence intervals of 95%.
plot(CHLA ~ Log10.Plant, data = Ungrazed, ylim = c(0, 0.3), xlab = "log10 (plant volume)",
ylab = "Chlorophyll A", col = "blue", main = "Ungrazed Tiles") # plot CHLA for
# ungrazed against Plant volume.
abline(CHLA.ungrazed.lm) # fit the linear model line.
text(5.5, 0.25, "Chlorophyll A = -0.07508 log10(Plant.Volume) + 0.02797", pos = 2)
text(5, 0.23, expression(paste(P == 0.0607)), pos = 2)
text(5, 0.21, expression(paste(R^2 == 0.1365)), pos = 2) # add equation P value and R2
x <- seq(min(Ungrazed$Log10.Plant), max(Ungrazed$Log10.Plant), length.out = 1000) # for each
# value of x, calculate the upper and lower 95% confidence limits
y <- predict(CHLA.ungrazed.lm, data.frame(Log10.Plant = x), interval = "confidence")
matlines(x, y, lty = 3, col = "black") # plot the fit and the upper and lower 95% confidence limits
Figure 20: Linear model of ungrazed CHLA change with plant volume.
The residuals were then tested:
plot(resid(CHLA.ungrazed.lm) ~ log10(Algaeno1$Plant.Volume)) # residuals plot looks
# good. Model can be accepted.
Figure 21: Fitted vs. predicted residuals for the linear model of ungrazed CHLA against plant volume.
It was concluded that the model was reliable because variance is considered homogeneous.
Grazed CHLA with Depth:
CHLA.Grazed.lm2 <- lm(CHLA ~ Depth, data = Grazed) # lm of grazed tile CHLA against
# Depth
summary(CHLA.Grazed.lm2 ) # significant P<0.05
Residuals:
      Min        1Q    Median        3Q       Max
-0.083667 -0.019000 -0.001667  0.023000  0.064333
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.002667   0.025600   0.104  0.91852
Depth2      0.036333   0.036204   1.004  0.33262
Depth5      0.125333   0.036204   3.462  0.00381 **
Depth7      0.086000   0.036204   2.375  0.03236 *
Depth10     0.098000   0.036204   2.707  0.01703 *
Depth25     0.023333   0.036204   0.644  0.52968
Depth35     0.076333   0.036204   2.108  0.05350 .
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.04434 on 14 degrees of freedom
Multiple R-squared: 0.5681, Adjusted R-squared: 0.383
F-statistic: 3.069 on 6 and 14 DF, p-value: 0.03929
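Because Depth is a factor, the coefficients in this output are not slopes. Under R's default treatment contrasts, the intercept is the mean CHLA at the baseline depth (0 cm) and each remaining coefficient is the difference between that depth's mean and the baseline, which is why the estimates match the "x-0" rows of the Tukey table. A minimal sketch with simulated values (not the study data; `sim`, `fit` and `means` are hypothetical names):

```r
set.seed(9)
# Simulated response at three depth levels (hypothetical values)
sim <- data.frame(
  Depth = factor(rep(c(0, 2, 5), each = 3)),
  CHLA  = c(rnorm(3, 0.01, 0.02), rnorm(3, 0.04, 0.02), rnorm(3, 0.12, 0.02))
)
fit <- lm(CHLA ~ Depth, data = sim)

# Group means computed directly
means <- sapply(split(sim$CHLA, sim$Depth), mean)

# Intercept = mean of the baseline level; other coefficients = difference from baseline
coefs <- coef(fit)
print(rbind(coefficient = coefs,
            from_means  = c(means[1], means[-1] - means[1])))
```

The fitted value for any depth is therefore the intercept plus that depth's coefficient, which is the quantity needed when drawing the model over the raw data.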
This model shows a significant relationship with a P-value < 0.05. Model validation plots can
be seen in figure 14. The model was then plotted to visualise the relationship. As depth is a
factor, plotting the linear model required the input of the coefficient for each factor level, as
shown in the code.
plot(CHLA ~ Depth.Num, data = Grazed,
