MAST 397B: Introduction to Statistical Computing
ABSTRACT
Notes: (i) This project can be done in groups. If it is done
in a group, submit a single copy for the group (not one per
individual). In this case the cover page must list all group
members with their ID numbers, along with a statement of the
contributions of each member of the group.
(ii) You should present references to all materials (online
or otherwise) in your report. (iii) All code should be
put in an appendix. (iv) Answers should be clearly stated;
a poorly written report will get only partial credit.
Instructor: Yogen Chaubey
MAST 397B
FINAL PROJECT
Due Date: December 2, 2019 [Hard Copies only]
Problem 1. [20 Points]
Fitting distributions to a given dataset is an important problem in statistical analysis. R
contains a package called fitdistrplus that facilitates fitting various known continuous
distributions. In general, fitting a distribution requires knowledge of the form of the
distribution, such as the Gaussian distribution given by the probability density function (pdf)
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left\{-\frac{1}{2\sigma^2}(x - \mu)^2\right\}; \quad x \in (-\infty, \infty).$$
The vector $\theta = (\mu, \sigma^2)$ is known as the parameter vector and is estimated from a random
sample $(X_1, X_2, \dots, X_n)$. Consider the data named groundbeef, available with the package
fitdistrplus. Fit the following two distributions to this dataset: (a) log-normal distribution,
(b) Gamma distribution.
(i) Use the maximum likelihood (ML) method for the log-normal distribution and
the method of moments (MM) for the Gamma distribution. Note that $X$ is said to have a
log-normal distribution if $Y = \log X$ has a normal distribution, and that the Gamma
pdf with shape parameter $\alpha$ and scale parameter $\beta$ is given by
$$f(x) = \frac{1}{\beta^{\alpha}\Gamma(\alpha)} x^{\alpha - 1} \exp\left\{-\frac{x}{\beta}\right\}; \quad x \geq 0.$$
Use a standard statistical text for explicit formulae in order to calculate these estimators
using your own defined function in R.
(ii) Use the package fitdistrplus to find the ML and MM estimators for the two
distributions.
(iii) One method of justifying a given distribution is to perform a Chi-square goodness-of-fit
test. It is given by the test statistic
$$\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}.$$
Here we assume that the data is grouped into $k$ groups ($k$ = # of bins in the histogram),
$O_i$ is the observed frequency in the $i$th group and $E_i$ is the frequency in the $i$th group under the fitted
model.
$E_i$ has to be computed by the formula $E_i = n p_i$, where $p_i$ is the probability of the observation
being in group $i$ in the model. If the model fits, the test statistic $\chi^2$ has a Chi-square
distribution with df $= \nu = k - 1 - p$, where $p$ = number of estimated parameters.
Compute the $\chi^2$ statistic for the above data for a suitable value of $k$; note that for the test to
be valid each group must have 5 or more observations. Find the upper 5% value of the
appropriate $\chi^2$ distribution and compare the computed value (for both the models) in
deciding if the models fit the data. [Note: An observed value of $\chi^2$ greater than the upper 5% value of
$\chi^2$ with df $= \nu$ indicates poor fit].
(iv) The quality of the fits may also be gauged by plotting the histogram with the estimated
density superimposed over it. Provide the histogram with the estimated density
superimposed over it for both the methods for each of the log-normal and Gamma
distributions and comment on the quality of the fit.
(v) Another qualitative method to judge the fit is the Q-Q plot of the data. Give the Q-Q
plots for both the methods for each of the log-normal and Gamma densities. Comment
on the quality of fit in each case. How does it compare with your conclusion in part
(iii)?
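As a rough illustration of parts (i) and (iii), the closed-form estimators and the Chi-square statistic can be sketched in base R as below. This is only a sketch under stated assumptions: the sample `x` is simulated and hypothetical (with fitdistrplus installed, `groundbeef$serving` could be used instead), the function names `lnorm_mle`, `gamma_mm` and `chisq_gof` are made up for this example, the moment estimators use the divide-by-n variance, and the bins are chosen to have equal probability under the fitted model, which is one possible grouping among many.

```r
# Hypothetical positive sample standing in for the real data; with the
# fitdistrplus package one could use data("groundbeef") and
# x <- groundbeef$serving instead.
set.seed(1)
x <- rgamma(200, shape = 4, scale = 20)

# ML estimates for the log-normal: mean and divide-by-n sd of log(x).
lnorm_mle <- function(x) {
  y <- log(x)
  mu <- mean(y)
  c(meanlog = mu, sdlog = sqrt(mean((y - mu)^2)))
}

# Method-of-moments estimates for Gamma(shape, scale), using
# mean = shape * scale and variance = shape * scale^2.
gamma_mm <- function(x) {
  m <- mean(x)
  v <- mean((x - m)^2)  # divide-by-n variance (assumption; n - 1 is also used)
  c(shape = m^2 / v, scale = v / m)
}

# Pearson Chi-square statistic with k bins of equal probability under the
# fitted model, so every expected count is n / k.
chisq_gof <- function(x, qfun, k, n_par) {
  n <- length(x)
  breaks <- qfun(seq(0, 1, length.out = k + 1))  # runs from 0 to Inf
  observed <- as.vector(table(cut(x, breaks = breaks)))
  expected <- rep(n / k, k)
  stat <- sum((observed - expected)^2 / expected)
  df <- k - 1 - n_par
  c(statistic = stat, df = df, critical = qchisq(0.95, df))
}

est_ln <- lnorm_mle(x)
est_ga <- gamma_mm(x)
k <- 8  # assumed number of groups; expected count per group is 200 / 8 = 25
gof_ln <- chisq_gof(
  x, function(p) qlnorm(p, est_ln[["meanlog"]], est_ln[["sdlog"]]), k, 2
)
gof_ga <- chisq_gof(
  x, function(p) qgamma(p, shape = est_ga[["shape"]], scale = est_ga[["scale"]]), k, 2
)
print(rbind(lognormal = gof_ln, gamma = gof_ga))
```

A statistic larger than the `critical` column would indicate poor fit at the 5% level. Equal-probability bins keep every expected count at n / k, but the observed counts should still be checked against the five-per-group requirement.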
Problem 2. [15 Points]
Problem 3. [10 Points]
Consider the following data from Example 7.12
(a) The objective is to determine a line $y = \beta_0 + \beta_1 x$ such that the function
$$S(\beta_0, \beta_1) = \sum_{i=1}^{n} |y_i - \beta_0 - \beta_1 x_i|$$
is minimized. Use the optim() function of R with starting values obtained from lm().
(b) Plot the least-squares line and the line obtained in part (a) on the scatterplot and
comment on the fit of these lines to the data.
(c) Suppose another point (2.05,3.23) is added to the data. Compute the two lines again
and comment on the effect of the new point on the estimates.
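The minimization in part (a) might be set up along the following lines. This is a minimal sketch, not a definitive solution: the Example 7.12 data are not reproduced here, so the `x` and `y` values below are hypothetical, and the helper name `sad` (sum of absolute deviations) is made up for this example. `optim()` is left at its default Nelder-Mead method, which does not need derivatives of the non-smooth objective.

```r
# Hypothetical (x, y) pairs; replace with the Example 7.12 data.
x <- c(1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5)
y <- c(2.1, 2.9, 4.2, 4.8, 6.1, 7.2, 7.9, 9.1)

# Sum of absolute residuals for the line y = b[1] + b[2] * x.
sad <- function(b, x, y) sum(abs(y - b[1] - b[2] * x))

ls_fit <- lm(y ~ x)                         # least-squares fit
start <- unname(coef(ls_fit))               # starting values for optim()
lad_fit <- optim(start, sad, x = x, y = y)  # Nelder-Mead by default

# Compare the two sets of coefficients side by side.
print(rbind(least_squares = start, least_absolute = lad_fit$par))
```

For part (b), `plot(x, y)` followed by `abline(ls_fit)` and `abline(a = lad_fit$par[1], b = lad_fit$par[2], lty = 2)` would draw both lines on the scatterplot. For part (c), the new point can be appended with `c(x, 2.05)` and `c(y, 3.23)` before refitting.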
