Questions

Install the R package CASdatasets on your computer. This package includes a collection of datasets, originally for the book “Computational Actuarial Science with R” edited by Arthur Charpentier (CAS with R). The package contains a large variety of actuarial datasets. It can be downloaded from the website:

Third party insurance is a compulsory insurance for vehicle owners in Australia. It insures vehicle owners against injury caused to other drivers, passengers or pedestrians as a result of an accident. Download the dataset ausprivauto0405. This dataset is based on one-year vehicle insurance policies taken out in 2004 or 2005. There are 67856 policies, of which 4624 (6.8%) had at least one claim. First, let us consider the variable ClaimNb, which is the number of claims made in this period.

(a) Fit Poisson and negative binomial distributions to the empirical distribution of this discrete variable. Select the best fitting model based on at least two model selection criteria.
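As a rough illustration of part (a), the sketch below fits both distributions by maximum likelihood and compares them with AIC and BIC. It is only a sketch under stated assumptions: the frequency table `freq` is made up (it is not the ausprivauto0405 data), the negative binomial pmf is assumed to be the (r, β) form consistent with Var(Y) = rβ(1 + β), and the helper names (`golden_max`, `aic`, `bic`) are hypothetical.

```python
import math

# Hypothetical frequency table {claims: policies}; NOT the ausprivauto0405 counts.
freq = {0: 700, 1: 200, 2: 70, 3: 20, 4: 10}
n = sum(freq.values())
mean = sum(y * c for y, c in freq.items()) / n

def loglik_poisson(lam):
    return sum(c * (y * math.log(lam) - lam - math.lgamma(y + 1))
               for y, c in freq.items())

def loglik_negbin(r, beta):
    # Assumed pmf: Gamma(r+y) / (Gamma(r) y!) * (1/(1+beta))^r * (beta/(1+beta))^y
    return sum(c * (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
                    + y * math.log(beta / (1 + beta)) - r * math.log(1 + beta))
               for y, c in freq.items())

def golden_max(f, a, b, tol=1e-8):
    """Golden-section search for the maximiser of a unimodal f on [a, b]."""
    g = (math.sqrt(5) - 1) / 2
    while b - a > tol:
        c, d = b - g * (b - a), a + g * (b - a)
        if f(c) > f(d):
            b = d
        else:
            a = c
    return (a + b) / 2

lam_hat = mean  # the Poisson MLE is the sample mean
# For the assumed pmf the MLE satisfies r * beta = sample mean,
# so the likelihood is profiled over log(r) only.
log_r = golden_max(lambda t: loglik_negbin(math.exp(t), mean / math.exp(t)), -5.0, 5.0)
r_hat = math.exp(log_r)
beta_hat = mean / r_hat

def aic(loglik, k):
    return 2 * k - 2 * loglik

def bic(loglik, k):
    return k * math.log(n) - 2 * loglik

ll_pois = loglik_poisson(lam_hat)
ll_nb = loglik_negbin(r_hat, beta_hat)
results = {
    "poisson": (aic(ll_pois, 1), bic(ll_pois, 1)),
    "negbin": (aic(ll_nb, 2), bic(ll_nb, 2)),
}
for name, (a_val, b_val) in results.items():
    print(f"{name}: AIC = {a_val:.2f}, BIC = {b_val:.2f}")
```

Smaller AIC or BIC indicates the preferred model. The golden-section search assumes the profile log-likelihood is unimodal in log(r), which should be checked on the real data.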

Now consider the following explanatory variables. The variable vehValue represents the vehicle value in $10,000s. In addition, you must create an Intercept column and two indicator variables: the age of the vehicle, vehAge (1 for old cars or oldest cars, 0 otherwise), and the age of the driver, drivAge (1 for old people, older work. people and oldest people, 0 otherwise). Exposure refers to the time exposed to risk for each policyholder during this period. Detailed information about this dataset can be found in the book:

• De Jong, Piet, and Gillian Z. Heller. 2008. Generalized Linear Models for Insurance Data. International Series on Actuarial Science. Cambridge: Cambridge University Press.

(b) Justify the use of the logarithmic link function and fit a Poisson GLM to this dataset to explain the number of claims in terms of the covariates. Include an intercept, a linear term for the vehicle value, and indicators for the age of the vehicle and the age of the driver. Also include the exposure term in the linear predictor. Use suitable starting values. Comment on the parameter estimates. Is the use of the Poisson GLM justified for this set of data?
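The fitting step in part (b) can be sketched as iteratively reweighted least squares, which for the Poisson GLM with log link coincides with Fisher scoring. This is a minimal sketch, not a reference solution: the eight policies are invented, the exposure enters as the offset log(exposure), and `solve` and `fit_poisson_glm` are hypothetical helper names. The starting values assume a model with only the overall claim rate.

```python
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for k in range(n):
        piv = max(range(k, n), key=lambda i: abs(M[i][k]))
        M[k], M[piv] = M[piv], M[k]
        for i in range(k + 1, n):
            f = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= f * M[k][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def fit_poisson_glm(X, y, exposure, tol=1e-10, max_iter=50):
    """log(mu_i) = log(exposure_i) + x_i' beta; first column of X is the intercept."""
    p = len(X[0])
    # Starting values: intercept = log(overall claim rate), other coefficients 0.
    beta = [math.log(sum(y) / sum(exposure))] + [0.0] * (p - 1)
    for _ in range(max_iter):
        mu = [e * math.exp(sum(b * v for b, v in zip(beta, row)))
              for row, e in zip(X, exposure)]
        score = [sum(row[j] * (yi - mi) for row, yi, mi in zip(X, y, mu))
                 for j in range(p)]
        info = [[sum(row[j] * row[k] * mi for row, mi in zip(X, mu))
                 for k in range(p)] for j in range(p)]
        if max(abs(s) for s in score) < tol:
            break
        beta = [b + s for b, s in zip(beta, solve(info, score))]
    return beta, score, info

# Invented toy data: columns are intercept, vehicle value, old-vehicle indicator.
X = [[1, 0.5, 0], [1, 1.2, 1], [1, 2.0, 0], [1, 0.8, 1],
     [1, 3.1, 0], [1, 1.5, 1], [1, 2.4, 0], [1, 0.9, 1]]
y = [0, 1, 0, 2, 1, 0, 3, 1]
exposure = [0.5, 1.0, 0.3, 0.9, 0.7, 0.4, 1.0, 0.8]

beta, score, info = fit_poisson_glm(X, y, exposure)
p = len(beta)
# Standard errors: square roots of the diagonal of the inverse information matrix.
se = [math.sqrt(solve(info, [1.0 if i == j else 0.0 for i in range(p)])[j])
      for j in range(p)]
print("estimates:", beta)
print("std errors:", se)
```

With a log link, exp(beta_j) can be read as a multiplicative effect on the expected claim frequency per unit of exposure.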

(c) For the negative binomial model show that Var(Y) = rβ(1 + β).
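One possible route to the result in part (c) uses the probability generating function. The pmf written here is an assumption: it is the common (r, β) form that is consistent with the stated variance, since the Appendix formula is not reproduced in this copy.

```latex
% Assumed pmf (the Appendix formula is missing from this copy):
\[
  \Pr(Y = y) = \frac{\Gamma(r+y)}{\Gamma(r)\,y!}
  \left(\frac{1}{1+\beta}\right)^{r}\left(\frac{\beta}{1+\beta}\right)^{y},
  \qquad y = 0, 1, 2, \dots
\]
% Probability generating function and its first two derivatives at z = 1:
\[
  P(z) = E\!\left[z^{Y}\right] = \bigl[1 - \beta(z-1)\bigr]^{-r},
  \qquad P'(1) = r\beta,
  \qquad P''(1) = r(r+1)\beta^{2}.
\]
% Mean and variance:
\[
  E(Y) = P'(1) = r\beta,
\]
\[
  \operatorname{Var}(Y) = P''(1) + P'(1) - \bigl[P'(1)\bigr]^{2}
  = r(r+1)\beta^{2} + r\beta - r^{2}\beta^{2}
  = r\beta(1+\beta).
\]
```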

(d) By denoting µ = E(Y), find a new parametrization of the pmf of the negative binomial distribution in terms of f(y | r, µ). Show that this new parametrization can be expressed in exponential family form. Interpret the new parameters.
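A sketch of the reparametrization in part (d), again assuming the (r, β) pmf with E(Y) = rβ, so that β = µ/r. The exponential family form holds for fixed r; the symbols θ and b(θ) are the usual canonical parameter and cumulant function, introduced here for illustration.

```latex
% Substitute beta = mu / r into the assumed (r, beta) pmf:
\[
  f(y \mid r, \mu) = \frac{\Gamma(r+y)}{\Gamma(r)\,y!}
  \left(\frac{r}{r+\mu}\right)^{r}\left(\frac{\mu}{r+\mu}\right)^{y},
  \qquad y = 0, 1, 2, \dots
\]
% Exponential family form for fixed r:
\[
  f(y \mid r, \mu) = \exp\bigl\{\, y\,\theta - b(\theta) + c(y, r) \bigr\},
  \qquad
  \theta = \log\frac{\mu}{r+\mu},
  \quad
  b(\theta) = -r\log\bigl(1 - e^{\theta}\bigr),
  \quad
  c(y, r) = \log\frac{\Gamma(r+y)}{\Gamma(r)\,y!}.
\]
% Check of the moments:
\[
  b'(\theta) = \frac{r e^{\theta}}{1 - e^{\theta}} = \mu,
  \qquad
  b''(\theta) = \frac{r e^{\theta}}{\bigl(1 - e^{\theta}\bigr)^{2}}
  = \mu + \frac{\mu^{2}}{r}.
\]
```

Under this form µ is the mean and r governs the overdispersion: the variance exceeds the mean by µ²/r, and the Poisson case is recovered as r grows without bound.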

(e) By choosing appropriate initial values, use the Fisher scoring algorithm to fit a negative binomial GLM in the form found in part (d) to this set of data. Include an intercept, a linear term for the vehicle value, and indicators for the age of the vehicle and the age of the driver. Also include the exposure term in the linear predictor. Use the same link function as in part (b). You should provide the maximum likelihood estimates and their corresponding standard errors for each iteration. Also give the maximum value of the log-likelihood function. Stop iterating when the absolute value of each component of the score vector is smaller than 1 × 10^(-10). Write out the fitted model showing the estimated regression coefficients. Calculate the variance-covariance matrix associated with the estimates.
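The core of the Fisher scoring update in part (e) can be sketched as follows. This is a deliberately reduced sketch and not the full exercise: r is held fixed at a hypothetical value rather than estimated, only an intercept and one covariate are included so that the 2 × 2 information matrix can be inverted by hand, and the data are invented. The function names `nb_fisher_scoring` and `nb_loglik` are hypothetical.

```python
import math

def nb_fisher_scoring(x, y, exposure, r, tol=1e-10, max_iter=200):
    """Fit log(mu_i) = log(exposure_i) + b0 + b1 * x_i by Fisher scoring, r held fixed."""
    b0, b1 = math.log(sum(y) / sum(exposure)), 0.0   # starting values
    history = []
    for _ in range(max_iter):
        mu = [e * math.exp(b0 + b1 * xi) for xi, e in zip(x, exposure)]
        w = [m / (1.0 + m / r) for m in mu]          # expected-information weights
        res = [(yi - m) / (1.0 + m / r) for yi, m in zip(y, mu)]
        score = (sum(res), sum(xi * ri for xi, ri in zip(x, res)))
        i00 = sum(w)
        i01 = sum(xi * wi for xi, wi in zip(x, w))
        i11 = sum(xi * xi * wi for xi, wi in zip(x, w))
        det = i00 * i11 - i01 * i01
        # Inverse expected information = variance-covariance matrix of (b0, b1).
        cov = [[i11 / det, -i01 / det], [-i01 / det, i00 / det]]
        history.append((b0, b1, math.sqrt(cov[0][0]), math.sqrt(cov[1][1])))
        if max(abs(s) for s in score) < tol:
            break
        b0 += cov[0][0] * score[0] + cov[0][1] * score[1]
        b1 += cov[1][0] * score[0] + cov[1][1] * score[1]
    return (b0, b1), score, cov, history

def nb_loglik(beta, x, y, exposure, r):
    """Log-likelihood in the (r, mu) parametrization."""
    total = 0.0
    for xi, yi, e in zip(x, y, exposure):
        mu = e * math.exp(beta[0] + beta[1] * xi)
        total += (math.lgamma(yi + r) - math.lgamma(r) - math.lgamma(yi + 1)
                  + r * math.log(r / (r + mu)) + yi * math.log(mu / (r + mu)))
    return total

# Invented toy data and a hypothetical fixed value of r.
x = [0.5, 1.2, 2.0, 0.8, 3.1, 1.5, 2.4, 0.9]          # vehicle value
y = [0, 1, 0, 2, 1, 0, 3, 1]                          # claim counts
exposure = [0.5, 1.0, 0.3, 0.9, 0.7, 0.4, 1.0, 0.8]
r_fixed = 2.0

beta_hat, score, cov, history = nb_fisher_scoring(x, y, exposure, r_fixed)
for it, (b0, b1, se0, se1) in enumerate(history):
    print(f"iter {it}: b0 = {b0:.6f} (se {se0:.4f}), b1 = {b1:.6f} (se {se1:.4f})")
print("max log-likelihood:", nb_loglik(beta_hat, x, y, exposure, r_fixed))
```

In the full problem r is unknown as well, so the score vector and information matrix gain an extra component for r. The mean and dispersion parameters of this model are orthogonal, so the expected information is block diagonal in (β, r).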

(f) Using the estimates derived in (e), test the statistical significance of adding an indicator variable for the age of the driver when an intercept, a linear term for the vehicle value and an indicator for the age of the vehicle are already included in the model. Conduct the test at the 5% significance level.
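Because part (f) refers to the estimates from (e), a Wald test on the driver-age coefficient is one natural reading. The sketch below shows that test, with a one-degree-of-freedom likelihood ratio test as an alternative that would require refitting the reduced model. The numbers passed in are placeholders, not results from the data, and both function names are hypothetical.

```python
import math
from statistics import NormalDist

def wald_test(estimate, std_error, alpha=0.05):
    """Two-sided Wald test of H0: coefficient = 0."""
    z = estimate / std_error
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p_value, p_value < alpha

def likelihood_ratio_test_1df(loglik_full, loglik_reduced, alpha=0.05):
    """LRT for one extra parameter; chi-square(1) tail computed via erfc."""
    stat = 2.0 * (loglik_full - loglik_reduced)
    p_value = math.erfc(math.sqrt(max(stat, 0.0) / 2.0))
    return stat, p_value, p_value < alpha

# Placeholder inputs for illustration only.
z, p, reject = wald_test(estimate=0.2, std_error=0.1)
print(f"Wald z = {z:.3f}, p-value = {p:.4f}, reject H0 at 5%: {reject}")
```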

(g) Compare the fit of the Poisson GLM and negative binomial GLM in terms of two measures of model selection.

(h) Derive expressions for the Pearson residuals and deviance residuals for both models. Use Quantile-Quantile (QQ) plots to assess the accuracy of the fit of both models in terms of these residuals. Comment on these graphs.
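For reference when checking the derivations in part (h), the standard forms of both residual types can be coded directly. The sketch assumes the negative binomial variance function µ + µ²/r from the (r, µ) parametrization and treats the fitted means `mu` and the value of `r` as given; the function names are hypothetical.

```python
import math

def _ylogy(y, mu):
    """y * log(y / mu), with the convention 0 * log(0) = 0."""
    return y * math.log(y / mu) if y > 0 else 0.0

def pearson_poisson(y, mu):
    return (y - mu) / math.sqrt(mu)

def deviance_poisson(y, mu):
    d = 2.0 * (_ylogy(y, mu) - (y - mu))
    return math.copysign(math.sqrt(max(d, 0.0)), y - mu)

def pearson_negbin(y, mu, r):
    return (y - mu) / math.sqrt(mu + mu * mu / r)

def deviance_negbin(y, mu, r):
    d = 2.0 * (_ylogy(y, mu) - (y + r) * math.log((y + r) / (mu + r)))
    return math.copysign(math.sqrt(max(d, 0.0)), y - mu)

# Example: a policy with fitted mean 0.5 and two observed claims.
print(pearson_poisson(2, 0.5), deviance_poisson(2, 0.5))
print(pearson_negbin(2, 0.5, 2.0), deviance_negbin(2, 0.5, 2.0))
```

For a QQ plot, the sorted residuals are plotted against standard normal quantiles at plotting positions such as (i − 0.5)/n.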

Pearson’s and deviance residuals are far from normality when the response variable is discrete and includes a high number of zero responses, and they fail to provide useful information about the inadequacy of the model. For that reason, we consider the randomized quantile residuals as defined in:

Dunn, P. and Smyth, G. (1996). Randomized quantile residuals. Journal of Computational and Graphical Statistics, 5(3):236–244.

(i) Use QQ plots for assessing accuracy in the fit of both models in terms of the randomized quantile residuals for the negative binomial GLM. Give comments about these graphs.
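For a discrete response, the randomized quantile residual draws u uniformly between F(y − 1) and F(y) under the fitted model and maps it through the standard normal quantile function. A minimal sketch follows, assuming fitted means (and r for the negative binomial) are already available; the function names and the clipping constant are illustrative choices.

```python
import math
import random
from statistics import NormalDist

def poisson_cdf(y, mu):
    """P(Y <= y) for Y ~ Poisson(mu); 0 for y < 0."""
    if y < 0:
        return 0.0
    term = total = math.exp(-mu)
    for k in range(1, y + 1):
        term *= mu / k
        total += term
    return min(total, 1.0)

def negbin_cdf(y, mu, r):
    """P(Y <= y) in the (r, mu) parametrization; 0 for y < 0."""
    if y < 0:
        return 0.0
    total = 0.0
    for k in range(y + 1):
        log_p = (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
                 + r * math.log(r / (r + mu)) + k * math.log(mu / (r + mu)))
        total += math.exp(log_p)
    return min(total, 1.0)

def randomized_quantile_residual(y, cdf, rng):
    """Draw u ~ Uniform(F(y-1), F(y)) and return the standard normal quantile of u."""
    u = rng.uniform(cdf(y - 1), cdf(y))
    u = min(max(u, 1e-12), 1.0 - 1e-12)   # keep u strictly inside (0, 1)
    return NormalDist().inv_cdf(u)

rng = random.Random(2019)                  # fixed seed so the sketch is repeatable
res_pois = randomized_quantile_residual(0, lambda k: poisson_cdf(k, 0.5), rng)
res_nb = randomized_quantile_residual(1, lambda k: negbin_cdf(k, 0.5, 2.0), rng)
print(res_pois, res_nb)
```

If the fitted model is correct, these residuals are standard normal apart from sampling variability in the estimates, so the QQ plot should lie close to the 45-degree line.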

The variable ClaimAmount contains the sum of the claim payments for each policyholder (0 if no claim). In the following, we only consider claim amounts larger than zero.

(j) Plot the histogram of the empirical distribution of ClaimAmount for claim sizes less than $50,000. Then fit the lognormal and inverse Gaussian distributions to this set of data by the method of maximum likelihood. Superimpose the graphs of their densities on the histogram of the empirical distribution. Comment on the application of the likelihood ratio test to these distributions.
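Both distributions in part (j) have closed-form maximum likelihood estimates, which the sketch below computes. The inverse Gaussian density is assumed to be the usual (µ, λ) form, since the Appendix formula is missing from this copy, and the claim amounts are invented for illustration.

```python
import math

def fit_lognormal(x):
    """MLE of (mu, sigma) for the lognormal; the variance estimate divides by n."""
    logs = [math.log(v) for v in x]
    mu = sum(logs) / len(logs)
    sigma = math.sqrt(sum((l - mu) ** 2 for l in logs) / len(logs))
    return mu, sigma

def fit_inverse_gaussian(x):
    """MLE of (mu, lam) for the assumed density
    sqrt(lam / (2 pi x^3)) * exp(-lam (x - mu)^2 / (2 mu^2 x))."""
    mu = sum(x) / len(x)
    lam = 1.0 / (sum(1.0 / v for v in x) / len(x) - 1.0 / mu)
    return mu, lam

def lognormal_pdf(v, mu, sigma):
    return (math.exp(-(math.log(v) - mu) ** 2 / (2 * sigma ** 2))
            / (v * sigma * math.sqrt(2 * math.pi)))

def inverse_gaussian_pdf(v, mu, lam):
    return (math.sqrt(lam / (2 * math.pi * v ** 3))
            * math.exp(-lam * (v - mu) ** 2 / (2 * mu ** 2 * v)))

def loglik(pdf, x, *params):
    return sum(math.log(pdf(v, *params)) for v in x)

# Invented positive claim amounts; NOT taken from ausprivauto0405.
claims = [350.0, 1200.0, 560.0, 4800.0, 910.0, 2300.0, 15000.0, 780.0]
ln_par = fit_lognormal(claims)
ig_par = fit_inverse_gaussian(claims)
print("lognormal:", ln_par, loglik(lognormal_pdf, claims, *ln_par))
print("inverse Gaussian:", ig_par, loglik(inverse_gaussian_pdf, claims, *ig_par))
```

The two density functions can be evaluated on a grid and drawn over the histogram. Note that the two families are not nested, which matters when discussing whether the usual chi-square reference distribution of the likelihood ratio statistic applies.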

(k) The Value-at-Risk (VaR) is a standard risk measure that is used to calculate exposure to risk. In general, the VaR is the amount of capital required to ensure that the company does not become technically insolvent. The VaR of a random variable X is the 100p-th percentile of the distribution of X. Calculate the VaR at the 90%, 95% and 99% security levels. Compare the models. Comment on the limitations of VaR.
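Since the VaR at level p is the p-quantile, it can be sketched as follows: the lognormal quantile is available in closed form, while the inverse Gaussian quantile is found by bisection on its distribution function. The parameter values are placeholders rather than fitted estimates, and the (µ, λ) form of the inverse Gaussian is an assumption.

```python
import math
from statistics import NormalDist

PHI = NormalDist()

def var_lognormal(p, mu, sigma):
    """p-quantile of the lognormal: exp(mu + sigma * z_p)."""
    return math.exp(mu + sigma * PHI.inv_cdf(p))

def inverse_gaussian_cdf(x, mu, lam):
    # Standard closed form; exp(2 * lam / mu) can overflow when lam / mu is very large.
    a = math.sqrt(lam / x)
    return (PHI.cdf(a * (x / mu - 1.0))
            + math.exp(2.0 * lam / mu) * PHI.cdf(-a * (x / mu + 1.0)))

def var_inverse_gaussian(p, mu, lam, tol=1e-10):
    """p-quantile of the inverse Gaussian by bisection on the CDF."""
    lo, hi = 0.0, mu
    while inverse_gaussian_cdf(hi, mu, lam) < p:
        hi *= 2.0
    while hi - lo > tol * hi:
        mid = (lo + hi) / 2.0
        if inverse_gaussian_cdf(mid, mu, lam) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

# Placeholder parameters for illustration only.
ln_mu, ln_sigma = 7.0, 1.2
ig_mu, ig_lam = 3000.0, 1500.0
levels = (0.90, 0.95, 0.99)
var_ln = [var_lognormal(p, ln_mu, ln_sigma) for p in levels]
var_ig = [var_inverse_gaussian(p, ig_mu, ig_lam) for p in levels]
for p, a, b in zip(levels, var_ln, var_ig):
    print(f"p = {p:.2f}: lognormal VaR = {a:.1f}, inverse Gaussian VaR = {b:.1f}")
```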

(l) The Kolmogorov-Smirnov (K-S) test is useful in testing the null hypothesis H0 that a sample x comes from a distribution with cumulative distribution function F(x). The K-S test rejects the null hypothesis if the maximum absolute difference between F(x) and the empirical cumulative distribution function F̂_n(x) is large. Assume that the parameters for each model are specified by the maximum likelihood estimates. For each continuous model quote the null and alternative hypotheses. State and calculate the value of the test statistic using the analytical expression for the test statistic (i.e. do not use built-in functions), calculate the p-value¹ and give the conclusion of the test.

¹ For the calculation of the p-value, use Monte Carlo simulation. Under the null hypothesis, one simulation involves first simulating 4,623 observations (i.e. the sample size) from the fitted model and then calculating the K-S test statistic. Use 10,000 simulations to estimate the p-value. The estimated p-value is the proportion of simulations for which the simulated test statistic exceeds the observed K-S test statistic.
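The K-S statistic and the Monte Carlo p-value described above can be sketched as follows, using the lognormal model as the example. The statistic uses the usual order-statistic formula D_n = max_i max(i/n − F(x_(i)), F(x_(i)) − (i − 1)/n). The parameters, the synthetic sample and the small simulation sizes are placeholders chosen so the sketch runs quickly; the exercise itself specifies 4,623 observations and 10,000 simulations.

```python
import math
import random
from statistics import NormalDist

def ks_statistic(sample, cdf):
    """D_n = max_i max(i/n - F(x_(i)), F(x_(i)) - (i-1)/n) over the sorted sample."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        d = max(d, i / n - f, f - (i - 1) / n)
    return d

def lognormal_cdf(x, mu, sigma):
    return NormalDist().cdf((math.log(x) - mu) / sigma)

def ks_pvalue_monte_carlo(d_obs, n, simulate, cdf, n_sim, rng):
    """Proportion of simulated K-S statistics that exceed the observed one."""
    exceed = 0
    for _ in range(n_sim):
        sim = [simulate(rng) for _ in range(n)]
        if ks_statistic(sim, cdf) > d_obs:
            exceed += 1
    return exceed / n_sim

# Placeholder parameters and a synthetic sample, for illustration only.
mu_hat, sigma_hat = 7.0, 1.2
rng = random.Random(42)
sample = [rng.lognormvariate(mu_hat, sigma_hat) for _ in range(200)]

cdf = lambda v: lognormal_cdf(v, mu_hat, sigma_hat)
d_obs = ks_statistic(sample, cdf)
p_value = ks_pvalue_monte_carlo(
    d_obs, len(sample), lambda g: g.lognormvariate(mu_hat, sigma_hat), cdf,
    n_sim=200, rng=rng)
print(f"D_n = {d_obs:.4f}, Monte Carlo p-value = {p_value:.3f}")
```

For the inverse Gaussian model the same two functions apply once a CDF and a sampler for that distribution are supplied.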

Appendix:

• The probability mass function of the negative binomial distribution:

• The probability density function of the inverse Gaussian distribution:
