STAT6030 GENERALISED LINEAR MODELLING The Australian National University
Final Project
2023 Summer Session
Instructions
This final project is worth 60 marks in total and 60% of your final grade for this course. The final project is compulsory and must be submitted by 5pm on Friday 10 March 2023.
Your answers should be individually submitted through Turnitin on Wattle as a single pdf/Word document (less than 50MB) including your signed declaration form (available on Wattle), which should be included as the cover page in your submission.
You are allowed to attempt multiple submissions before the due date. If there is any issue with submissions, please send an email to your lecturer and report the problem before the due date. Late submissions will not be accepted and your final project will be marked 0.
Name your submission “CourseCode Uid”, e.g., “STAT6030 u1234567”.
Try to submit your assignment at least 30 minutes before the deadline in case something unexpected happens, for instance an internet connection problem.
Late submissions will NOT be accepted. Extensions will usually be granted on medical or compassionate grounds on production of appropriate evidence, but must receive the lecturer’s approval at least 24 hours before the deadline.
You are allowed to use course material, computer software, the internet or other resources; however, you must complete the final project individually. Identical submissions, even submissions containing a single identical sentence, will be treated as cheating. These are easily detected by the Turnitin system.
You are encouraged to use any results, formulas or statements from the course material. It is not required to provide their proofs.
Detailed instructions are provided in the following pages.
Data
This individual final project is designed to apply the tools provided in this course to analyse one or two real-world datasets chosen by yourself. A broad range of real data is freely available on the web. Some examples of data sources are:
COVID19 data for Australia (https://www.covid19data.com.au/);
Australian government data (https://data.gov.au);
ACT government data (https://www.data.act.gov.au/browse);
New South Wales government data (https://data.nsw.gov.au);
Victoria government data (https://www.data.vic.gov.au);
Australian government statistics and datasets (e.g., https://www.abs.gov.au or https://www.abs.gov.au/statistics/micro);
Australian central bank datasets (http://www.rba.gov.au);
US Census data (https://www.census.gov/data.html);
Federal Reserve Economic Data (FRED) (https://fred.stlouisfed.org/);
Country level datasets provided by the National Bureau of Economic Research (http://www.nber.org/data/);
World Bank datasets (https://www.doingbusiness.org/en/data);
United Nations (international) demographics data (http://data.un.org/);
ANU library (https://anulib.anu.edu.au/find-access/e-resources-databases), e.g., DatAnalysis Premium, Factiva (global news database) or Connect4.
You may also consider well-established academic datasets, e.g., from the website of Ken French (http://mba.tuck.dartmouth.edu/pages/faculty/ken.french/data_library.html) or from https://archive.ics.uci.edu/ml/datasets.php. These datasets are typically studied in refereed journal articles and academic books; therefore, if you choose to use one or two of them, please clearly highlight the differences, novelty and improvements introduced in your analysis compared to the existing literature. You may also consider data from your industry experience (you are not required to submit the data, only your report) or other sources. Note that the one or two datasets that you choose cannot be any of the datasets used in the lectures, tutorials and assignments of this course.
Based on the chosen one or two datasets, you are requested to analyse your data using at least two types of models from this list (each bullet point counts as one type only):
Linear Mixed Effects Model;
Binary Regression / Binomial Logistic Regression;
Poisson Log-Linear Regression / Log-linear Regression with Extra-Poisson Variation;
Multicategory Logistic Regression;
Gamma / Exponential Generalised Linear Model.
The two types of models can be fitted to the same dataset if you select two different variables as response variables. You can of course consider two datasets and analyse each of them with one of the two selected models. Note that the selection of response variables needs to be meaningful and useful in real practice. If your report includes the fitting results for only one type of model, 30 marks will be deducted from the overall mark for this final project.
Report
The final project report should be a single file containing a main manuscript of maximum 10 pages and an appendix of maximum 20 pages, following the report guidelines and structure provided below. All the R code should be relegated to the appendix. The submitted report must use:
Australian English spelling;
black characters or occasional coloured characters for highlighting purposes;
text in a single column;
A4 size paper with margins of at least 0.5cm on the left, right, top and bottom sides;
12 point Times New Roman or similar font for the main manuscript;
10 point Times New Roman or similar font for references and appendix.
Please make sure that the submitted file is legible by the assessors.
This should be seen as an official professional report, therefore do not paste any R code or output from the R console in the main manuscript. If you would like to show results in a table you are encouraged to do so but without copying and pasting any R output directly. If R code or outputs from the R console appear in the main manuscript, 10 marks will be deducted from the overall marks for this final project.
You may have many tables and figures you would like to include; however, because of the 10-page limit, please only include the most important ones in the main manuscript. Other useful tables and figures mentioned in the main manuscript may be relegated to the appendix. You may also need to adjust the table and figure sizes properly in order to respect the page limit. If you think some tables or figures are not useful, please do not include them in the report at all (not even in the appendix). Please do not just copy and paste tables and figures; summarise in words the results emerging from them.
All results should be stated in a clear and concise manner.
Report Guidelines and Structure
You are asked to submit a report which analyses one or two datasets by using at least two types of models for your data, as explained in the Data section above.
Your final project report must be structured as follows.
Main Manuscript [maximum 10 pages, 12 point font]
1. INTRODUCTION
Clearly state the objectives of your project and provide an adequate background for your data.
This section may include: variable descriptions, the source of your data, the scientific question(s) addressed through the analyses in the following sections, an anticipation of the main finding, possible contributions in real practice, etc.
2. DATA CHARACTERISTICS
This section may include: exploratory data analysis (EDA), descriptive statistics, etc. Please make sure the analysis in this section is clearly connected to, or motivates, the models described in the following section.
3. MODEL FITTING AND INTERPRETATION
Based on the chosen dataset(s), analyse your data using at least two types of models from the list in the Data section and according to the instructions in the same section.
For each type of model, you can perform model fitting using all the variables available in the dataset, excluding the response variable, as explanatory variables, or you can use a subset of these variables, interaction terms or variable transformations. Please make sure to clearly indicate which variables, interaction terms and transformations are used and explain the reason behind these choices.
For each type of model, please select at least one fitted model to report. Please explain how you obtained this model. In addition, please explain why you selected this model to report. The reasons can be, for example, one or more of the following:
(i) the reported model passes the model diagnostics but the other models did not;
(ii) the AIC/BIC of the reported model is the smallest;
(iii) some hypothesis testing shows that the reported model can be better than the other models;
(iv) some variables included in the reported model are important in real practice interpretation and therefore cannot be eliminated;
(v) the interpretations of the interaction/square/cubic/polynomial terms in the reported model may have some special meaning in real practice;
(vi) etc...
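As a hedged illustration of reasons (i) to (iii) above, the following R sketch compares two nested Poisson log-linear models. It is not a required workflow. The data are simulated, and the variable names (x1, x2, y) and object names (fit_small, fit_large, ic, lrt, dispersion) are assumptions made purely for illustration. In a submitted report, code of this kind would go in the appendix, not the main manuscript.

```r
# Simulated data for illustration only (hypothetical, not a real dataset).
set.seed(1)
n  <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- rpois(n, lambda = exp(0.5 + 0.8 * x1))  # x2 has no true effect here

# Two nested candidate Poisson log-linear models.
fit_small <- glm(y ~ x1,      family = poisson(link = "log"))
fit_large <- glm(y ~ x1 + x2, family = poisson(link = "log"))

# Reason (ii): information criteria, where smaller values are preferred.
ic <- data.frame(
  model = c("y ~ x1", "y ~ x1 + x2"),
  AIC   = c(AIC(fit_small), AIC(fit_large)),
  BIC   = c(BIC(fit_small), BIC(fit_large))
)
print(ic)

# Reason (iii): likelihood ratio (deviance) test for the nested pair.
lrt <- anova(fit_small, fit_large, test = "Chisq")
print(lrt)

# Reason (i): a simple diagnostic, the residual deviance divided by its
# degrees of freedom; values well above 1 may suggest extra-Poisson variation.
dispersion <- deviance(fit_small) / df.residual(fit_small)
print(dispersion)
```

The same pattern (fit candidates, compare criteria, test nested models, check diagnostics) carries over to the other model types in the list, with the family argument or fitting function changed accordingly.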
Please also interpret and discuss the model fitting outcome for the fitted models that you decide to report. One example of interpretation is to report which variables are statistically significant (can this significance be explained by the background of the data?). You can also consider other discussions and interpretations as long as they are practically useful based on the data background.
In addition to the above compulsory parts, you can also report additional model fittings and interpretations as long as they are useful in real practice and respect the page limit.
4. LIMITATIONS
Please clearly discuss the possible limitations of the fitted models that you select to report, e.g., from model diagnostics, problems in real practice, etc. If you cannot figure out any, please give a reason why the fitted models that you select to report are ideal.
5. CONCLUSION
Please give a short paragraph to summarise your findings of this project.
Appendix [maximum 20 pages, 10 point font]
6. APPENDIX
This section should include: R code (not R output).
This section may include: additional important tables and figures which are mentioned in the main manuscript, etc...
7. REFERENCES
Please make sure that every reference, if you have used any, is cited in the main manuscript and also listed in this References section of the report.
Final Report Marking Rubric
Your score for the final project report (maximum 60 marks) will be calculated by attributing a score to the following five points (from 0 to 12 marks per point):
1. The project is comprehensive and complete. E.g.: Have instructions been followed? Has every aspect of the project been thought through and explained? Is the workflow complete with a clear logic?
2. The report is well-written. E.g.: Is the analysis cogent? Is the report concise and precise? Is the summary of results accurate and neat? Do grammar mistakes affect the understanding of the report? Is the report well-organised? Is the transition from one section to another of the report smooth enough to show the logic of the analysis?
3. The analysis is correct. E.g.: Is the content statistically correct? Are technical terms used properly? Is the analysis consistent with the principles discussed in class? Is the methodology proper for the data? Is the method relevant to this course?
4. The interpretation is insightful. E.g.: Does the report bring insights in the conclusions reached? Is the analysis accurately addressed based on the chosen data? Does the report provide good output interpretations?
5. Overall impression. E.g.: Does the report provide useful contributions in real practice? Is the interpretation of statistical analyses useful in reality? Does the analysis address some scientific questions of interest? Does the report seem to be written putting adequate effort in it?
Each of the five points will be marked according to the following table:
Score          Rating      Judgement
0–2 marks      Very bad    Minimal or no effort
3–4 marks      Poor        Needing improvement
5–6 marks      Fair        OK, but with many major problems
7–8 marks      Good        OK, but with some main problems
9–10 marks     Very good   Only minor problems
11–12 marks    Excellent   No problems