MATH1005 Project1
Semester 2 2019

Aim
The aim of the Projects is to give you an authentic experience of producing reproducible statistical reports using real data. They are purposely open-ended to expose you to the joys and challenges of problem solving with data.
The set of Projects is cumulative, allowing you to develop and consolidate 5 vital graduate qualities: statistical thinking, computational skills (hard skills), curiosity, communication and collaboration (soft skills).

1 Project1: Exploring data of your choice
Find data of your own choice. Investigate your own research questions using numerical and graphical summaries. Present a 3-minute (max) report, and field questions from your tutor and peers.

Submission: Submit a .html file, produced from a .Rmd file, with SIDs and details of your Lab class at the top.
This project is designed to be completed in a group.
2 Guide to Project1
The task essentially asks you to (1) source your own tidy dataset, and then (2) investigate different variables posed as Research Questions.
In this project, we give you very detailed marking criteria to teach you a framework for approaching the subsequent projects.
Keywords
In this context, an Executive Summary is a “clear, interesting summary of main insights from the report”.
Length
The report should be precise and concise, following the given word-counts.
Learning in Groups
This project is designed to be completed in a team, as learning group work is a Sydney graduate quality and very prized by employers. Group work requires many skills, including flexibility, negotiation and compromise. A good group doesn’t just coordinate or cooperate, but learns to collaborate.
Your group must consist of a maximum of 4 students from the same Lab class.
  o Groups will be allocated in your lab class by your tutor.
  o Your tutor will record your group number on the class attendance roll, and this information will be used in Canvas.
  o If for some reason you miss your first 3 lab classes (e.g. you join the unit late), then your tutor will allocate you a Group number for a solo group, so you will need to submit the work individually. You will forfeit the marks associated with groups in the Communication of Presentation.
While a group project is submitted once as a group, each individual will need to separately submit the associated Group Reflection Quiz, in which you reflect on how you contributed to your group. To avoid academic misconduct, you need to be honest in that quiz, and if you didn’t contribute to the project then you will need to do the subsequent projects on your own. Students who don’t fill out the quiz will get 0 for the project.
Finding Data
The web is full of incredible data. However, it takes time to find tidy data and check its integrity. This process of searching for and assessing data is part of the project.
If you want to investigate a particular research question, then it often helps to search by the area of interest (e.g. breast cancer) and type of file (e.g. .csv or .xls).
There are many excellent data repositories, for example:
  o http://data.gov.au/
  o https://www.springboard.com/blog/free-public-data-sets-data-science-project/
Data from the ABS can be hard to use as it often has summarised data, not raw data.
If you use data from Kaggle, you must do your own original coding, or you will get 0.
You shouldn’t use data that is provided in the lectures or labs or in RStudio, as finding data is part of this project. If you do so, you can’t get the marks for IDA in the Marking Criteria (nor the Research Questions, if this overlaps with what has already been covered in class).
Presentation
You do not have to discuss everything in your report during the presentation. Focus on whatever you think is most interesting for your peers to hear. You will be judged on the quality of your statistical thought, not quantity, as you only have 3 minutes (maximum).
You will also be marked on the cohesiveness of the presentation (i.e. how well it all fits together). You all need to be involved.
Before you start your presentation, you should introduce your group and topic:
  o Our group is …
  o Our topic is …
Tips
1. Don’t make the analysis harder than it needs to be. Pick a dataset that fulfills all the criteria but don’t make it massively difficult! Examples of tricky data include datasets with a lot of missing observations and heavily formatted Excel sheets (such as ABS data). This project should be fun and allow you to apply all your R skills to real data :)
2. Make sure you know how to use R Markdown, including how to read in data and knit the html.
   A template .Rmd file is already provided for you on the projects page!
   There are different commands for reading in .xlsx/.csv/.txt files! Look on the Video Resources for extra information on how to read data into R.
3. You will be presenting your work in your lab class. You may upload a separate presentation if you like, but it must be uploaded at the same time as your report. You cannot walk into your lab class with a USB for your presentation. Alternatively, you can present straight from the .html report itself! Both can work very well. Make sure you practice your presentations, as it is very obvious when people have not timed them. Your tutor will not give you a warning - you will be cut off at 3 minutes sharp.
4. Do not wait until the last minute to upload your report! There have been occasional issues with .html files taking a long time to upload in Canvas in the past. You should contact IT/Canvas directly if you are having any issues submitting.
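As a rough illustration of the "different commands" mentioned in tip 2, the sketch below reads the same small table from a .csv and a .txt file using base R. The file contents, the variable names (`height_cm`, `smoker`) and the .xlsx path are made up for illustration. The Video Resources may recommend other functions, so treat this as one possible approach rather than the required one.

```r
# Write two tiny example files so this sketch runs on its own.
# The data here is hypothetical and only for illustration.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("height_cm,smoker", "170,yes", "182,no", "165,no"), csv_path)

txt_path <- tempfile(fileext = ".txt")
writeLines(c("height_cm\tsmoker", "170\tyes", "182\tno", "165\tno"), txt_path)

# .csv files: read.csv() from base R (comma-separated, first row is the header)
heights_csv <- read.csv(csv_path)

# .txt files: read.delim() from base R (tab-separated, first row is the header)
heights_txt <- read.delim(txt_path)

# .xlsx files: read_excel() from the readxl package is one common option.
# It is commented out because it needs install.packages("readxl") and a real file;
# the path below is a placeholder, not a file supplied with the project.
# heights_xlsx <- readxl::read_excel("data/heights.xlsx")

# Check that the data came in as expected before going any further.
str(heights_csv)
head(heights_txt)
```

Inside a .Rmd file, this code would sit in an R chunk, with the temporary files replaced by the path to your own dataset.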

3 Marking Criteria
See Canvas Assignments for full Marking Criteria.
Written Report [By Group]
Quality
Executive Summary [Max 100 words]: Clear, interesting summary of main insights from the report.
IDA [Max 200 words]
Complexity of Data & Classification of Variables: 4 or more variables in data. Classification of variables shown in R Output, and assessed and changed (if needed).
Option 1, IDA (for sourced data): Origin of data cited and critically assessed, including reliability and limitations.
Option 2, IDA (for survey data): Assesses survey design, including potential bias or issues of ethics [include survey link, or survey at end of report]. Survey involves 20+ participants.
Exploring Data [Max 800 words]
Research Question 1: Insightful question, appropriately investigated using numerical and/or graphical summaries, with results explained in context.
Research Question 2: Insightful question, appropriately investigated using numerical and/or graphical summaries, with results explained in context. Uses regression (model produced and assessed).
Communication of Written Report: .html produced from .Rmd, with all SIDs listed at the top with details (day/time/room) of the same Lab class. Clear use of structure and language, carefully edited with no mistakes.
Communication of Presentation: Engaging, interesting, well-paced, error-free content, well coordinated as a team.
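To make the IDA and Research Question criteria above more concrete, here is a minimal sketch in base R. The data frame `survey` and every value in it are invented for illustration; your own report must use data you have sourced or collected yourself. The choice of summaries and plots is an assumption about what might suit such data, not a prescribed method.

```r
# Hypothetical data with 4 variables, made up purely for illustration.
survey <- data.frame(
  hours_study = c(2, 4, 5, 7, 8, 10),
  mark        = c(52, 58, 61, 70, 74, 83),
  year        = c(1, 1, 2, 2, 3, 3),
  degree      = c("Science", "Arts", "Science", "Arts", "Science", "Arts")
)

# IDA: show how R has classified each variable.
str(survey)

# 'year' and 'degree' are qualitative here, so reclassify them as factors
# and show the structure again to confirm the change.
survey$year   <- as.factor(survey$year)
survey$degree <- as.factor(survey$degree)
str(survey)

# Research Question 1 (example): do marks differ between degrees?
# Numerical summaries by group, then a graphical summary.
aggregate(mark ~ degree, data = survey, FUN = mean)
aggregate(mark ~ degree, data = survey, FUN = sd)
boxplot(mark ~ degree, data = survey, ylab = "Mark")

# Research Question 2 (example): is mark linearly related to study hours?
# Produce the model ...
fit <- lm(mark ~ hours_study, data = survey)
plot(survey$hours_study, survey$mark,
     xlab = "Hours of study", ylab = "Mark")
abline(fit)
coef(fit)

# ... and assess it: correlation, R-squared and a residual plot.
cor(survey$hours_study, survey$mark)
summary(fit)$r.squared
plot(survey$hours_study, resid(fit),
     xlab = "Hours of study", ylab = "Residual")
abline(h = 0)
```

In a real report, each output would be followed by a sentence or two explaining the result in context, since that is what the criteria reward.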
