
FIT5145 - Introduction to Data Science

Summer Semester B 2020
Assignment 1
This assessment aims to guide you in exploring a data set through the process of exploratory data analysis (EDA), primarily by visualising the data with various data science tools.
 
You will need to draw on what you have learnt, and will continue to learn, in class. You are also encouraged to seek out additional information from reputable sources. If you use, or are 'inspired' by, any source code from one of these sources, you must reference it.
 
Learning outcomes

You will learn the following through completing this assessment:
 
Read in files and extract data from them into a data frame.
Wrangle and process data.
Use graphical and non-graphical tools to perform EDA.
Use basic tools for managing and processing big data.
Determine information
Communicate your findings in your report.
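
Several of these outcomes (reading a file into a data frame, non-graphical EDA, and basic processing of large files) can be illustrated together. The following is only a sketch of one possible approach, not a required method: it assumes pandas is installed, reads a CSV in chunks with `pd.read_csv(..., chunksize=...)`, and tallies the answers in one column with `collections.Counter`. The function name `count_answers_in_chunks`, the column name `Role` and the sample rows are hypothetical.

```python
import io
from collections import Counter

import pandas as pd


def count_answers_in_chunks(source, column, chunksize=1000):
    """Tally the answers in `column`, reading `source` one chunk at a time.

    Only one chunk of rows is held in memory at once, so the same code
    also works on files too large to load in a single call.
    """
    totals = Counter()
    for chunk in pd.read_csv(source, chunksize=chunksize):
        # Skip missing answers so they are not counted as a category.
        totals.update(chunk[column].dropna())
    return totals


if __name__ == "__main__":
    # Hypothetical sample standing in for the real survey file.
    sample = io.StringIO("Role\nAnalyst\nDeveloper\nAnalyst\nTester\nAnalyst\n")
    print(count_answers_in_chunks(sample, "Role", chunksize=2).most_common(1))
```

A `Counter` is used here because merging per-chunk tallies is a single `update` call; the result can be converted to a pandas Series later for plotting.
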

Submission details

The Python code as a Jupyter notebook file (.ipynb).
A PDF print of your Jupyter notebook containing the code, figures and answers to all the questions.

Hint: Wrap your code using Jupyter magics or Python standards.
 
Please note: Marks will be assigned based on the correctness and clarity of your answers and code. The PDF should be concise and not take up an excessive number of pages. You should not print the data frames in your PDF (comment out the code that prints them).
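
One way to keep data frames out of the PDF is to comment out the lines that display them and print a compact summary instead. This is a minimal sketch assuming pandas; the frame `survey` and its contents are hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for the survey data frame.
survey = pd.DataFrame({"Role": ["Analyst", "Developer", None], "Years": [3, 5, 2]})

# survey         # Commented out: displaying the frame would put every row in the PDF.
# print(survey)  # Commented out for the same reason.

# A one-line summary is enough to show that the data loaded as expected.
n_rows, n_cols = survey.shape
n_missing = int(survey.isna().sum().sum())
print(f"{n_rows} rows x {n_cols} columns, {n_missing} missing values")
```
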
 
Zip file submissions attract a penalty of 10%. Submit the two files requested above as separate files, together in the same submission. You will need to submit your PDF to Turnitin.
 
Task
In this course, you have learned about the definitions, skill sets, tools, applications and knowledge domains attributed to data science. However, these are so diverse that data science is challenging to define precisely. By completing the EDA, we hope you will gain a clearer understanding of how a career in data science compares to other careers in the IT industry.
 
The Data
 
In late 2018, a survey was conducted for a large Australian collective of IT professionals. The survey, which received 7000 responses, aimed to gather information about IT professionals. The dataset was made public, and many insights have emerged since. We have taken the data set and heavily modified it, both to clean the data (a significant component of data science) and to ensure original assignment submissions.
 
The data set is called assignment1_dataset.csv and contains respondents' answers to survey questions. Each column contains the respondents' answers to a specific question. Do not alter this dataset.
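
Because the file must not be altered, one option is to load it once and do all cleaning on a copy of the data frame, never writing back to assignment1_dataset.csv. The sketch below assumes pandas; the helper names `load_survey` and `tidy_copy`, the whitespace-stripping step and the sample rows are illustrative assumptions rather than part of the assignment.

```python
import io

import pandas as pd


def load_survey(source):
    """Read the survey CSV into a data frame; the file itself is never modified."""
    return pd.read_csv(source)


def tidy_copy(raw):
    """Return a cleaned copy of `raw`, leaving the original data frame untouched."""
    tidy = raw.copy()
    # Strip stray whitespace from column names.
    tidy.columns = tidy.columns.str.strip()
    # Strip stray whitespace from text answers; other columns pass through unchanged.
    return tidy.apply(
        lambda col: col.str.strip() if pd.api.types.is_string_dtype(col) else col
    )


if __name__ == "__main__":
    # Hypothetical sample; in the notebook, pass "assignment1_dataset.csv" instead.
    raw = load_survey(io.StringIO(" Role ,Years\n Analyst ,3\nDeveloper,5\n"))
    tidy = tidy_copy(raw)
    print(list(raw.columns), list(tidy.columns))
```

Keeping `raw` unchanged means any cleaning step can be re-run or revised without reloading the file.
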
 
How to complete this assessment
 
The following notebook has been constructed to provide you with directions (blue), questions (yellow) and background information. Responses to both blue directions and yellow questions are assessed.
 
Underneath the blue direction boxes, there are empty cells with the comment #Your code. Place your code in these. You should not need to, but you may insert new cells beneath these cells if required.
 
To respond to a question, double-click on the cell beneath it with the comment Answer, and write your answer under that comment.
 
Please note, your commenting and adherence to Python code standards will be marked. This notebook has been designed to give you a template for the layout of future notebooks you might create. If you require further information on Python standards, please visit https://www.python.org/dev/peps/pep-0008/
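
As a rough illustration of the PEP 8 conventions most relevant here (lower_case_with_underscores for functions and variables, UPPER_CASE for constants, plus docstrings and comments), a code cell might look like the following. The function `proportion_of`, the threshold `MIN_RESPONSES` and the sample answers are hypothetical.

```python
MIN_RESPONSES = 30  # Hypothetical threshold; constants use UPPER_CASE names.


def proportion_of(answers, target):
    """Return the share of `answers` that are equal to `target`.

    Functions and variables use lower_case_with_underscores, and each
    function carries a docstring describing what it returns.
    """
    if not answers:
        raise ValueError("answers must not be empty")
    matches = sum(1 for answer in answers if answer == target)
    return matches / len(answers)


sample_answers = ["Yes", "No", "Yes", "Yes"]  # Hypothetical responses.
share = proportion_of(sample_answers, "Yes")
if len(sample_answers) < MIN_RESPONSES:
    print(f"Only {len(sample_answers)} responses; treat {share:.0%} with caution.")
```
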
 
Do not change any of the directions or answer boxes, the order of questions, order of code entry cells or the name of the input files.