Tutorial 2: Introduction to Python and Jupyter
Tutorial Objectives:
· To introduce Jupyter and Google Colab & their key features
· To learn how to import & prepare data files
This course uses either Jupyter (via Anaconda) or Google Colab. Please install Jupyter or set up Google Colab on your laptop before class.
· Anaconda Jupyter
Ø Getting started: launch Anaconda Navigator.
Ø Environments: this tab shows the Python packages you have installed.
Ø Then launch Jupyter Notebook.
· Google Colab
Ø Go to https://colab.research.google.com/ and sign in with your free Google account.
Ø Click the Colab logo in the top-left corner.
Ø Then click My Drive > Colab Notebooks.
Ø Click New > File upload to upload the .ipynb file (Tut_2.ipynb).
Ø Double-click Tut_2.ipynb to open it in Colab.
Install Python packages: most common packages are pre-installed on Colab. On Anaconda Jupyter, install any that are missing, either from a terminal or with %pip install in a notebook cell:
Ø pip install pandas
Ø pip install numpy
Ø pip install scikit-learn
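To confirm the installs worked, a cell like the following can be run in either environment. It is a minimal sketch that uses only the standard library's importlib.metadata (Python 3.8+). The helper name installed_versions is hypothetical and is not part of any package.

```python
# Report which of the course packages are installed, using only the standard library.
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages=("pandas", "numpy", "scikit-learn")):
    """Return {distribution name: version string, or None if not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, ver in installed_versions().items():
    status = ver if ver is not None else "not installed; try: pip install " + name
    print(f"{name}: {status}")
```

Distribution names such as "scikit-learn" are used here, not import names such as sklearn, because importlib.metadata looks packages up by the name pip installed them under.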
Practice exercise
Suppose that you have been hired by Hamilton Island Resort (https://www.hamiltonisland.com.au/) to determine the salient characteristics of families that have visited the island.
Household_id: household identification number.
Island_visit: 1 if the household has visited the island before; 2 if it has not.
Income ($): average household income.
Travel: attitude toward travel, measured on a 7-point scale (1 = not at all important; 7 = very important).
Vacation: importance attached to family vacations, measured on a 7-point scale (1 = not at all important; 7 = very important).
Household_size: household size.
Age: age of the head of the household.
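· Import the data (Tut_2.csv) using pandas (file paths are case-sensitive on Colab, so the file name must match exactly)
o import pandas as pd
o data = pd.read_csv('Data/Tut_2.csv')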
· Generate a description of the data
o data.head()
o data.dtypes
o data.shape
o print(data)
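Before adding or analysing anything, it can help to confirm that the expected columns are present and to look for missing values. The sketch below assumes the column names used later in this tutorial; check_data and EXPECTED_COLUMNS are illustrative names, not pandas functions.

```python
import pandas as pd

# Column names as used later in this tutorial (an assumption about Tut_2.csv).
EXPECTED_COLUMNS = ['Household_id', 'Island_visit', 'Income', 'Travel',
                    'Vacation', 'Household_size', 'Age', 'Amount']


def check_data(df: pd.DataFrame) -> pd.Series:
    """Print basic checks and return the number of missing values per column."""
    absent = [c for c in EXPECTED_COLUMNS if c not in df.columns]
    if absent:
        print("Columns not found:", absent)
    print("Rows, columns:", df.shape)
    return df.isna().sum()  # NaN count for every column
```

In the notebook, call check_data(data) right after read_csv; any non-zero count marks a column with missing entries.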
· One respondent (No. 30) is missing from the data. Add this household to the data:
Household_id | Island_visit | Income | Travel | Vacation | Household_size | Age | Amount
30           | 2            | 41300  | 3      | 3        | 2              | 42  | 1200
o Using concat()
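# Build a one-row DataFrame for respondent No. 30 with the same columns as data
df2 = pd.DataFrame([[30, 2, 41300, 3, 3, 2, 42, 1200]],
                   columns=['Household_id', 'Island_visit', 'Income', 'Travel',
                            'Vacation', 'Household_size', 'Age', 'Amount'])
# Append it; ignore_index=True renumbers the row index 0..n-1
data = pd.concat([data, df2], ignore_index=True)
print(data)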
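Re-running the concat() cell appends respondent No. 30 a second time. One way to guard against that, sketched here with a hypothetical helper add_household, is to check Household_id before appending.

```python
import pandas as pd


def add_household(df: pd.DataFrame, row: dict) -> pd.DataFrame:
    """Append one household (a dict of column -> value) unless its Household_id is already present."""
    if (df['Household_id'] == row['Household_id']).any():
        return df  # already in the data: re-running the cell does not duplicate it
    new_row = pd.DataFrame([row], columns=df.columns)  # same column order as df
    return pd.concat([df, new_row], ignore_index=True)  # renumber the index 0..n-1
```

Usage in the notebook: data = add_household(data, {'Household_id': 30, 'Island_visit': 2, 'Income': 41300, 'Travel': 3, 'Vacation': 3, 'Household_size': 2, 'Age': 42, 'Amount': 1200}).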
· Define the outcome (y) and predictor (x) variables
o y = data['Island_visit']
o x = data[['Income', 'Travel', 'Vacation', 'Household_size', 'Age', 'Amount']]
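Listing every predictor by hand works for six columns. An equivalent, shorter form drops the ID and outcome columns instead. This sketch assumes data contains exactly the eight columns of the respondent table, in which case x matches the hand-written list (same columns, same order). split_target is a hypothetical helper name.

```python
import pandas as pd


def split_target(df: pd.DataFrame, target: str = 'Island_visit',
                 id_col: str = 'Household_id'):
    """Return (x, y): y is the target column; x is every other column except the ID."""
    y = df[target]
    x = df.drop(columns=[id_col, target])  # keeps the remaining columns in their original order
    return x, y
```

Usage: x, y = split_target(data).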
· Generate descriptive stats
o data.describe()
§ count, mean, standard deviation, minimum, 25th percentile, 50th percentile (median), 75th percentile, maximum
o data.mode()
o data.median()
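Two details are worth knowing here: data.mode() can return more than one row when values tie, and the mean or median of Household_id carries no meaning. The sketch below (hypothetical helper central_tendency) puts the mean, median and first mode of each numeric variable side by side, excluding the ID column.

```python
import pandas as pd


def central_tendency(df: pd.DataFrame) -> pd.DataFrame:
    """Mean, median and (first) mode of each numeric column, one row per variable."""
    numeric = df.select_dtypes(include='number').drop(columns=['Household_id'], errors='ignore')
    return pd.DataFrame({
        'mean': numeric.mean(),
        'median': numeric.median(),
        # mode() returns tied modes sorted ascending; keep the first (smallest)
        'mode': numeric.mode().iloc[0],
    })
```

Usage: central_tendency(data).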
· Generate statistics by category using groupby (e.g., 'Island_visit')
o data.groupby(data['Island_visit']).mean()
o data.groupby(data['Island_visit']).median()
o data.groupby(data['Island_visit']).std()
o data.groupby(data['Island_visit']).min()
o data.groupby(data['Island_visit']).max()
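The five groupby calls can also be combined into one table with agg(). This is a minimal sketch that assumes the column names above; summary_by_visit is an illustrative name, and Household_id is dropped because its group averages are meaningless.

```python
import pandas as pd


def summary_by_visit(df: pd.DataFrame) -> pd.DataFrame:
    """Mean, median, std, min and max of each variable, per Island_visit group."""
    features = df.drop(columns=['Household_id'])  # an ID number has no meaningful average
    # Columns of the result are (variable, statistic) pairs; rows are the Island_visit codes
    return features.groupby('Island_visit').agg(['mean', 'median', 'std', 'min', 'max'])
```

summary_by_visit(data).T lists the variables as rows, with visitors (1) and non-visitors (2) as columns, which can make the comparison easier to read.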
Interpret the results above. Which households should the resort target, and why? Write your answers in the notebook and submit the .ipynb file via Canvas (one copy per student).