
Tutorial 2 Introduction to Python and Jupyter

Tutorial Objectives:

· To introduce Jupyter and Google Colab & their key features

· To learn how to import & prepare data files

This course uses Jupyter (via Anaconda) or Google Colab. Before class, please install Jupyter or set up Google Colab on your laptop.

· Anaconda Jupyter

Ø Getting started: Launch Anaconda Navigator.

Ø Environments: shows the Python packages you have installed.

Ø Then launch Jupyter Notebook.

· Google Colab

Ø Go to https://colab.research.google.com/ and sign in with your free Google account.

Ø Click the Colab logo in the top-left corner.

Ø Then click My Drive > Colab Notebooks.

Ø Click New > File upload to upload the .ipynb file (Tut_2.ipynb).

Ø Double click on Tut_2.ipynb to open it in Colab

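Install Python packages: most Python packages are pre-installed on Colab. On Anaconda Jupyter you may need to install any that are missing. Run the following commands in the Anaconda Prompt or a terminal. In a notebook cell, prefix each command with %, e.g. %pip install pandas.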

Ø pip install pandas

Ø pip install numpy

Ø pip install scikit-learn
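Before installing anything, you can check which of these packages are already available. The minimal sketch below can be run in a notebook cell. It uses only the standard library's `importlib.util.find_spec`. Note that scikit-learn is installed as `scikit-learn` but imported as `sklearn`.

```python
import importlib.util

# Import names of the packages used in this tutorial.
# scikit-learn is installed as "scikit-learn" but imported as "sklearn".
packages = ["pandas", "numpy", "sklearn"]

# find_spec returns None when a package cannot be found on the current path.
status = {name: importlib.util.find_spec(name) is not None for name in packages}

for name, installed in status.items():
    print(f"{name}: {'installed' if installed else 'missing'}")
```

Any package reported as missing can then be installed with the matching pip command above.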

Practice exercise

Suppose that you have been hired by Hamilton Island Resort (https://www.hamiltonisland.com.au/ ) to determine the salient characteristics of families that have visited the island.

Household_id: household identification number.

Island_visit: 1 if the household has visited the island in the past; 2 if it has not.

Income ($): average household income.

Travel: attitude toward travel, measured on a 7-point scale (1 = not at all important; 7 = very important).

Vacation: importance attached to family vacations, measured on a 7-point scale (1 = not at all important; 7 = very important).

Household_size: number of people in the household.

Age: age of household head

· Import the data (Tut_2.csv) using pandas

o import pandas as pd

o data = pd.read_csv('Data/Tut_2.csv')

· Generate a description of the data

o data.head()

o data.dtypes

o data.shape

o print(data)
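If you are unsure about the file path, you can try the same pandas calls on a small in-memory CSV first. The sketch below uses `io.StringIO` in place of the real file. The three rows are made-up placeholder values, not the tutorial data; only the column names follow the exercise.

```python
import io

import pandas as pd

# Hypothetical placeholder rows; only the column names come from the exercise.
csv_text = """Household_id,Island_visit,Income,Travel,Vacation,Household_size,Age,Amount
1,1,50000,5,6,4,45,2000
2,2,35000,2,3,2,38,800
3,1,62000,6,7,5,50,2500
"""

# read_csv accepts any file-like object, so StringIO stands in for the real file.
data = pd.read_csv(io.StringIO(csv_text))

print(data.head())   # first rows (up to 5)
print(data.dtypes)   # column types; every column here is parsed as an integer
print(data.shape)    # (number of rows, number of columns)
```

With the real file, the relative path `Data/Tut_2.csv` assumes a `Data` folder containing the CSV exists in the notebook's current working directory. On Colab this depends on where the file was uploaded.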

· One respondent (No. 30) is missing from the data. Please add this household to the data:

Household_id	Island_visit	Income	Travel	Vacation	Household_size	Age	Amount
30	2	41300	3	3	2	42	1200

o Using concat()

df2 = pd.DataFrame([[30, 2, 41300, 3, 3, 2, 42, 1200]],
                   columns=['Household_id', 'Island_visit', 'Income', 'Travel',
                            'Vacation', 'Household_size', 'Age', 'Amount'])

data = pd.concat([data, df2], ignore_index=True)

print(data)
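After `pd.concat`, the new household sits at the bottom of the table. The sketch below shows one way to put it back in ID order and confirm that no ID is duplicated. The two-row `data` frame is a made-up stand-in; the appended row is the one given above for household 30. Some older examples use `DataFrame.append` for this task, but that method was removed in pandas 2.0, so `pd.concat` is the approach to use.

```python
import pandas as pd

columns = ['Household_id', 'Island_visit', 'Income', 'Travel',
           'Vacation', 'Household_size', 'Age', 'Amount']

# Made-up stand-in for the loaded data (households 29 and 31).
data = pd.DataFrame([[29, 1, 52000, 5, 6, 4, 48, 2100],
                     [31, 2, 38000, 2, 3, 3, 40, 900]], columns=columns)

# Row for the missing respondent, as given in the exercise.
df2 = pd.DataFrame([[30, 2, 41300, 3, 3, 2, 42, 1200]], columns=columns)

data = pd.concat([data, df2], ignore_index=True)

# Restore ID order and renumber the index 0..n-1.
data = data.sort_values('Household_id').reset_index(drop=True)

# Each household should appear exactly once.
assert data['Household_id'].is_unique
print(data)
```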

· Define the outcome (y) and predictor (x) variables

o y = data['Island_visit']

o x=data[['Income','Travel','Vacation','Household_size','Age','Amount']]
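You can sanity-check this selection by comparing shapes. `y` should be a Series with one value per household, and `x` a DataFrame with the six listed columns. The sketch below uses a made-up two-row frame. It also shows `drop(columns=...)` as an equivalent way to build `x`: it keeps every column except the ID and the outcome.

```python
import pandas as pd

# Made-up two-row stand-in for the tutorial data.
data = pd.DataFrame({
    'Household_id': [1, 2], 'Island_visit': [1, 2], 'Income': [50000, 35000],
    'Travel': [5, 2], 'Vacation': [6, 3], 'Household_size': [4, 2],
    'Age': [45, 38], 'Amount': [2000, 800],
})

y = data['Island_visit']  # outcome: 1 = visited, 2 = did not visit
x = data[['Income', 'Travel', 'Vacation', 'Household_size', 'Age', 'Amount']]

# Equivalent: keep every column except the ID and the outcome.
x_alt = data.drop(columns=['Household_id', 'Island_visit'])

print(y.shape, x.shape)  # (2,) (2, 6)
print(x.equals(x_alt))   # True
```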

· Generate descriptive stats

o data.describe()

§ count, mean, standard deviation, min, 25th percentile, 50th percentile (median), 75th percentile, max

o data.mode()

o data.median()
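Two details help when reading these outputs. First, `mode()` returns a DataFrame that can have more than one row when a column has ties. Second, `describe()` accepts a `percentiles` argument if you want other cut points. The sketch below illustrates both on a made-up column of 7-point `Travel` scores.

```python
import pandas as pd

# Made-up 7-point Travel scores; 3 and 5 each appear twice (a tie).
data = pd.DataFrame({'Travel': [3, 5, 3, 5, 7, 1]})

print(data.describe())                              # count, mean, std, min, quartiles, max
print(data.describe(percentiles=[0.1, 0.5, 0.9]))   # 10th, 50th and 90th percentiles
print(data.median())                                # Travel is 4.0, the average of 3 and 5

modes = data.mode()
print(modes)                                        # two rows, 3 and 5, because of the tie
```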

· Generate stats by category using groupby (e.g., 'Island_visit')

o data.groupby('Island_visit').mean()

o data.groupby('Island_visit').median()

o data.groupby('Island_visit').std()

o data.groupby('Island_visit').min()

o data.groupby('Island_visit').max()
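You can also combine the five separate `groupby` calls into one table with `agg`. This makes it easier to compare visitors with non-visitors side by side. The sketch below also maps the 1/2 codes to readable labels. The label text and the four-household frame are made up for illustration; they are not results from the tutorial data.

```python
import pandas as pd

# Made-up stand-in data: two visiting (1) and two non-visiting (2) households.
data = pd.DataFrame({
    'Household_id': [1, 2, 3, 4], 'Island_visit': [1, 1, 2, 2],
    'Income': [60000, 52000, 38000, 41000], 'Travel': [6, 5, 2, 3],
    'Vacation': [7, 6, 3, 4], 'Household_size': [5, 4, 2, 3],
    'Age': [50, 46, 39, 42], 'Amount': [2400, 2000, 900, 1100],
})

# Replace the numeric codes with readable group labels (labels are illustrative).
labels = data['Island_visit'].map({1: 'visited', 2: 'not visited'})

# One call produces mean, median, std, min and max for every column, per group.
# Household_id is dropped because its summary statistics carry no meaning.
summary = (data.drop(columns=['Household_id', 'Island_visit'])
               .groupby(labels)
               .agg(['mean', 'median', 'std', 'min', 'max']))

print(summary.round(2))
print(summary['Income'])  # the five statistics for Income only
```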

Interpret the results above. Which households should the resort target, and why? Write your answers in the notebook and submit the .ipynb file via Canvas (one copy per student).
