Assignment 4 – Module 3
1. Instructions
This assignment is worth a total of 9 points toward your final grade. It will consist of two sections. In
Section 1, you will work with trade input and output data and learn how to manipulate them. In
Section 2, you will learn cluster analysis and work with some health data.
1) Course Materials – Jim has made an R textbook available on Canvas. Go to “Library Online
Course Reserves” and you will find an e-book “R: predictive analysis: master the art of
predictive modeling” made available from 2019-10-01 to 2019-12-23. Before beginning this
assignment please spend some time reading the relevant sections of the textbook, especially
Chapters 1.3 (visualization methods) and 2.3 (cluster analysis). Students taking the data
analytics module next semester may want to study the book more during their holiday break.
2) Submission of assignment – you will be given two ways to submit your assignment:
a. RMarkdown format: you can submit your assignment as an RMarkdown file (.RMD).
Make sure to describe clearly in the file the steps of your code, including an explanation of
why you used a particular function.
i. The advantage that RMarkdown has over Word is that you do not need to worry
about the formatting. Just type your comments and code as you would in a
script file and the package will help you knit everything into an HTML document.
Outputs and graphs will also be generated automatically under your code boxes
when you run them. However, you will still need to know the syntax for creating
code boxes. All code must be enclosed by the following symbols:
```{r}
```
ii. The following YouTube video teaches some basics of using RMarkdown. Take
time to watch the video and then decide whether you want to use RMarkdown.
https://www.youtube.com/watch?v=DNS7i2m4sB0
b. PDF format: you can also submit your assignment in PDF format. Copy and paste
snippets of your code to go with your explanations. Your answers should use the
following format.
Text explanations should be in black against a white background.
Code / scripts should be shown in black letters in a grey box like this.
This will enable us to differentiate more easily between code and explanations. Always provide explanations for your code.
For visual outputs (graphs, screenshots, etc.), you can try using UBC’s free Snagit
screen capture program. In the leftmost column of your Canvas account, click on “Help”
>>> “Software Distribution”. Choose the “Snagit” application, add it to your cart and
follow the download and installation instructions.
3) Assignment due date – this assignment is due at 11.59am on December 2nd, 2019.
4) If you have any questions regarding the assignment, you can contact either Hamzeh or
Wei Siang. Their emails and office hours are as follows:
a. Hamzeh – seh793@mail.usask.ca, Mondays & Wednesdays 10.30am-12 Noon at
MCML154.
b. Wei Siang – weisiang.chan@gmail.com, Tuesdays 10.30am-12 Noon at MCML154.
5) If you face problems with your code, send an email to Hamzeh. Include in your email:
a. Your full code;
b. The error message shown in your console; and
c. The line at which the problem appeared (if possible).
6) The data for this assignment can be downloaded from Canvas:
a. On the FRE 501 home page, click on “Canvas module” under “Module 3 (Hamzeh)”
b. Scroll down to “Data” and you’ll see the files you’ll need to download for this
assignment.
c. Find the file titled “wiot_stats_sep12.zip”
d. Download the file onto your computer. As the file is very big (260.2 MB), the download
may take some time.
e. Unzip the file. Double-click on the zip file and click “extract all”. A new folder will be
created with the unzipped files.
f. Open the file in R (DO NOT use Excel, as this will cause the program to hang). Open RStudio and
select “File”, “Import Dataset”, and “From Stata…”.
g. A new window will pop up. Browse to the unzipped folder and select the Stata file titled
“wiot_full”.
h. Cancel the data preview (or your computer will take a very long time to load the data).
Click “Import”. The dataset should then load into RStudio.
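If you prefer a script to the menus, the import can also be written with read_dta() from the haven package. The sketch below is only an illustration: it writes a tiny toy table to a temporary Stata file and reads it back, so the table and the path are made up. For the real data you would pass the path to the unzipped Stata file instead.

```r
# Minimal sketch: round-trip a tiny Stata file to show the read_dta() call.
# The toy table and the temporary path are illustrative, not the real WIOT data.
library(haven)

toy <- data.frame(year = c(1995, 1996), value = c(10.5, 12.0))
tmp <- tempfile(fileext = ".dta")
write_dta(toy, tmp)

# For the assignment, replace tmp with the path to the unzipped Stata file.
wiot_toy <- read_dta(tmp)
print(dim(wiot_toy))
```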
2. Section 1 – Working with Trade Input / Output Data
Following the United States–China Relations Act of 2000, China was allowed to join the WTO in 2001. Bill
Clinton, the U.S. president in 2000, put considerable effort into convincing the U.S. Congress to approve the
trade agreement between the U.S. and China. Clinton believed that higher levels of trade with China were in
the interest of the U.S. economy. However, American authorities generally argue that China hinders open
trade and does not open its market to the U.S. as the U.S. does to China.
Your task is to provide some preliminary evidence on the claims made by the U.S. authorities, focusing on
the food industry in both countries. Please use the “tidyverse” package to conduct your analyses.
• Points
◦ Question 1-1: 3/100
◦ Question 1-2: 2/100
◦ Question 1-3: 5/100
◦ Question 1-4: 10/100
◦ Question 1-5: 30/100
1-1. Use the WIOT dataset to create two subsamples. The first subsample captures the
contribution of the U.S. agricultural sector (row_item=1 and 64) to the value added of China’s food
industry (col_item=3). The second subsample captures the contribution of China’s agricultural
sector (row_item=1 and 64) to the value added of the U.S. food industry (col_item=3). (Consult slides 18 to
23 of the GVC_RCA lecture notes.)
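As a rough illustration of the filtering pattern only, the sketch below builds a toy table and applies dplyr's filter(). The item codes come from the question text; the columns row_country and col_country and the codes "USA" and "CHN" are assumptions made for this example, so check the real variable names and the lecture slides before adapting it.

```r
library(dplyr)

# Toy stand-in for the WIOT table. row_country, col_country and the
# country codes are assumed names, not confirmed by the assignment text.
wiot_toy <- data.frame(
  year        = rep(2000, 6),
  row_country = c("USA", "USA", "USA", "CHN", "CHN", "CHN"),
  row_item    = c(1, 64, 2, 1, 64, 1),
  col_country = c("CHN", "CHN", "CHN", "USA", "USA", "USA"),
  col_item    = c(3, 3, 3, 3, 3, 5),
  value       = c(5, 50, 7, 4, 80, 9)
)

# Subsample 1: U.S. rows (items 1 and 64) feeding China's food industry.
us_in_chn_food <- wiot_toy %>%
  filter(row_country == "USA", col_country == "CHN",
         row_item %in% c(1, 64), col_item == 3)

# Subsample 2: the mirror image, China's rows feeding the U.S. food industry.
chn_in_us_food <- wiot_toy %>%
  filter(row_country == "CHN", col_country == "USA",
         row_item %in% c(1, 64), col_item == 3)

print(us_in_chn_food)
print(chn_in_us_food)
```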
1-2. Calculate the share of the agricultural industry in the value added of the food industry for each subsample
you made (consult slide 24 of the GVC_RCA lecture notes).
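One possible way to turn a subsample into a yearly share is sketched below with toy numbers. It assumes that the row with row_item 64 holds the value-added total and that the share is the row_item 1 value divided by it; that definition is an assumption of this example, and the formula on the lecture slide takes precedence.

```r
library(dplyr)

# Toy subsample: one agricultural row (1) and one assumed total row (64) per year.
sub_toy <- data.frame(
  year     = c(1996, 1996, 1997, 1997),
  row_item = c(1, 64, 1, 64),
  value    = c(5, 50, 6, 40)
)

shares <- sub_toy %>%
  group_by(year) %>%
  summarise(
    agri  = sum(value[row_item == 1]),   # agricultural contribution
    total = sum(value[row_item == 64]),  # assumed value-added total
    share = agri / total
  )

print(shares)
```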
1-3. Make two graphs showing the changes in the share of the agricultural industry in the value added of the food
industry from 1996 to 2010, one for each subsample (consult slides 26 to 33 of the GVC_RCA lecture
notes). Use the gridExtra package to combine the graphs.
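A minimal sketch of placing two ggplot2 line charts side by side with grid.arrange() from gridExtra follows. The two data frames and their values are invented purely to make the example run.

```r
library(ggplot2)
library(gridExtra)

# Invented shares for two toy subsamples.
shares_a <- data.frame(year = 1996:2000, share = c(0.10, 0.11, 0.12, 0.11, 0.13))
shares_b <- data.frame(year = 1996:2000, share = c(0.02, 0.02, 0.03, 0.04, 0.05))

p_a <- ggplot(shares_a, aes(x = year, y = share)) +
  geom_line() +
  labs(title = "Subsample 1 (toy data)", x = "Year", y = "Share")

p_b <- ggplot(shares_b, aes(x = year, y = share)) +
  geom_line() +
  labs(title = "Subsample 2 (toy data)", x = "Year", y = "Share")

# Draw both plots in one row; the layout object is returned invisibly.
combined <- grid.arrange(p_a, p_b, ncol = 2)
```

Using ncol = 1 instead stacks the plots vertically, which can make it easier to compare trends against a shared time axis.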
1-4. In a short paragraph, explain whether the U.S. authorities’ claims seem to be true, so that the WTO needs
to conduct an investigation, or whether the statement is wrong. Focus specifically on the trends in both graphs
before and after 2001, when China joined the WTO.
1-5. Find the share of China’s agricultural industry in the total output value of the agricultural industry and
food industry of all countries from 1995 to 2011. (HINT 1: use the group_by and summarise functions.
HINT 2: group by several variables.) Plot your findings, where the Y axis is the % share of China’s agricultural
industry in the total output value of the agricultural industry and food industry of all countries
and the X axis is the year.
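The two hints can be illustrated on a toy table as follows. The column names (country, industry, output) and all numbers are hypothetical; the real WIOT variables are different and need to be mapped accordingly.

```r
library(dplyr)
library(ggplot2)

# Hypothetical long table of output by year, country and industry.
output_toy <- data.frame(
  year     = rep(c(1995, 1996), each = 4),
  country  = rep(c("CHN", "CHN", "USA", "USA"), times = 2),
  industry = rep(c("agriculture", "food"), times = 4),
  output   = c(30, 20, 25, 25, 40, 20, 20, 20)
)

# Hint 2: group by several variables at once.
by_group <- output_toy %>%
  group_by(year, country, industry) %>%
  summarise(output = sum(output), .groups = "drop")

# Share of China's agriculture in the total of both industries, all countries.
chn_share <- by_group %>%
  group_by(year) %>%
  summarise(
    share_pct = 100 * sum(output[country == "CHN" & industry == "agriculture"]) /
      sum(output)
  )

p_share <- ggplot(chn_share, aes(x = year, y = share_pct)) +
  geom_line() +
  labs(x = "Year", y = "% share (toy data)")

print(chn_share)
```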
3. Section 2 – Cluster Analysis of Health Data
There is a variable in the cluster_data dataset called inc_hh. This is a categorical
variable ranging from 1 to 8 that shows the household income level of each individual. If
inc_hh=1, the annual household income of the individual is between $0
and $19,999; similarly, inc_hh=7 means the annual household income of the individual is
between $120,000 and $139,999. The final income level (inc_hh=8) covers those
Canadians whose annual household income is equal to or greater than $140,000.
In class, we found the dietary patterns of all Canadian adults in the dataset. The questions
below can be answered using your lecture notes.
• Points
◦ Question 2-1: 15/100
◦ Question 2-2: 10/100
◦ Question 2-3: 10/100
◦ Question 2-4: 15/100
2-1. Please use k-means cluster analysis to identify the dietary patterns of individuals with
the lowest income level (i.e. inc_hh=1) and of individuals with an income level between $120,000 and $139,999
(i.e. inc_hh=7). Report the average intakes of the 9 food groups (using the food dataset we used
in class) across these two income groups (1 and 7). (Use the dplyr package for data
management, fviz_nbclust and NbClust to find the optimal number of clusters, and the kmeans
function to conduct the k-means cluster analysis. Please consult slides 58 to 66 of the cluster_analysis
lecture notes.)
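The overall workflow for one income group might look like the sketch below. It uses simulated data with two made-up food groups (fruit and grains) and fixes the number of clusters at 2 for illustration; choosing the number of clusters with fviz_nbclust or NbClust is not shown here.

```r
library(dplyr)

set.seed(42)

# Simulated intakes: two clearly separated eating patterns per income group.
# The food group names and all values are invented for this example.
food_toy <- data.frame(
  inc_hh = rep(c(1, 7), each = 20),
  fruit  = c(rnorm(10, 1, 0.2), rnorm(10, 4, 0.2),
             rnorm(10, 2, 0.2), rnorm(10, 5, 0.2)),
  grains = c(rnorm(10, 5, 0.2), rnorm(10, 2, 0.2),
             rnorm(10, 6, 0.2), rnorm(10, 3, 0.2))
)

# Keep one income group and only the food group columns.
low_income <- food_toy %>%
  filter(inc_hh == 1) %>%
  select(fruit, grains)

# Standardise the variables, then run k-means with several random starts.
km_low <- kmeans(scale(low_income), centers = 2, nstart = 25)

# Average intake of each food group within each cluster.
low_means <- low_income %>%
  mutate(cluster = km_low$cluster) %>%
  group_by(cluster) %>%
  summarise(across(c(fruit, grains), mean), n = n())

print(low_means)
```

The same steps would be repeated with inc_hh == 7 to obtain the patterns for the second income group.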
2-2. In the main dataset we have two variables called bmi_total and nrf. The first variable
indicates the body mass index (BMI) of each individual and the second variable indicates the
diet quality score of each individual based on the Nutrient Rich Food index. Please find the
average BMI and NRF across the clusters identified for each income group separately, using the
group_by and summarise functions. (Please consult slide 77 of the cluster_analysis lecture notes.)
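For example, once a cluster label has been attached to each individual, the averages can be computed as in this small sketch. The cluster column and all values are invented; bmi_total and nrf are the variable names given above.

```r
library(dplyr)

# Invented data: four individuals already assigned to two clusters.
clustered_toy <- data.frame(
  cluster   = c(1, 1, 2, 2),
  bmi_total = c(24, 26, 30, 32),
  nrf       = c(500, 520, 400, 420)
)

cluster_means <- clustered_toy %>%
  group_by(cluster) %>%
  summarise(
    mean_bmi = mean(bmi_total, na.rm = TRUE),  # ignore missing values
    mean_nrf = mean(nrf, na.rm = TRUE)
  )

print(cluster_means)
```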
2-3. Compare the frequencies of those Canadians who have a high-quality diet across the two income
groups using the freq function. Also use the “descr” package to report the prevalence of males with a high-quality
diet in each income group (please consult slides 67 to 76 of the cluster_analysis lecture
notes).
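A small sketch of the frequency tables, using freq() from the descr package on invented data, is shown below. The columns sex and diet and their labels are hypothetical stand-ins for whatever variables identify sex and the diet-quality cluster in your own data.

```r
library(descr)

# Invented data: diet quality label by income group and sex.
diet_toy <- data.frame(
  inc_hh = c(1, 1, 1, 1, 7, 7, 7, 7),
  sex    = c("male", "male", "female", "female",
             "male", "male", "female", "female"),
  diet   = factor(c("High quality", "Low quality", "Low quality", "Low quality",
                    "High quality", "High quality", "Low quality", "High quality"))
)

low  <- subset(diet_toy, inc_hh == 1)
high <- subset(diet_toy, inc_hh == 7)

# Frequency and percentage of each diet label within an income group.
freq_low  <- freq(low$diet, plot = FALSE)
freq_high <- freq(high$diet, plot = FALSE)
print(freq_low)
print(freq_high)

# Same table restricted to males in the lowest income group.
low_males <- subset(low, sex == "male")
freq_low_males <- freq(low_males$diet, plot = FALSE)
print(freq_low_males)
```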
2-4. People in the lowest income groups tend to be more obese than those in the highest income
groups. Adam believes that because healthier food options are more expensive, poor people tend to
eat more unhealthy food and are therefore more likely to be obese. However, Bill argues
that because of technological advancements in the agricultural sector, food is available to
most people in developed countries at relatively low prices, so we cannot blame the lower
prices of unhealthy foods for the higher prevalence of obesity among poor people. Using your
answers to Questions 2-1 and 2-3, discuss in a short paragraph whether you support Adam or Bill.