
Assignment 3: Air Pollution Interpolation and Clustering
41 Marks
Interpolation: A method of constructing new data points within the range of a discrete set of
known data points.
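Inverse distance weighting (IDW) is one of the two methods used in this assignment and a concrete instance of this definition. It estimates the value at an unsampled location s0 as a weighted average of the n known values z(si). The weights decay with distance according to a power parameter p:

```latex
\hat{z}(s_0) = \frac{\sum_{i=1}^{n} w_i \, z(s_i)}{\sum_{i=1}^{n} w_i},
\qquad
w_i = \frac{1}{d(s_0, s_i)^{p}}
```

For example, take two monitors at distances 1 and 2 reading 10 and 20, with p = 2. The weights are 1 and 0.25, so the estimate is (10 + 5) / 1.25 = 12. Because every weight is positive, the estimate always lies within the range of the known values.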
Clustering: Grouping a set of objects in such a way that objects in the same group (called a
cluster) are more similar (in some sense) to each other than to those in other groups (clusters).
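In this assignment the objects are monitors, so "similar" has to be made precise. One common choice, stated here as an assumption rather than a requirement, is the within-cluster sum of squared deviations (SSD). Here xi is the vector of standardized NO2, O3 and PM2.5 values at monitor i, Cg is the set of monitors in cluster g, and x̄g is their mean. Smaller SSD means more homogeneous clusters, and SKATER-style methods look for partitions that reduce it:

```latex
\mathrm{SSD} = \sum_{g=1}^{G} \sum_{i \in C_g} \left\lVert \mathbf{x}_i - \bar{\mathbf{x}}_g \right\rVert^{2},
\qquad
\bar{\mathbf{x}}_g = \frac{1}{|C_g|} \sum_{i \in C_g} \mathbf{x}_i
```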
Groups:
You will work in groups of four; you have the option to select your group on Quercus.
Background:
Air quality is a global human health issue: recent estimates from the Global Burden of Disease
study indicate that 7.6% of global deaths can be attributed to ambient particulate matter air
pollution (Cohen et al. 2017). The largest share of these deaths occurs in low- and middle-income
countries in East and South Asia; however, in the United States, ambient particulate matter air
pollution is estimated to be the sixth-highest risk factor for death, causing 18.5 deaths per
100,000 people (Cohen et al. 2017). In addition to particulate air pollution, mortality is
associated with gaseous air pollutants, including ground-level ozone and nitrogen dioxide
(Jerrett et al. 2009; Hoek et al. 2013). Health Canada estimates that 14,400 deaths annually can
be attributed to anthropogenic air pollution, including both acute and chronic mortality (Health
Canada 2017). Mortality is not the only negative health outcome: chronic and acute exposure to
ambient air pollution is associated with adverse effects on the respiratory, cardiovascular,
nervous, urinary and digestive systems (Kampa and Castanas 2008). Beyond human health, the
negative effects of air pollution exposure on plant life are well understood; the relationship is
so strong that plants are reliable biomonitors for sulphur dioxide, ground-level ozone, and
nitrogen dioxide (Cen 2015). Although less well understood, toxic effects of air pollution on
wildlife have also been identified in past research (Newman and Schreiber 1988).
Research Problem:
You will select five contiguous states in the United States to conduct spatial interpolation and
spatial clustering of annual average nitrogen dioxide (NO2), ozone (O3) and particulate matter
2.5 (PM2.5) air pollution concentrations.
When the groups are determined, each group will be assigned a year and one state. This state
must be included in your set of five states, and all data must be from the assigned year.
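One reading of "contiguous" (our interpretation) is that the five states form a single connected block of neighbouring states. Under that reading, a candidate set can be checked with a breadth-first search over a state adjacency list. This sketch assumes a hand-entered border list. `BORDERS` is a deliberately incomplete, illustrative subset, and the function names are hypothetical.

```python
from collections import deque

# Illustrative, INCOMPLETE subset of U.S. state borders (hypothetical input);
# replace with a full adjacency list derived from state boundary data.
BORDERS = [
    ("Ohio", "Indiana"), ("Ohio", "Michigan"), ("Ohio", "Kentucky"),
    ("Ohio", "Pennsylvania"), ("Ohio", "West Virginia"),
    ("Indiana", "Illinois"), ("Indiana", "Michigan"), ("Indiana", "Kentucky"),
    ("Illinois", "Kentucky"),
]


def build_adjacency(pairs):
    """Turn (state, state) border pairs into an undirected adjacency dict."""
    adj = {}
    for a, b in pairs:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def valid_study_area(states, assigned_state, adjacency):
    """True if `states` holds five distinct states, includes the assigned
    state, and forms one connected block (breadth-first search)."""
    chosen = set(states)
    if len(chosen) != 5 or assigned_state not in chosen:
        return False
    seen = {assigned_state}
    queue = deque([assigned_state])
    while queue:
        current = queue.popleft()
        # Only step to neighbours that are also in the chosen set.
        for nxt in adjacency.get(current, set()) & chosen:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen == chosen
```

With the subset above, Ohio, Indiana, Illinois, Kentucky and Pennsylvania pass the check. Illinois, Indiana, Michigan, Pennsylvania and West Virginia fail, because without Ohio they split into two separate groups.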
Format:
You are given a template for the final assignment in the format of an academic journal
article. Complete this template for your submission.
Major Tasks
Interpolation
You will create an interpolated air pollution surface for each pollutant in your study area,
using both inverse distance weighting (IDW) interpolation and kriging. Remember that the key
steps in conducting interpolation are:
1. Method Selection
2. Initial parameter selection (e.g. k)
3. Fit Variogram (if kriging)
4. Cross-validate using leave-one-out cross-validation (LOOCV)
5. Iterate over parameters / variogram models
6. Prediction
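For IDW, the steps above fit in a few lines of Python. This is a minimal standard-library sketch under our own assumptions, not a prescribed workflow. Monitor coordinates are assumed to be projected to planar units first, since the EPA files give latitude and longitude. The function names `idw`, `loocv_rmse` and `tune_idw` and the candidate grids for the power and k are hypothetical.

The sketch covers parameter selection (step 2), LOOCV (step 4), iteration (step 5) and prediction (step 6). For kriging, the same LOOCV loop applies once a variogram has been fitted (step 3). In R, the gstat package's `idw()` and `krige.cv()` provide comparable functionality.

```python
import math


def idw(known, x, y, power=2.0, k=None):
    """IDW estimate at (x, y) from known monitors.

    known: list of (x, y, value) tuples in projected (planar) coordinates.
    power: distance-decay exponent p; k: use only the k nearest (None = all).
    """
    # Sort monitors by distance so the k nearest can be sliced off.
    nearest = sorted((math.hypot(px - x, py - y), v) for px, py, v in known)
    if k is not None:
        nearest = nearest[:k]
    num = den = 0.0
    for d, v in nearest:
        if d == 0.0:
            return v  # prediction at a monitor returns its observed value
        w = 1.0 / d ** power
        num += w * v
        den += w
    return num / den


def loocv_rmse(known, power=2.0, k=None):
    """Leave-one-out CV: predict each monitor from all the others."""
    sq_err = 0.0
    for i, (x, y, v) in enumerate(known):
        others = known[:i] + known[i + 1:]
        sq_err += (idw(others, x, y, power, k) - v) ** 2
    return math.sqrt(sq_err / len(known))


def tune_idw(known, powers=(1.0, 2.0, 3.0), ks=(None, 4, 8)):
    """Try each (power, k) pair; return (rmse, power, k) with lowest LOOCV RMSE."""
    results = [(loocv_rmse(known, p, k), p, k) for p in powers for k in ks]
    return min(results, key=lambda r: r[0])
```

A lower LOOCV RMSE indicates better out-of-sample prediction at the monitors. Comparing it across IDW settings and kriging variogram models is one way to carry out step 5 before predicting the final surface.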
Spatial Clustering
You will apply spatially constrained clustering, using all three pollutants, to identify regions
whose monitors are most similar in air quality.
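The three pollutants are reported on different scales. As we understand the EPA annual files, NO2 is in parts per billion, ozone in parts per million, and PM2.5 in micrograms per cubic metre. A common preparatory step is therefore to standardize each pollutant before measuring how similar two monitors are; otherwise the pollutant with the largest numeric spread dominates the distance. This sketch assumes one annual mean per pollutant per monitor, stored under the hypothetical keys `no2`, `o3` and `pm25`.

```python
import math
import statistics


def standardize(rows, keys=("no2", "o3", "pm25")):
    """Z-score each pollutant so all three contribute on a comparable scale.

    rows: list of dicts, one per monitor (hypothetical keys).
    Returns one [z_no2, z_o3, z_pm25] vector per monitor.
    """
    stats = {}
    for key in keys:
        values = [r[key] for r in rows]
        sd = statistics.pstdev(values)
        # Guard against a constant pollutant (zero spread).
        stats[key] = (statistics.mean(values), sd or 1.0)
    return [[(r[key] - stats[key][0]) / stats[key][1] for key in keys]
            for r in rows]


def attribute_distance(a, b):
    """Euclidean distance between two standardized pollutant vectors."""
    return math.dist(a, b)
```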
Tools you can use for spatially-constrained clustering include:
• R
o spdep::skater
o ClustGeo::hclustgeo (ClustGeo package)
▪ Semi-constrained
o spatialcluster::scl_redcap
▪ Install from GitHub: https://github.com/mpadge/spatialcluster
• GeoDa (https://geodacenter.github.io/)
o Open-source GUI
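To illustrate the idea behind these tools, this sketch builds a minimum spanning tree (MST) over a spatial-neighbour graph of monitors, weighting each edge by attribute dissimilarity, and then cuts the tree into regions. It is a simplified stand-in, not the algorithm of any listed package. `spdep::skater` chooses which edges to cut by optimizing within-cluster homogeneity, whereas this sketch simply removes the most dissimilar tree edges. The neighbour edges (for example from a Delaunay triangulation or k-nearest neighbours of monitor locations) and the function name `mst_regions` are assumptions.

```python
import math


def mst_regions(features, neighbour_edges, n_clusters):
    """Simplified spatially constrained clustering via an MST.

    features: list of attribute vectors (e.g. standardized NO2, O3, PM2.5).
    neighbour_edges: (i, j) pairs of spatially neighbouring monitors;
    assumed to form a connected graph.
    Returns a list of cluster labels, one per monitor.
    """
    n = len(features)
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Kruskal's algorithm: only neighbour edges are candidates, which is
    # what keeps the resulting regions spatially contiguous.
    weighted = sorted((math.dist(features[i], features[j]), i, j)
                      for i, j in neighbour_edges)
    tree = []
    for w, i, j in weighted:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            tree.append((w, i, j))

    # Drop the (n_clusters - 1) most dissimilar tree edges.
    tree.sort()
    kept = tree[: max(len(tree) - (n_clusters - 1), 0)]

    # Reset union-find and label connected components of the pruned tree.
    parent = list(range(n))
    for _, i, j in kept:
        parent[find(i)] = find(j)
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]
```

For three monitors in a chain where the two end monitors have identical pollutant values, `mst_regions([[0.0], [5.0], [0.0]], [(0, 1), (1, 2)], 2)` cannot place the two ends in one region. Doing so would require passing through the dissimilar middle monitor, which is exactly the spatial constraint at work.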
Data:
The EPA provides many prepared datasets. For this assignment, the easiest to work with is the
Annual Summary Data – Concentrations by Monitor:
https://aqs.epa.gov/aqsweb/airdata/download_files.html#Annual
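Once downloaded and unzipped (we assume a file named like `annual_conc_by_monitor_<year>.csv`), the monitor file can be reduced to one value per site and pollutant with the standard library. This minimal sketch makes several assumptions about the file layout; check them against the header of your own download. These include the column names (`Parameter Code`, `State Name`, `Arithmetic Mean`, and others) and the AQS parameter codes (42602 for NO2, 44201 for ozone, 88101 for PM2.5). Averaging every matching row per site is also a simplification. The file can hold several rows per monitor, for example for different pollutant standards or event-handling options, and you should decide which rows to keep.

```python
import csv
from collections import defaultdict

# AQS parameter codes as we understand them (assumption; verify before use).
PARAMETERS = {"42602": "no2", "44201": "o3", "88101": "pm25"}


def load_monitor_means(csv_lines, states):
    """Average 'Arithmetic Mean' per monitor site and pollutant.

    csv_lines: an open text file (or any iterable of CSV lines) from the
    annual concentration-by-monitor download; column names are assumed.
    states: set of state names in the study area.
    Returns {(site_key, pollutant): mean}.
    """
    sums = defaultdict(lambda: [0.0, 0])
    for row in csv.DictReader(csv_lines):
        pollutant = PARAMETERS.get(row["Parameter Code"])
        if pollutant is None or row["State Name"] not in states:
            continue  # skip other pollutants and states outside the study area
        site = (row["State Code"], row["County Code"], row["Site Num"],
                float(row["Latitude"]), float(row["Longitude"]))
        acc = sums[(site, pollutant)]
        acc[0] += float(row["Arithmetic Mean"])
        acc[1] += 1
    return {key: total / count for key, (total, count) in sums.items()}
```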
You can view a map of the stations at this link:
https://epa.maps.arcgis.com/apps/webappviewer/index.html?id=5f239fd3e72f424f98ef3d5def547eb5&extent=-146.2334,13.1913,-46.3896,56.5319
You will need to use the layer selector button to add layers.