SSCI 599 – Spatial Topic: Spatial Econometrics Project 2
USC Spatial Sciences Institute © 2020 1
SSCI 599 Project 2 – Exploratory Spatial Data Analysis & Multiple Linear Regression
Prepared by An-Min Wu, PhD, Lecturer of Spatial Sciences, University of Southern California
Due Date: Monday, October 26, 11:59 pm Pacific Time
Submit Project 2 as a Word document via the corresponding assignment link on Blackboard
Value: 7% of the course grade
Penalty for late delivery: 2-point deduction for submissions up to 4 days late; no points will be given for submissions more than 4 days late.
The purpose of this project is for you to apply the concepts and skills learned in class to the
datasets that you are interested in exploring in spatial economics. As you have done some
preliminary research in Project 1, I hope that data comes in handy as you dive into the
analysis in this project.
In this project, you will import data of your own interest in spatial economics into R and
conduct exploratory data analysis, exploratory spatial data analysis (including kernel density
estimation, spatial weights, and global spatial autocorrelation), and multiple linear regression.
Before starting the spatial data analysis, read through this entire document. Next, go through
the hands-on R practices from Weeks 6-7 if you have not done so, so that you are familiar
with the libraries, functions, and arguments required in R to complete this project.
Learning Objectives
• To identify available spatial datasets for investigating the spatial economic topic area of
your interest
• To use kernel density estimation, global Moran’s I, and the Moran scatterplot
• To conduct multiple linear regression, including pre- and post-assessments of whether the
dataset is fit for use in regression analysis
• To interpret the outputs of kernel density estimation, spatial autocorrelation and multiple
linear regression
Assignment Description
This project extends your topic of interest into practical exercises in spatial and
statistical analysis in R. To complete it, follow the instructions below:
1. Based on the spatial economic research topic and variables you chose in Project 1, identify
spatial datasets for the variable(s) you will investigate and import the data into R.
To start, focus on the main variable you are most interested in. Choose the spatial extent
and unit of analysis so that the data remain manageable (e.g., more than 50 and no more
than 500 units). Sometimes your spatial
location data (e.g., county boundaries) and attributes (e.g., employment rates) may come
from different sources and need to be joined before use. Do this pre-processing in Excel
or ArcGIS as needed (we will cover how to do it in R soon).
For importing shapefiles, use readOGR( ) in the rgdal package. Use ?readOGR to open its
help file in RStudio (??readOGR searches the help of all installed packages). If your data is
not projected (i.e., it uses geographic longitude/latitude coordinates), retrieve its geographic
coordinates from the polygons and then project them with the spTransform( ) method in the
rgdal library.
If your non-spatial attribute table contains latitude and longitude, you can use read.csv( ) or
read.table( ) to import it first, then make it spatial by creating a Spatial* object (see the
Week 4 handout for how to promote data to a spatial object).
If you have questions about importing data or any other part of the project, search online
resources (e.g., https://rdocumentation.org) and post your questions or issues on the
Discussion Forum on Blackboard.
2. Explore the distribution of the imported data first by conducting exploratory data analysis
(EDA) in R. Before running any statistical or spatial analysis, you should always examine your
data. Run descriptive statistics (use an R function that reports at least the sample size,
minimum, mean, median, maximum, and standard deviation) and make a scatterplot, a
histogram, and a boxplot for your main variable(s). Doing all EDA for one variable is
sufficient, but more is fine (e.g., running EDA for both variables whose association you want
to examine). If the data are not normally distributed, consider a transformation and show
the distribution's normality after transforming.
3. Explore the imported spatial data by conducting exploratory spatial data analysis (ESDA) of
your main variable(s) of interest, including kernel density estimation and building a spatial
weights matrix, followed by global Moran’s I and a Moran scatterplot.
4. Run a standard linear regression to investigate the association among the variables in your
topic of interest using the lm( ) function. The number of independent variables can vary, but
make sure that your final model contains only explanatory (a.k.a. independent or predictor)
variables whose partial coefficients are statistically significant.
5. Write a report that includes the following items:
• Introduction: A brief description of your spatial economic topic of interest, the variables
you selected (including the unit of analysis and spatial extent), and the sources of the
data (include the organization from which you obtained the data and its URL, if
available).
• EDA: R code, the resulting tables/plots, and a short paragraph describing the
distribution of the data (i.e., central tendency and dispersion) and whether you applied a
transformation.
• ESDA: R code, the resulting displays, and 1-2 descriptive paragraphs that interpret the
results. Your results should include the KDE map, neighbor list object details, a
visualization of your spatial weights object, the Moran’s I results, and the Moran scatterplot.
Whether you also test Moran’s I with the Monte Carlo (permutation) approach is your
choice. Describe what each of these results tells you about your data.
• Standard linear regression: R code, its results, and a paragraph that interprets the
results.
• Reflection: A short paragraph (fewer than 200 words) reflecting on your experience
working on this project. What did you find easy? What did you find challenging? What
questions do you still have after completing the project? What adjustments, to the data or
the workflow, might you consider to improve your experience?
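For step 1, the following is a minimal sketch of promoting a latitude/longitude table to a Spatial* object and projecting it, under stated assumptions: it uses a small hypothetical inline CSV (the column names `latitude`, `longitude`, and `emp_rate` are illustrative), UTM zone 11N as the target CRS (appropriate only for Los Angeles-area data), and a recent sp release, which relies on the sf package rather than rgdal for coordinate transformations. Reading a shapefile with readOGR( ) is not shown because it needs a file on disk.

```r
library(sp)

# Hypothetical attribute table with coordinates; with your own file use read.csv("file.csv")
df <- read.csv(text = "id,latitude,longitude,emp_rate
1,34.0224,-118.2851,0.61
2,34.0522,-118.2437,0.58
3,34.1478,-118.1445,0.64")

# Promote to a SpatialPointsDataFrame (x = longitude, y = latitude) in WGS84 geographic coordinates
pts <- SpatialPointsDataFrame(coords = df[, c("longitude", "latitude")],
                              data = df,
                              proj4string = CRS("+proj=longlat +datum=WGS84"))
is.projected(pts)   # FALSE: coordinates are still in decimal degrees

# Project to UTM zone 11N (metres); this target CRS is an assumption for LA-area data
pts_utm <- spTransform(pts, CRS("+proj=utm +zone=11 +datum=WGS84 +units=m"))
is.projected(pts_utm)
coordinates(pts_utm)
```

Choose a projected CRS suited to your own study area; distance-based tools such as kernel density estimation behave more predictably in metres than in degrees.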
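For step 2, a minimal EDA sketch in base R on simulated data. The variables `income` and `rent` and the helper `describe_var( )` are hypothetical. The helper reports exactly the six statistics the step asks for (`summary( )` alone omits the sample size and standard deviation; `psych::describe( )` is an alternative if that package is installed). A log transformation is used as one example for right-skewed data.

```r
set.seed(599)

# Hypothetical data: a right-skewed main variable and a second variable for the scatterplot
income <- rlnorm(200, meanlog = 11, sdlog = 0.5)
rent   <- 0.015 * income + rnorm(200, sd = 150)

# Descriptive statistics: sample size, minimum, mean, median, maximum, standard deviation
describe_var <- function(x) {
  x <- x[!is.na(x)]
  c(n = length(x), min = min(x), mean = mean(x),
    median = median(x), max = max(x), sd = sd(x))
}
round(describe_var(income), 2)

# Scatterplot, histogram, and boxplot
plot(income, rent, main = "Income vs. rent")
hist(income, breaks = 20, main = "Histogram of income")
boxplot(income, main = "Boxplot of income")

# Normality check before and after a log transformation
shapiro.test(income)
log_income <- log(income)
shapiro.test(log_income)
hist(log_income, breaks = 20, main = "Histogram of log(income)")
qqnorm(log_income); qqline(log_income)
```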
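For step 3, a minimal ESDA sketch under stated assumptions: the point pattern and the 10 x 10 grid of areal units are simulated, `MASS::kde2d( )` stands in for whichever kernel density tool you used in class, and `spdep::cell2nb( )` builds queen-contiguity neighbors for the grid (with real polygons you would typically use `spdep::poly2nb( )` instead). The bandwidth `h` is an assumed value; see `?kde2d` for how it is scaled.

```r
library(MASS)   # kde2d(): two-dimensional kernel density estimation
library(spdep)  # cell2nb(), nb2listw(), moran.test(), moran.mc(), moran.plot()

set.seed(42)

# Kernel density estimation on hypothetical projected point locations (metres)
xy <- rbind(cbind(rnorm(100, 0, 500), rnorm(100, 0, 500)),
            cbind(rnorm(50, 3000, 800), rnorm(50, 2000, 800)))
kde <- kde2d(xy[, 1], xy[, 2], h = 1500, n = 100)   # h: assumed bandwidth
image(kde, main = "Kernel density estimate")
contour(kde, add = TRUE)
points(xy, pch = 20, cex = 0.4)

# Spatial weights on a hypothetical 10 x 10 grid of areal units
nb <- cell2nb(10, 10, type = "queen")   # queen contiguity neighbor list
summary(nb)
lw <- nb2listw(nb, style = "W")          # row-standardized weights

# Hypothetical variable: a trend along one grid axis plus noise
value <- rep(1:10, times = 10) + rnorm(100)

# Global Moran's I (analytical test), optional Monte Carlo test, and Moran scatterplot
mt <- moran.test(value, lw)
mt
moran.mc(value, lw, nsim = 999)
moran.plot(value, lw, xlab = "value", ylab = "spatially lagged value")
```

A positive, significant Moran's I here reflects the built-in trend: neighboring units have similar values, and most points in the Moran scatterplot fall in the high-high and low-low quadrants.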
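For step 4, a minimal sketch of fitting a multiple linear regression with lm( ) and keeping only predictors whose partial coefficients are significant. The data frame `dat`, its columns, and the 0.05 threshold are assumptions. Dropping all non-significant predictors in one pass is a simplification; with correlated predictors you may prefer removing them one at a time (backward elimination) and refitting.

```r
set.seed(7)
n <- 150

# Hypothetical predictors and response; x3 has no true effect on y
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
dat$y <- 2 + 1.5 * dat$x1 - 0.8 * dat$x2 + rnorm(n)

# Pre-assessment: correlations among predictors (a rough multicollinearity check)
round(cor(dat[, c("x1", "x2", "x3")]), 2)

# Full model with all candidate predictors
full <- lm(y ~ x1 + x2 + x3, data = dat)
summary(full)

# Keep predictors whose partial coefficients are significant at alpha = 0.05
p <- summary(full)$coefficients[-1, "Pr(>|t|)"]
keep <- names(p)[p < 0.05]
final <- lm(reformulate(keep, response = "y"), data = dat)
summary(final)

# Post-assessment: residual diagnostics for the final model
par(mfrow = c(2, 2)); plot(final); par(mfrow = c(1, 1))
shapiro.test(residuals(final))
```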
Deliverables
Submit a project report with the components requested above as a Word document. Include a
cover page that lists at least the course number (SSCI 599), semester (Fall 2020), project
number/title, and your name. Save your Project 2 report document as
Project2_[YourLastName].docx and submit it via the appropriate assignment link in Blackboard.
Additional Resources I: Data Hubs
If you have a hard time finding appropriate datasets, consider using the following sources and
adapting the datasets mentioned here for your project.
1. City of Los Angeles GeoHub: https://geohub.lacity.org. Datasets you may consider
include, but are not limited to, the Los Angeles index of displacement pressure and traffic
collision data.
2. COVID-19 GIS Hub: https://coronavirus-resources.esri.com. If you are interested in
understanding the impact of COVID-19 on the social and economic aspects of life, you might
find this data hub useful. Additionally, since you will make a story map for the final
presentation that combines the analysis and information from all of your projects this
semester, you might also check out how Esri uses ArcGIS Story Map to tell the
story of its COVID-19 work (https://esri.com/about/newsroom/blog/gis-toachieve-equitable-speedy-vaccine-distribution).
3. The U.S. Census’s American Community Survey 5-year Data:
https://census.gov/data/developers/data-sets/acs-5year.html. The Census Bureau not
only offers spatial data (TIGER/Line data) but also provides various socio-economic and
demographic variables, surveyed every year, that you can download at various census
geographic levels.
4. IPUMS: https://ipums.org. Part of the Institute for Social Research and Data
Innovation at the University of Minnesota, IPUMS provides census and survey data
from the U.S. and around the world. IPUMS integrates census-type data to make it
easy to study and research. If you are looking for employment in data analysis in the
near future, you may also want to check its ‘ABOUT’ tab.
Additional Resources II: Creating a neighbor list object for point data
Assume that we want to explore a dataset with three columns: latitude, longitude, and the
average math score of the schools in one district. We can import this data (.csv), convert it
to a spatial object (sp), and assign it the WGS84 datum.
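A minimal sketch of this step, assuming the sp package. The school data below are hypothetical and entered inline so the example runs on its own; with a real file you would use something like `schools <- read.csv("schools.csv")` (the file name is an assumption). The column names `latitude`, `longitude`, and `math_score` are also assumptions.

```r
library(sp)

# Hypothetical school data with the three columns described above
schools <- data.frame(
  latitude   = c(34.05, 34.06, 34.04, 34.07, 34.03, 34.08, 34.02, 34.05),
  longitude  = c(-118.25, -118.24, -118.26, -118.23, -118.27, -118.22, -118.28, -118.21),
  math_score = c(512, 498, 530, 475, 541, 468, 550, 480)
)

# Promote to a SpatialPointsDataFrame: the formula gives x (longitude) then y (latitude)
coordinates(schools) <- ~ longitude + latitude

# Assign a geographic coordinate reference system with the WGS84 datum
proj4string(schools) <- CRS("+proj=longlat +datum=WGS84")
summary(schools)
```

After `coordinates<-`, the longitude and latitude columns move into the geometry, so only `math_score` remains in the attribute table.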
To create an object that describes the neighbor relationships among the points, we use the
k-nearest neighbors (knn) method.
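A minimal sketch using `spdep::knearneigh( )`, assuming hypothetical school coordinates and k = 3 (choose k for your own data). If you already have a SpatialPointsDataFrame, `coordinates( )` extracts an equivalent matrix. Setting `longlat = TRUE` makes the function measure great-circle distances, because these coordinates are unprojected.

```r
library(spdep)

# Hypothetical school locations (longitude, latitude) in decimal degrees
coords <- cbind(
  longitude = c(-118.25, -118.24, -118.26, -118.23, -118.27, -118.22, -118.28, -118.21),
  latitude  = c(34.05, 34.06, 34.04, 34.07, 34.03, 34.08, 34.02, 34.05)
)

# Find the 3 nearest neighbors of each school
knn3 <- knearneigh(coords, k = 3, longlat = TRUE)
class(knn3)
head(knn3$nn)   # row i lists the indices of school i's 3 nearest neighbors
```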
The resulting neighbor object is of class knn. We can then convert it into the more general
neighbor list class, nb.
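A minimal sketch of the conversion with `spdep::knn2nb( )`, again on hypothetical coordinates with k = 3 so the block runs on its own. Because "B is among A's 3 nearest" does not imply "A is among B's 3 nearest", the resulting list is usually asymmetric; `knn2nb(..., sym = TRUE)` forces symmetry if you need it.

```r
library(spdep)

# Hypothetical school locations (longitude, latitude) in decimal degrees
coords <- cbind(
  longitude = c(-118.25, -118.24, -118.26, -118.23, -118.27, -118.22, -118.28, -118.21),
  latitude  = c(34.05, 34.06, 34.04, 34.07, 34.03, 34.08, 34.02, 34.05)
)
knn3 <- knearneigh(coords, k = 3, longlat = TRUE)

# Convert the knn object into a general neighbor list (class "nb")
nb_knn <- knn2nb(knn3)
summary(nb_knn)
is.symmetric.nb(nb_knn)

# Visualize the neighbor links on top of the points
plot(nb_knn, coords)
```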
Finally, convert the nb object into a listw (spatial weights) object using nb2listw( ).
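A minimal sketch of this last step, assuming row-standardized weights (`style = "W"`) and the same hypothetical school coordinates and math scores, rebuilt here so the block is self-contained. The listw object is what `moran.test( )` and `moran.plot( )` expect; with only eight hypothetical schools, the Moran's I result is purely illustrative.

```r
library(spdep)

# Hypothetical school locations and average math scores
coords <- cbind(
  longitude = c(-118.25, -118.24, -118.26, -118.23, -118.27, -118.22, -118.28, -118.21),
  latitude  = c(34.05, 34.06, 34.04, 34.07, 34.03, 34.08, 34.02, 34.05)
)
math_score <- c(512, 498, 530, 475, 541, 468, 550, 480)
nb_knn <- knn2nb(knearneigh(coords, k = 3, longlat = TRUE))

# Row-standardized weights: each school's neighbor weights sum to 1
lw <- nb2listw(nb_knn, style = "W")
print(lw)

# The listw object feeds global Moran's I and the Moran scatterplot
mt <- moran.test(math_score, lw)
mt
moran.plot(math_score, lw, xlab = "Math score", ylab = "Spatially lagged math score")
```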