STAT6016 Spatial Data Analysis
Mini-project
(assessment weighting: 25%)
Due date: 14 December, 2022
Background and problem setting
The spreadsheet “Data” of the Excel file data.xlsx, extracted from the Global Terrorism Database (GTD)
(https://www.start.umd.edu/data-tools/global-terrorism-database-gtd),
contains data on 34 violent incidents which occurred in Hong Kong during 2019–2020, a period of
social unrest.
Consider Hong Kong as a 2D rectangular spatial set
$$\mathcal{S} = \{(x, y) : 113.831 \le x \le 114.409,\ 22.153 \le y \le 22.546\},$$
where x and y denote longitude and latitude, respectively. We define the “area”
of a sub-region R of Hong Kong to be $\int_R dx\,dy$. Thus, the total “area” of Hong Kong is
(114.409 − 113.831)(22.546 − 22.153) = 0.227154. The (x, y)-coordinates of the locations of the 34
incidents are given in the dataset and form a sample S = {S1, …, S34} of a point process on $\mathcal{S}$.
Hong Kong is divided into 18 districts. In this study we assume a rectangular shape for each
district, whose four boundaries are given in the spreadsheet “18 districts” of data.xlsx, measured in
longitude and latitude. The spreadsheet also provides the area and population density (in millions per
unit area) of each district. Define a covariate V(s), for $s \in \mathcal{S}$, by
V(s) = population density of district d if s is located in district d.
Thus, V(s) has a constant value for all s belonging to the same district.
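The piecewise-constant covariate can be sketched as a simple rectangle lookup. This is only an illustration: the `District` type, the `covariate` function and the two placeholder rows are hypothetical, and the real boundaries and densities would be read from the “18 districts” spreadsheet. A point lying on a shared boundary is assigned to the first matching district, which is an arbitrary choice.

```python
from typing import NamedTuple, Optional, Sequence, Tuple


class District(NamedTuple):
    name: str
    x_min: float  # western boundary (longitude)
    x_max: float  # eastern boundary (longitude)
    y_min: float  # southern boundary (latitude)
    y_max: float  # northern boundary (latitude)
    density: float  # population density, millions per unit area


def covariate(s: Tuple[float, float], districts: Sequence[District]) -> Optional[float]:
    """Return V(s): the density of the first district whose rectangle contains s."""
    x, y = s
    for d in districts:
        if d.x_min <= x <= d.x_max and d.y_min <= y <= d.y_max:
            return d.density
    return None  # s falls in no district rectangle


# Placeholder rows for illustration only, NOT values from data.xlsx.
districts = [
    District("A", 113.9, 114.0, 22.2, 22.3, 1.5),
    District("B", 114.0, 114.1, 22.2, 22.3, 4.0),
]
```

Calling `covariate((113.95, 22.25), districts)` returns the density of placeholder district "A".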
Assume that S follows a Cox point process model driven by a random intensity Λ satisfying
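$$\ln \Lambda(s) = \ln W_\alpha + \beta V(s), \quad s \in \mathcal{S},$$
where β ∈ ℝ is an unknown coefficient and Wα denotes a gamma(α, α) random variable with an
unknown parameter α > 0 and density function
$$f_W(w \mid \alpha) = \frac{\alpha^{\alpha} w^{\alpha - 1} e^{-\alpha w}}{\Gamma(\alpha)}, \quad w > 0.$$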
Note that Wα has mean 1 and variance 1/α.
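The stated moments can be checked by simulation. The sketch below assumes the density above is the shape-α, rate-α parameterisation, so the scale passed to Python's `random.gammavariate(shape, scale)` is 1/α. The function name `sample_w` and the value α = 2.5 are illustrative choices only.

```python
import random
import statistics


def sample_w(alpha: float, n: int, seed: int = 0) -> list:
    """Draw n values of W_alpha ~ gamma(shape=alpha, rate=alpha)."""
    rng = random.Random(seed)
    # gammavariate takes (shape, scale); a rate of alpha is a scale of 1/alpha.
    return [rng.gammavariate(alpha, 1.0 / alpha) for _ in range(n)]


alpha = 2.5  # arbitrary illustrative value
draws = sample_w(alpha, 200_000)
mean_w = statistics.fmean(draws)     # should be close to 1
var_w = statistics.pvariance(draws)  # should be close to 1 / alpha = 0.4
```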
Complete TASKS 1–4 with the help of any convenient software of your choice.
TASK 1 (4%)
Display Hong Kong as a rectangle on a 2D graph, with x- and y-axes representing longitude
and latitude, respectively.
Show on the same graph (i) the boundaries of the 18 districts, and (ii) the 34 points in S .
TASK 2 (8%)
Show that the proposed Cox point process has a parametric intensity given by
$$\rho(s \mid \beta) = e^{\beta V(s)}, \quad s \in \mathcal{S},$$
and a parametric conditional intensity given by
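$$\lambda(s, S \mid \alpha, \beta) = \frac{e^{\beta V(s)}\,(\alpha + \#S)}{\alpha + \int_{\mathcal{S}} e^{\beta V(s)}\,ds}, \quad s \in \mathcal{S},\ S \in E \text{ (exponential space)}.$$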
[Hint: derive first the density f(S) of the Cox point process, then use the fact that the conditional
intensity is given by f({s} ∪ S)/f(S).]
With the help of a numerical optimisation package, calculate the maximum conditional
pseudo-likelihood estimates $(\hat{\alpha}_{\mathrm{Cox}}, \hat{\beta}_{\mathrm{Cox}})$ of (α, β).
Based on the above results, estimate the expected number of incidents occurring in each of
the 18 districts.
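As a rough sketch of the objective in TASK 2, the log pseudo-likelihood, taken here in its usual form as the sum of log conditional intensities at the observed points minus the integral of the conditional intensity over the region, can be evaluated in closed form because V is constant on each district. The code assumes that the 18 rectangles tile the region without overlap, so that the integral of exp(βV(s)) is a sum over districts, and that at least one point is observed. All function and argument names are hypothetical.

```python
import math


def integral_exp(beta, areas, densities):
    """Integral of exp(beta * V(s)) over the region, V constant on each district."""
    return sum(a * math.exp(beta * v) for a, v in zip(areas, densities))


def log_pseudo_likelihood(alpha, beta, v_at_points, areas, densities):
    """Log pseudo-likelihood built from the conditional intensity of TASK 2.

    v_at_points holds V(s_i) for each observed point s_i.
    """
    n = len(v_at_points)
    c = integral_exp(beta, areas, densities)
    # Sum over points of log lambda(s_i, S \ {s_i}); the reduced sample has n - 1 points.
    point_terms = beta * sum(v_at_points) + n * (
        math.log(alpha + n - 1) - math.log(alpha + c)
    )
    # Integral over the region of lambda(u, S), with all n points in the sample.
    compensator = c * (alpha + n) / (alpha + c)
    return point_terms - compensator


def expected_counts(beta, areas, densities):
    """Expected number of points per district: area_d * exp(beta * v_d)."""
    return [a * math.exp(beta * v) for a, v in zip(areas, densities)]
```

One option is to pass the negative of `log_pseudo_likelihood`, reparameterised in (ln α, β) to keep α positive, to a general-purpose optimiser such as `scipy.optimize.minimize`.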
TASK 3 (8%)
Assuming a Poisson point process model for S with intensity $\rho(s \mid \beta) = e^{\beta V(s)}$, calculate the
maximum likelihood estimate $\hat{\beta}_{\mathrm{Poi}}$ of β.
Generate 10000 parametric bootstrap samples from the Poisson point process model with
intensity $\rho(\cdot \mid \hat{\beta}_{\mathrm{Poi}})$.
Based on each bootstrap sample, calculate the maximum conditional pseudo-likelihood
estimate $\hat{\alpha}^{*}_{\mathrm{Cox}}$ of α, following the same procedure as described in TASK 2. Plot the
empirical distribution function of the 10000 replicates of $\hat{\alpha}^{*}_{\mathrm{Cox}}$ thus obtained.
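The first two steps of TASK 3 might be sketched as follows, assuming non-overlapping district rectangles that tile the region. The Poisson log-likelihood is then β ΣV(s_i) − Σ_d a_d exp(β v_d) up to a constant, which is concave in β, so Newton iterations on the score are one possible solver. A bootstrap sample is drawn district by district: a Poisson count with mean a_d exp(β v_d), then uniform locations inside the rectangle. All names are hypothetical, and the Poisson sampler is Knuth's multiplication method, chosen only to keep the sketch free of third-party packages.

```python
import math
import random


def poisson_mle_beta(v_at_points, areas, densities, tol=1e-10, max_iter=100):
    """Newton iterations on the Poisson score equation for beta."""
    total_v = sum(v_at_points)
    beta = 0.0
    for _ in range(max_iter):
        score = total_v - sum(a * v * math.exp(beta * v) for a, v in zip(areas, densities))
        hess = -sum(a * v * v * math.exp(beta * v) for a, v in zip(areas, densities))
        step = score / hess
        beta -= step
        if abs(step) < tol:
            break
    return beta


def poisson_draw(mean, rng):
    """Knuth's multiplication method; adequate for small means."""
    limit = math.exp(-mean)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def simulate_poisson(beta, rects, densities, rng):
    """One bootstrap sample; rects holds (x_min, x_max, y_min, y_max) per district."""
    points = []
    for (x0, x1, y0, y1), v in zip(rects, densities):
        area = (x1 - x0) * (y1 - y0)
        for _ in range(poisson_draw(area * math.exp(beta * v), rng)):
            points.append((rng.uniform(x0, x1), rng.uniform(y0, y1)))
    return points
```

Repeating `simulate_poisson` 10000 times with a shared `random.Random` instance, and refitting the Cox model to each sample, would give the bootstrap replicates of the α estimate.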
For α > 0, the Cox point process has a reweighted pair correlation function equal to the constant
1 + 1/α > 1, suggesting a tendency for the violent incidents to cluster. As α → ∞, the reweighted
pair correlation function converges to 1 and Wα converges to the non-random constant 1. In the limiting
case α = ∞, the Cox point process reduces to a Poisson point process with intensity ρ(s|β), and the
violent incidents become independently located.
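The constant 1 + 1/α can be traced back to the moments of Wα. The short derivation below assumes the usual definition of the pair correlation function of a Cox process, namely the second-order product density E[Λ(s)Λ(t)] divided by the product of the intensities, and uses the intensity stated in TASK 2.

```latex
g(s, t)
  = \frac{E[\Lambda(s)\,\Lambda(t)]}{\rho(s \mid \beta)\,\rho(t \mid \beta)}
  = \frac{E[W_\alpha^2]\; e^{\beta V(s)}\, e^{\beta V(t)}}{e^{\beta V(s)}\, e^{\beta V(t)}}
  = \operatorname{Var}(W_\alpha) + \bigl(E[W_\alpha]\bigr)^2
  = \frac{1}{\alpha} + 1, \qquad s \neq t.
```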
TASK 4 (5%)
With the help of the parametric bootstrap distribution found in TASK 3, suggest a
procedure for testing the null hypothesis (H0) that the violent incidents are independently
located, against the alternative (H1) that the incidents have a clustering tendency.
Report a p-value for the test.
[Hint: You may consider using $\hat{\alpha}_{\mathrm{Cox}}$ as a test statistic, and explain why it is an appropriate choice.]
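One possible form of the test is sketched below. Since a small α corresponds to a pair correlation 1 + 1/α well above 1, small values of the estimate count as evidence for clustering, and the p-value is the share of bootstrap replicates that are at least as small as the observed estimate. The function name is hypothetical, and the "add one" convention (1 + count)/(1 + B) is an assumption; the plain proportion count/B is an equally common choice.

```python
import math


def bootstrap_p_value(alpha_obs, alpha_boot):
    """One-sided Monte Carlo p-value: small alpha estimates favour clustering (H1)."""
    # Replicates fitted to Poisson samples may be very large or infinite;
    # math.inf compares correctly and never counts as "at least as small".
    count = sum(1 for a in alpha_boot if a <= alpha_obs)
    return (1 + count) / (1 + len(alpha_boot))


# Illustrative values only, NOT results from the dataset.
p = bootstrap_p_value(2.0, [1.0, 3.0, math.inf, 5.0])
```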
Points to note
In the main text of your report, show and explain your steps, and display formulae in their
conventional mathematical form. Do not explain anything using computer code.
Attach your computer code to your report as an appendix. Include brief comments on lines
which involve complicated operations.
********** END OF MINI-PROJECT **********