首页 >
> 详细

CE634 Assignment-2

Derive Community Structures from Taxi Flow Network

1. Introduction

In assignment 1, you have performed an exploratory data analysis to derive meaningful statistics of the taxi

GPS dataset. Some of the questions listed in assignment 1 allow us to generate insights into the spatial or

temporal distribution of travel demand in NYC. However, none of the questions require us to couple trip

origins and destinations together. In other words, the interactions among different locations in the study

area remain unexplored.

In this assignment, you will be asked to perform a network analysis, namely community detection, to

uncover the hidden structures in the flow network derived from the taxi GPS dataset. According to

Wikipedia:

“In the study of complex networks, a network is said to have community structure if the nodes of the

network can be easily grouped into (potentially overlapping) sets of nodes such that each set of nodes

is densely connected internally. In the particular case of non-overlapping community finding, this

implies that the network divides naturally into groups of nodes with dense connections internally and

sparser connections between groups. But overlapping communities are also allowed. The more

general definition is based on the principle that pairs of nodes are more likely to be connected if they

are both members of the same community(ies), and less likely to be connected if they do not share

communities. A related but different problem is community search, here the goal is to find a

community that a certain vertex belongs to.”

Thus, the community detection algorithm(s) can be applied to our taxi GPS dataset to derive community

structures such that the taxi flows (e.g., origin-destination trips) within the communities are denser while

inter-community flows are sparser. The results could generate insights into the spatial interactions among

different locations in the city.

2. About Community Detection

2.1. Derive taxi flow network from the GPS dataset

The taxi GPS dataset makes it possible for us to derive the origin-destination (OD) trips that contain rich

information of the location interactions in the Manhattan area. Usually, a community detection algorithm

is performed over a network, with the nodes representing particular entities, and the links (and weights)

denoting the interactions among these nodes.

In this assignment, you will first derive a flow network, in which the nodes are represented by various

locations (or places) in the study area, where the links between the nodes are measured as the total amount

of taxi trips between the corresponding locations.

Instead of representing the nodes using road intersections, we use taxi zones in this analysis to represent

different places (e.g., nodes) in the flow network. The reason is that using taxi zones will significantly

reduce the number of nodes and edges in the network, which makes the computation time (of the community

detection) more reasonable.

To accomplish this task, you are provided with another file:

− intersection_to_zone

This file maps each road intersection onto a particular taxi zone in Manhattan. Two columns, namely

inter_id and zone_id, denote the id of the road intersection and the taxi zone, respectively.

What you need to do is to analyze the original taxi GPS dataset, derive the OD trips at the level of road

intersections, and transform the network onto taxi zones. To make it clear, the final network used for the

community detection consists of 63 nodes that denote the taxi zones in Manhattan, with the weight of the

edges as the amount of taxi trips between the zones.

2.2. Perform the community detection algorithm

There are many community detection algorithms (https://en.wikipedia.org/wiki/Community_structure),

with their own pros and cons. In this assignment, you are asked to apply the algorithm proposed by Blondel

et al., (2008). The implementation is described in details in this article.

3

However, to facilitate the analysis, you are encouraged to use existing libraries & APIs. In particular, igraph

(http://igraph.org/), which is a collection of network analysis tools, allows you to perform this algorithm

through a couple of programming languages such as Python, R, and C. The python API for this algorithm

can be found at http://igraph.org/python/doc/igraph.Graph-class.html#community_multilevel.

A few things for your attention:

• You first have to figure out how to install the package (http://igraph.org/python/);

• And then follow the tutorial and learn how to establish a graph, i.e., network

(http://igraph.org/python/doc/tutorial/tutorial.html);

• Then, apply the so called multilevel community detection algorithm proposed by Blondel et al.

3. Tasks

(1) Derive the flow network at the level of taxi zones.

(2) Understand the multilevel community detection algorithm and perform it over the flow network. The

output of your analysis would be a collection of clusters or communities, with each community including

a list of taxi zones with frequent interactions.

(3) You are asked to form the flow network during the following time periods, and derive the corresponding

communities:

• Using taxi trips occurred during 07:00 – 09:00 throughout the whole year

• And those during 16:00 – 18:00 throughout the whole year

• Using taxi trips of each of the twelve months.

4. What to Submit

− A word document or pdf file with 1-2 paragraphs of your understanding of the multilevel

community detection algorithm and the key concepts (e.g., modularity).

− The results of the communities based on (3), i.e., results for 14 different scenarios.

− The submission due date is November 25th, 2019.

References

Blondel, V. D., Guillaume, J. L., Lambiotte, R., & Lefebvre, E. (2008). Fast unfolding of communities in

large networks. Journal of statistical mechanics: theory and experiment, 2008(10), P10008.

联系我们

- QQ：99515681
- 邮箱：99515681@qq.com
- 工作时间：8:00-23:00
- 微信：codinghelp

- Stat 429作业代做、代写mathematics课程作业、代做r编程语言 2019-11-18
- 代做431 Quiz 2作业、R编程设计作业调试、R语言作业代写、代做dat 2019-11-18
- 代写mt5761留学生作业、代做statistical Modelling作 2019-11-18
- Stata留学生作业代写、代做domco2opop 语言作业、Stata编程 2019-11-17
- 代做mast 397B作业、代写r语言留学生作业、代做sas语言作业、代写s 2019-11-17
- 代做comp25216作业、代写network Analysis留学生作业、 2019-11-17
- 代做comp 2406作业、代写json留学生作业、代做html/Mongo 2019-11-17
- 代写comp 250作业、代做java编程设计作业、代写kdtree课程作业 2019-11-17
- 代做vizdoom Game Engine作业、代写 Fps留学生作业、代做 2019-11-17
- Server作业代写、代tcp Connection编程作业、代写ip/Tc 2019-11-17
- 代写ict638留学生作业、代写mobile App作业、代做 .Net语言 2019-11-17
- 代写nodes And Edges作业、代做java程序语言作业、代写pyt 2019-11-16
- Data/Prod作业代做、代写python编程设计作业、代做java/C+ 2019-11-16
- Data Programming作业代写、代做canopy留学生作业、Pyt 2019-11-16
- 代做data Outputs作业、Csv Format作业代写、R程序语言作 2019-11-16
- 代做virtual Disk作业、C/C++程序语言作业调试、C/C++实验 2019-11-16
- 代写comp222作业、代做ca Assignment作业、代做python 2019-11-16
- Cmpt 300作业代做、代写operating Systems作业、C/C 2019-11-16
- Data Visualisation And Analytics Assi... 2019-11-15
- Block Breaker Assignment Game Engine ... 2019-11-15