
CE634 Assignment 1

Preprocessing and Exploratory Data Analysis of Large-Scale Taxi GPS Traces

1. Introduction

In the two assignments of this subject, you will be dealing with a large-scale taxi GPS dataset. The dataset

records millions of taxi trips in Manhattan, New York in a given year. This dataset has been used extensively

to study the dynamics of the urban taxi flow. For example, it has been used by a group of researchers at

MIT to evaluate the ride sharing potential of the city (Santi et al., 2014) or to estimate the minimum taxi

fleet that is able to serve all the travel demand in the city (Vazifeh et al., 2018).

In this assignment, you will be asked to preprocess the dataset, play with it, and derive meaningful statistics

through exploratory data analysis. To start, you are provided with the following two files:

− taxi_id.csv.bz2

− intersections.csv

The first compressed file (taxi_id.csv.bz2) records the origin and destination of the taxi trips along with the

timestamps. For simplicity, the origin and destination of the actual trips have been matched to the nearest

road intersections. The format of this file is as follows:

taxi_id, pick_up_time, drop_off_time, pick_up_intersection, drop_off_intersection

The taxi_id is a numerical value that uniquely identifies each taxi. pick_up_time and drop_off_time are

expressed in Unix epoch time, and pick_up_intersection, drop_off_intersection are the indices of the

intersections (numbers from 1 to 4091).
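
As a starting point, the trip file described above can be read with the Python standard library alone. The following is only a minimal sketch, not a prescribed method: `Trip` and `read_trips` are hypothetical names introduced for illustration, and the code assumes that every field is an integer, that the timestamps are in seconds, and that a header row (if the file has one) begins with a non-numeric field.

```python
import bz2
import csv
import io
from typing import Iterator, NamedTuple


class Trip(NamedTuple):
    """One row of the trip file, in the column order given above."""
    taxi_id: int
    pick_up_time: int           # Unix epoch time (assumed to be in seconds)
    drop_off_time: int          # Unix epoch time (assumed to be in seconds)
    pick_up_intersection: int   # index from 1 to 4091
    drop_off_intersection: int  # index from 1 to 4091


def read_trips(path_or_file) -> Iterator[Trip]:
    """Yield one Trip per data row of a bz2-compressed CSV file."""
    with bz2.open(path_or_file, mode="rt", newline="") as handle:
        for row in csv.reader(handle):
            # Assumption: skip blank lines and any header row, which would
            # start with a non-numeric field such as "taxi_id".
            if not row or not row[0].strip().lstrip("-").isdigit():
                continue
            yield Trip(*(int(field) for field in row[:5]))


if __name__ == "__main__":
    # Tiny in-memory stand-in for taxi_id.csv.bz2 (made-up values).
    sample = (
        b"taxi_id,pick_up_time,drop_off_time,pick_up_intersection,drop_off_intersection\n"
        b"1,1293840000,1293840600,10,20\n"
        b"2,1293843600,1293844500,20,30\n"
    )
    for trip in read_trips(io.BytesIO(bz2.compress(sample))):
        print(trip)
```

Reading the file row by row keeps memory use low, which matters with millions of trips. To process the real file, pass its path instead of the in-memory buffer, for example `read_trips("taxi_id.csv.bz2")`.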

The second file (intersections.csv) lists the street intersections to which pick-up and drop-off points

were snapped. The format of the file is:

id, latitude, longitude

where id is a progressive identifier from 1 to 4091 and latitude and longitude are the GPS coordinates of

the intersection. Below are two screenshots of these road intersections:


2. Tasks

In this section, you will be asked to analyze the dataset − using any software or programming language that

you prefer − and then provide answers to the following research questions:

(1) How many unique taxis are there in this dataset, and how many trips are recorded?

(2) What is the distribution of the number of trips per taxi? Who are the top performers?

(3) How does the daily trip count (i.e., number of trips per day) change throughout the year? Any rhythm

or seasonality?

(4) What is the distribution of the number of departure trips at different locations (i.e., intersections)? What

about the distribution of arrival trips? What can you conclude from these two distributions?

(5) How does the number of trips change over time in a day? (You will be given three dates randomly

selected from the dataset; for each, plot the hourly variation in trips in local time.)

(6) What is the probability distribution of the trip distance (measured as straight-line distance)? How about

travel time (i.e., trip duration)? What can you conclude from these two distributions?

For questions (2) – (6), you are required to provide figures along with your answers. Note that some of the

above questions are open-ended, and the answers could vary among students.
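
Questions (5) and (6) rely on two small calculations: converting Unix epoch time to local time, and measuring the straight-line distance between two intersections. The sketch below shows one possible way to do both, under stated assumptions: the function names are hypothetical, timestamps are taken to be in seconds, "local time" is taken to mean the `America/New_York` time zone, and the haversine (great-circle) formula is used as one reasonable reading of "straight-line distance". The `zoneinfo` module requires Python 3.9 or later and a time zone database on the system (or the `tzdata` package).

```python
import math
from collections import Counter
from datetime import date, datetime, timezone
from zoneinfo import ZoneInfo

NEW_YORK = ZoneInfo("America/New_York")  # assumption: local time = New York time
EARTH_RADIUS_KM = 6371.0088              # mean Earth radius


def local_datetime(epoch_seconds: int) -> datetime:
    """Convert Unix epoch seconds (assumed unit) to New York local time."""
    utc = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return utc.astimezone(NEW_YORK)


def hourly_counts(pick_up_times, day: date) -> list[int]:
    """Count pick-ups in each local hour (0 to 23) of the given local date."""
    counts = Counter()
    for ts in pick_up_times:
        local = local_datetime(ts)
        if local.date() == day:
            counts[local.hour] += 1
    return [counts[hour] for hour in range(24)]


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


if __name__ == "__main__":
    # Made-up timestamps and coordinates, for illustration only.
    print(local_datetime(1293840000))
    print(hourly_counts([1293840000, 1293843600], date(2010, 12, 31)))
    print(haversine_km(40.75, -73.99, 40.76, -73.98))
```

Trip duration needs no conversion, since it is simply `drop_off_time - pick_up_time` in the same unit as the timestamps. Grouping by the local date rather than the UTC date matters here, because a trip late in the New York evening falls on the next calendar day in UTC.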


3. What to submit

− A Word document or PDF file with answers to (1) – (6).

− The computer code used in this assignment. If particular software is used, please explain

how it was used to derive the answers.

− The submission due date is November 11th, 2019.

4. Access to the dataset

The dataset used in this assignment can be downloaded via the following link:

https://polyuitmy.sharepoint.com/:f:/g/personal/yangxu_polyu_edu_hk/EtCz10QsyxZMhY8_Z3bF9xYBmhb_2CLZK9G6QtlgO0jtg?e=YePkJ0

Please contact the subject instructor if the link is invalid.

References

Santi, P., Resta, G., Szell, M., Sobolevsky, S., Strogatz, S. H., & Ratti, C. (2014). Quantifying the benefits

of vehicle pooling with shareability networks. Proceedings of the National Academy of Sciences, 111(37),

13290-13294.

Vazifeh, M. M., Santi, P., Resta, G., Strogatz, S. H., & Ratti, C. (2018). Addressing the minimum fleet

problem in on-demand urban mobility. Nature, 557(7706), 534.

