
CE634 Assignment 1

Preprocessing and Exploratory Data Analysis of Large-Scale Taxi GPS Traces

1. Introduction

In the two assignments of this subject, you will be dealing with a large-scale taxi GPS dataset. The dataset records millions of taxi trips in Manhattan, New York, in a given year. This dataset has been used extensively to study the dynamics of urban taxi flow. For example, a group of researchers at MIT has used it to evaluate the ride-sharing potential of the city (Santi et al., 2014) and to estimate the minimum taxi fleet able to serve all the travel demand in the city (Vazifeh et al., 2018).

In this assignment, you will be asked to preprocess the dataset, play with it, and derive meaningful statistics through exploratory data analysis. To start, you are provided with the following two files:

− taxi_id.csv.bz2

− intersections.csv

The first compressed file (taxi_id.csv.bz2) records the origin and destination of the taxi trips along with the timestamps. For simplicity, the origin and destination of the actual trips have been matched to the nearest road intersections. The format of this file is as follows:

taxi_id, pick_up_time, drop_off_time, pick_up_intersection, drop_off_intersection

The taxi_id is a numerical value that uniquely identifies each taxi. pick_up_time and drop_off_time are expressed in Unix epoch time, and pick_up_intersection and drop_off_intersection are the indices of the intersections (numbers from 1 to 4091).
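As a starting point, the trip file can be streamed directly from its compressed form. The following is a minimal sketch using only the Python standard library. The names `Trip` and `read_trips` are illustrative, and the sketch assumes that all five fields are integers and that the file may or may not begin with a header row; check both assumptions against the actual data.

```python
import bz2
import csv
from typing import Iterator, NamedTuple


class Trip(NamedTuple):
    """One row of taxi_id.csv.bz2 (field names follow the format above)."""
    taxi_id: int
    pick_up_time: int            # Unix epoch time
    drop_off_time: int           # Unix epoch time
    pick_up_intersection: int    # intersection index, 1 to 4091
    drop_off_intersection: int   # intersection index, 1 to 4091


def read_trips(path: str) -> Iterator[Trip]:
    """Yield trips one at a time without decompressing the file to disk."""
    with bz2.open(path, mode="rt", newline="") as handle:
        for row in csv.reader(handle):
            fields = [field.strip() for field in row]
            # Assumption: rows that do not have five fields, or whose first
            # field is not a plain integer (e.g. a header row), are skipped.
            if len(fields) != 5 or not fields[0].isdigit():
                continue
            yield Trip(*(int(field) for field in fields))
```

Because `read_trips` is a generator, the whole file never has to fit in memory; for example, `sum(1 for _ in read_trips("taxi_id.csv.bz2"))` counts the rows in a single pass.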

The second file (intersections.csv) represents the street intersections to which pick-up and drop-off points were snapped. The format of the file is:

id, latitude, longitude

where id is a sequential identifier from 1 to 4091, and latitude and longitude are the GPS coordinates of the intersection.

[Figure: two screenshots of these road intersections]
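Since trips refer to intersections only by index, the coordinates have to be looked up before any distance can be computed. The sketch below loads the intersection file into a dictionary and computes the great-circle (haversine) distance between two points, which is one common way to approximate a straight-line distance from GPS coordinates. The names `read_intersections` and `haversine_km` are illustrative, the mean Earth radius is a standard approximation, and the handling of a possible header row is an assumption.

```python
import csv
import math
from typing import Dict, Tuple

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius (approximation)


def read_intersections(path: str) -> Dict[int, Tuple[float, float]]:
    """Map each intersection id to its (latitude, longitude) in degrees."""
    coords: Dict[int, Tuple[float, float]] = {}
    with open(path, newline="") as handle:
        for row in csv.reader(handle):
            fields = [field.strip() for field in row]
            # Assumption: rows without a plain integer id (e.g. a header
            # row) are skipped.
            if len(fields) != 3 or not fields[0].isdigit():
                continue
            coords[int(fields[0])] = (float(fields[1]), float(fields[2]))
    return coords


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point rounding before asin.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))
```

With `coords = read_intersections("intersections.csv")`, the distance of a trip from intersection `a` to intersection `b` is `haversine_km(*coords[a], *coords[b])`.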


2. Tasks

In this section, you will analyze the dataset, using any software or programming language that you prefer, and then provide answers to the following research questions:

(1) How many unique taxis are there in this dataset, and how many trips are recorded?

(2) What is the distribution of the number of trips per taxi? Who are the top performers?

(3) How does the daily trip count (i.e., number of trips per day) change throughout the year? Is there any rhythm or seasonality?

(4) What is the distribution of the number of departure trips at different locations (i.e., intersections)? What about the distribution of arrival trips? What can you conclude from these two distributions?

(5) How does the number of trips change over time in a day? (You will be given three dates randomly selected from the dataset; plot the hourly variation of trips for each date in local time.)

(6) What is the probability distribution of the trip distance (measured as straight-line distance)? What about travel time (i.e., trip duration)? What can you conclude from these two distributions?

For questions (2) – (6), you are required to provide figures along with your answers. Note that some of the above questions are open-ended, and the answers may vary among students.
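Questions (3) and (5) both depend on converting Unix epoch timestamps to local time before grouping. One possible way to do this with the Python standard library is sketched below. The function names are illustrative, and counting each trip by its pick-up time rather than its drop-off time is an assumption, since the questions do not specify which timestamp to use.

```python
from collections import Counter
from datetime import date, datetime, tzinfo
from typing import Dict, Iterable


def hourly_counts(pick_up_times: Iterable[int], tz: tzinfo) -> Dict[int, int]:
    """Count trips per local hour of day (0 to 23), keyed by pick-up time."""
    counts = Counter(datetime.fromtimestamp(t, tz).hour for t in pick_up_times)
    # Include hours with no trips so that a plot has all 24 bins.
    return {hour: counts.get(hour, 0) for hour in range(24)}


def daily_counts(pick_up_times: Iterable[int], tz: tzinfo) -> Dict[date, int]:
    """Count trips per local calendar day, keyed by pick-up time."""
    counts = Counter(datetime.fromtimestamp(t, tz).date() for t in pick_up_times)
    return dict(sorted(counts.items()))
```

For Manhattan, a suitable `tz` argument is `zoneinfo.ZoneInfo("America/New_York")` (Python 3.9 or later), which also accounts for daylight saving time. For question (5), filter the timestamps to the given date in local time before calling `hourly_counts`.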


3. What to submit

− A Word document or PDF file with answers to (1) – (6).

− The computer code used in this assignment. If particular software is used, please explain how it was used to derive the answers.

− The submission due date is November 11th, 2019.

4. Access to the dataset

The dataset used in this assignment can be downloaded via the following link:

https://polyuitmy.sharepoint.com/:f:/g/personal/yangxu_polyu_edu_hk/EtCz10QsyxZMhY8_Z3bF9xYBmhb_2CLZK9G6QtlgO0jtg?e=YePkJ0

Please contact the subject instructor if the link is invalid.

References

Santi, P., Resta, G., Szell, M., Sobolevsky, S., Strogatz, S. H., & Ratti, C. (2014). Quantifying the benefits of vehicle pooling with shareability networks. Proceedings of the National Academy of Sciences, 111(37), 13290-13294.

Vazifeh, M. M., Santi, P., Resta, G., Strogatz, S. H., & Ratti, C. (2018). Addressing the minimum fleet problem in on-demand urban mobility. Nature, 557(7706), 534-538.
