Semantic Exploration of Traffic Dynamics
Given a large collection of urban datasets, how can we find their hidden correlations? For example, New York City (NYC) provides open access to taxi data from 2012 to 2015, with about half a million taxi trips generated per day. Meanwhile, a rich set of urban data is available for NYC, including points of interest (POIs), geo-tagged tweets, weather, and vehicle collisions. Can these ubiquitous datasets be used to explain the city's traffic? Understanding the hidden correlations between external data and traffic data would allow us to answer many important questions in urban computing, such as: If we observe high traffic volume at Madison Square Garden (MSG) in NYC, is it because of the regular peak hour or because a big event is being held at MSG? If severe weather such as a hurricane or a snowstorm hits the city, how would traffic be affected? While existing studies may use external datasets for prediction tasks, they do not explicitly seek direct explanations from those datasets. In this paper, we present our results from attempts to understand taxi traffic dynamics in NYC using multiple external data sources. We use four real-world ubiquitous urban datasets: POIs, weather, geo-tagged tweets, and collision records. To address the heterogeneity of ubiquitous urban data, we present carefully designed feature representations for the various datasets. Extensive experiments on real data demonstrate the explanatory power of external datasets for taxi traffic. More specifically, our analysis suggests that POIs describe the regular traffic patterns well, while geo-tagged tweets can explain irregular traffic caused by big events and weather can explain abnormal traffic drops.