A Big Data Driven Framework for Duplicate Device Detection from Multi-sourced Mobile Device Location Data

by Aliakbar Kabiri, et al.

Mobile Device Location Data (MDLD) has been widely used in various fields. Yet its large-scale applications are limited because the data from any individual vendor has biased or insufficient spatial coverage. One way to improve coverage is to integrate data from multiple vendors into a more representative dataset. Integration requires further treatment of the multi-sourced dataset for several reasons. First, a person may carry more than one device, which results in duplicated observations of the same data subject. Second, when multiple data sources are used, the same device might be captured by more than one data provider. Our paper proposes a data integration methodology for multi-sourced data and investigates the feasibility of integrating data from several sources without introducing additional biases. Duplicate devices are identified by leveraging the uniqueness of each device's travel pattern. The proposed methodology is shown to be cost-effective while achieving the desired accuracy level. Our findings suggest that devices sharing the same imputed home location and the same top five most-visited locations during a month can represent the same user in the MDLD. It is shown that more than 99.6% of the devices with the aforementioned attributes in common are observed at the same location simultaneously. Finally, the proposed algorithm has been successfully applied to the national-level MDLD of 2020 to produce the national passenger origin-destination data for the NextGeneration National Household Travel Survey (NextGen NHTS) program.
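
The matching rule stated in the abstract (same imputed home location plus the same top five most-visited locations in a month) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the device and location identifiers, the treatment of the top five as an unordered set, and the tie-breaking rule are all assumptions made here for illustration, and home-location imputation is taken as already done.

```python
from collections import Counter, defaultdict


def top_locations(visits, k=5):
    """Return the k most-visited location IDs as an unordered set (assumption)."""
    counts = Counter(visits)
    # Sort by visit count (descending), then by location ID, so ties
    # resolve deterministically. The tie-breaking rule is an assumption.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return frozenset(loc for loc, _ in ranked[:k])


def find_duplicate_groups(devices, k=5):
    """Group device IDs that share a home location and a top-k location set.

    devices: dict mapping device_id -> (home_location, list of visited
    location IDs over the month). All identifiers are hypothetical.
    """
    groups = defaultdict(list)
    for device_id, (home, visits) in devices.items():
        groups[(home, top_locations(visits, k))].append(device_id)
    # Only groups with more than one device are candidate duplicates.
    return [sorted(ids) for ids in groups.values() if len(ids) > 1]


# Hypothetical one-month visit logs from two vendors.
devices = {
    "vendorA-001": ("h1", ["a", "a", "a", "b", "b", "c", "d", "e", "f"]),
    "vendorB-917": ("h1", ["a", "a", "b", "b", "b", "c", "d", "e", "g"]),
    "vendorA-002": ("h1", ["a", "b", "c", "d", "x", "x"]),
    "vendorB-500": ("h2", ["a", "a", "a", "b", "b", "c", "d", "e", "f"]),
}

duplicate_groups = find_duplicate_groups(devices)
print(duplicate_groups)
```

In this toy input only `vendorA-001` and `vendorB-917` share both the home location and the top-five set, so they form the single candidate group. Grouping on a hashable key keeps the pass linear in the number of devices, which matters at national scale; whether the paper uses this exact strategy is not stated in the abstract.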




