
Top-k queries over digital traces

by Yifan Li, et al.

Recent advances in social and mobile technology have produced an abundance of digital traces (mobile check-ins, associations of mobile devices with specific WiFi hotspots, etc.) that reveal the physical presence history of diverse sets of entities (e.g., humans, devices, and vehicles). One challenging yet important task is to identify the k entities most closely associated with a given query entity based on their digital traces. We propose a suite of indexing techniques and algorithms that enable fast query processing for this problem at scale. We first define a generic family of functions measuring the association between entities, and then propose algorithms that transform digital traces into a lower-dimensional space for more efficient computation. We subsequently design a hierarchical indexing structure that organizes entities so that closely associated entities tend to appear together. We then develop algorithms that process top-k queries using the index. We theoretically analyze the pruning effectiveness of the proposed methods based on a mobility model that we propose and validate in real-life settings. Finally, we conduct extensive experiments on both synthetic and real datasets at scale, evaluating our techniques both analytically and experimentally. The results confirm the effectiveness of our approach and its superiority over other applicable approaches across a variety of parameter settings and datasets.
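To make the query concrete, here is a minimal brute-force sketch of a top-k association query. It is not the paper's method: the trace representation (a set of `(location, time_slot)` cells), the Jaccard-style `association` function, and the names `Trace` and `top_k` are all assumptions chosen for illustration. It scans every entity, which is the baseline cost that the indexing and pruning described above aim to avoid.

```python
import heapq
from typing import Dict, List, Set, Tuple

# Assumed representation: a trace is the set of (location id, time slot)
# cells in which an entity was observed.
Trace = Set[Tuple[str, int]]


def association(a: Trace, b: Trace) -> float:
    """Jaccard overlap of co-presence cells.

    A stand-in measure for illustration only, not the association
    family defined in the paper.
    """
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def top_k(query: str, traces: Dict[str, Trace], k: int) -> List[Tuple[str, float]]:
    """Return the k entities most associated with `query` by full scan."""
    q = traces[query]
    scored = (
        (association(q, trace), entity)
        for entity, trace in traces.items()
        if entity != query
    )
    # nlargest keeps only k candidates in memory while scanning.
    best = heapq.nlargest(k, scored)
    return [(entity, score) for score, entity in best]


# Hypothetical toy data.
traces = {
    "q": {("cafe", 9), ("office", 10), ("gym", 18)},
    "a": {("cafe", 9), ("office", 10), ("gym", 18)},
    "b": {("cafe", 9), ("park", 12)},
    "c": {("mall", 15)},
}
result = top_k("q", traces, 2)
```

On this toy data, `a` shares all three cells with `q` and `b` shares one of four, so they rank first and second. The full scan costs one association computation per entity per query, which is what motivates an index that groups closely associated entities.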




LES3: Learning-based Exact Set Similarity Search

Set similarity search is a problem of central interest to a wide variety...

Em-K Indexing for Approximate Query Matching in Large-scale ER

Accurate and efficient entity resolution (ER) is a significant challenge...

Searching Heterogeneous Personal Digital Traces

Digital traces of our lives are now constantly produced by various conne...

Representation Learning Models for Entity Search

We focus on the problem of learning distributed representations for enti...

A Pluggable Learned Index Method via Sampling and Gap Insertion

Database indexes facilitate data retrieval and benefit broad application...

An Experimental Analysis of Indoor Spatial Queries: Modeling, Indexing, and Processing

Indoor location-based services (LBS), such as POI search and routing, ar...

A Total Error Framework for Digital Traces of Humans

The interactions and activities of hundreds of millions of people worldw...