On The Equivalence of Tries and Dendrograms - Efficient Hierarchical Clustering of Traffic Data

10/12/2018
by   Chia-Tung Kuo, et al.

The widespread use of GPS-enabled devices generates voluminous and continuous streams of traffic data, but analyzing such data for interpretable and actionable insights poses challenges. A hierarchical clustering of the trips has many uses, such as discovering shortest paths, common routes, and frequently traversed areas. However, hierarchical clustering typically has a time complexity of O(n^2 log n), where n is the number of instances, and is difficult to scale to the large data sets associated with GPS data. Furthermore, incremental hierarchical clustering is still a developing area. Prefix trees (also called tries) can be efficiently constructed and updated in linear time (in n). We show how a specially constructed trie can compactly store the trips, and further show that this trie is equivalent to a dendrogram that would have been built by classic agglomerative hierarchical algorithms using a specific distance metric. This allows creating hierarchical clusterings of GPS trip data and updating this hierarchy in linear time. The structure can also be interpreted as clusterings of differing granularity as one progresses down the tree. We demonstrate the usefulness of our proposed approach on a real-world data set of half a million taxis' GPS traces, well beyond the capabilities of agglomerative clustering methods. Our work is not limited to trip data and can be used with other data that has a string representation.
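
To make the trie-as-hierarchy idea concrete, the following is a minimal Python sketch, not the paper's actual construction. It assumes each trip has already been turned into a sequence of discrete symbols (for example, grid-cell labels), which the abstract does not specify. The names `TrieNode`, `TripTrie`, `insert`, and `clusters_at_depth` are hypothetical and chosen only for illustration. Inserting a trip touches one node per symbol, so building or updating the trie is linear in the total input length, and cutting the trie at a given depth groups trips that share a prefix of that length, giving finer clusters further down the tree.

```python
class TrieNode:
    """One node of a prefix tree over trip symbols (hypothetical helper)."""

    def __init__(self):
        self.children = {}   # symbol -> TrieNode
        self.trip_ids = []   # every trip whose sequence passes through this node
        self.ends_here = []  # trips whose sequence ends exactly at this node


class TripTrie:
    """Prefix tree over trips; an illustrative sketch, not the paper's method."""

    def __init__(self):
        self.root = TrieNode()

    def insert(self, trip_id, cells):
        # One dictionary lookup per symbol: cost is linear in len(cells).
        node = self.root
        node.trip_ids.append(trip_id)
        for cell in cells:
            node = node.children.setdefault(cell, TrieNode())
            node.trip_ids.append(trip_id)
        node.ends_here.append(trip_id)

    def clusters_at_depth(self, depth):
        # Trips sharing their first `depth` symbols fall into one cluster.
        # Trips shorter than `depth` are grouped with trips of identical sequence.
        clusters = []
        stack = [(self.root, 0)]
        while stack:
            node, d = stack.pop()
            if d == depth:
                clusters.append(sorted(node.trip_ids))
                continue
            if node.ends_here:
                clusters.append(sorted(node.ends_here))
            for child in node.children.values():
                stack.append((child, d + 1))
        return sorted(clusters)


# Assumed toy data: cell labels "A".."G" are made up for this example.
trie = TripTrie()
trie.insert("t1", ["A", "B", "C"])
trie.insert("t2", ["A", "B", "D"])
trie.insert("t3", ["A", "E"])
trie.insert("t4", ["F", "G"])

print(trie.clusters_at_depth(1))  # [['t1', 't2', 't3'], ['t4']]
print(trie.clusters_at_depth(2))  # [['t1', 't2'], ['t3'], ['t4']]
```

Reading the cuts from depth 0 downward gives a nested sequence of partitions, which is the sense in which a trie can be read as a dendrogram. The specific distance metric under which the paper proves the equivalence is not reproduced here.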

