The Case for Learned Spatial Indexes

by Varun Pandey, et al.

Spatial data is ubiquitous. Massive amounts of data are generated every day from billions of GPS-enabled devices such as cell phones, cars, and sensors, and from various consumer-based applications such as Uber, Tinder, and location-tagged posts on Facebook, Twitter, Instagram, etc. This exponential growth in spatial data has led the research community to focus on building systems and applications that can process spatial data efficiently. Meanwhile, recent research has introduced learned index structures. In this work, we take techniques proposed in a state-of-the-art learned multi-dimensional index structure (namely, Flood) and apply them to five classical multi-dimensional indexes to answer spatial range queries. By tuning each partitioning technique for optimal performance, we show that (i) machine-learned search within a partition is 11.79% to 39.51% faster than binary search when filtering on one dimension, (ii) the bottleneck for tree structures is index lookup, which could potentially be improved by linearizing the indexed partitions, (iii) filtering on one dimension and refining using machine-learned indexes is 1.23x to 1.83x faster than the closest competitor, which filters on two dimensions, and (iv) learned indexes can have a significant impact on the performance of low-selectivity queries while being less effective under higher selectivities.
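
The "filter on one dimension, refine with a learned index" idea can be illustrated with a small sketch. This is not the authors' Flood-based implementation. It assumes equal-count stripes along x as the filtering partitions, points sorted by y inside each stripe, and a single least-squares line per stripe as the learned model, with its maximum prediction error used to bound the search. All names (LinearModel, Stripe, build_stripes, range_query) are hypothetical.

```python
import bisect
import random
from dataclasses import dataclass


@dataclass
class LinearModel:
    """Hypothetical learned index: a least-squares line mapping a key to its
    position in a sorted list, plus the largest prediction error seen."""
    slope: float = 0.0
    intercept: float = 0.0
    max_err: int = 0

    @classmethod
    def fit(cls, keys):
        n = len(keys)
        if n == 0:
            return cls()
        mean_k = sum(keys) / n
        mean_p = (n - 1) / 2
        var = sum((k - mean_k) ** 2 for k in keys)
        cov = sum((k - mean_k) * (i - mean_p) for i, k in enumerate(keys))
        # Clamp at 0 so predictions never decrease as the key grows.
        slope = max(cov / var, 0.0) if var > 0 else 0.0
        model = cls(slope, mean_p - slope * mean_k)
        model.max_err = max(abs(model.predict(k) - i) for i, k in enumerate(keys))
        return model

    def predict(self, key):
        return round(self.slope * key + self.intercept)

    def lower_bound(self, keys, q):
        """First index i with keys[i] >= q. A monotone model with error bound e
        guarantees the answer lies in [p - e, p + e + 1], so bisect only that window."""
        p = self.predict(q)
        lo = min(max(p - self.max_err, 0), len(keys))
        hi = min(max(p + self.max_err + 1, 0), len(keys))
        return bisect.bisect_left(keys, q, lo, hi)


class Stripe:
    """One partition along x; its points are stored sorted by y."""

    def __init__(self, pts):
        self.x_min = min(x for x, _ in pts)
        self.x_max = max(x for x, _ in pts)
        self.pts = sorted(pts, key=lambda p: p[1])
        self.ys = [y for _, y in self.pts]
        self.model = LinearModel.fit(self.ys)


def build_stripes(points, num_stripes):
    """Filter dimension: split points into roughly equal-count stripes by x."""
    by_x = sorted(points)
    size = max(1, -(-len(by_x) // num_stripes))  # ceiling division
    return [Stripe(by_x[s:s + size]) for s in range(0, len(by_x), size)]


def range_query(stripes, x_lo, x_hi, y_lo, y_hi, learned=True):
    out = []
    for st in stripes:
        if st.x_max < x_lo or st.x_min > x_hi:
            continue  # stripe cannot intersect the query in x
        # Refine on y: learned search (or plain binary search) for the first y >= y_lo.
        if learned:
            i = st.model.lower_bound(st.ys, y_lo)
        else:
            i = bisect.bisect_left(st.ys, y_lo)
        # Scan forward while y stays in range, checking x per point.
        while i < len(st.pts) and st.ys[i] <= y_hi:
            x, y = st.pts[i]
            if x_lo <= x <= x_hi:
                out.append((x, y))
            i += 1
    return out


if __name__ == "__main__":
    random.seed(7)
    pts = [(random.random(), random.random()) for _ in range(10_000)]
    stripes = build_stripes(pts, num_stripes=32)
    q = (0.2, 0.4, 0.50, 0.55)  # x_lo, x_hi, y_lo, y_hi
    brute = sorted(p for p in pts if q[0] <= p[0] <= q[1] and q[2] <= p[1] <= q[3])
    print(sorted(range_query(stripes, *q)) == brute)
    print(sorted(range_query(stripes, *q, learned=False)) == brute)
```

Both search variants return exactly the same points. The learned variant only narrows the window that binary search has to cover, and it stays exact because the error bound is measured on every key in the stripe. The speedups quoted in the abstract are the authors' measurements and are not reproduced by this sketch.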

Spatial Interpolation-based Learned Index for Range and kNN Queries

A corpus of recent work has revealed that the learned index can improve ...

Tsunami: A Learned Multi-dimensional Index for Correlated Data and Skewed Workloads

Filtering data based on predicates is one of the most fundamental operat...

Learning Multi-dimensional Indexes

Scanning and filtering over multi-dimensional tables are key operations ...

R*-Grove: Balanced Spatial Partitioning for Large-scale Datasets

The rapid growth of big spatial data urged the research community to dev...

Hands-off Model Integration in Spatial Index Structures

Spatial indexes are crucial for the analysis of the increasing amounts o...

GPU Accelerated Similarity Self-Join for Multi-Dimensional Data

The self-join finds all objects in a dataset that are within a search di...

Bridging the Gap Between Theory and Practice on Insertion-Intensive Database

With the prevalence of online platforms, today, data is being generated ...