R*-Grove: Balanced Spatial Partitioning for Large-scale Datasets

07/22/2020
by Tin Vu, et al.

The rapid growth of big spatial data has driven the research community to develop several big spatial data systems. Regardless of their architecture, one of the fundamental requirements of all these systems is to spatially partition the data efficiently across machines. The core challenge of big spatial partitioning is to build partitions of high spatial quality while also taking advantage of distributed processing models by keeping those partitions load balanced. Previous work on big spatial partitioning reuses existing index search trees as-is, e.g., the R-tree family, STR, Kd-tree, and Quad-tree, by building a temporary tree for a sample of the input and using its leaf nodes as partition boundaries. However, we show in this paper that none of those techniques addresses both challenges completely. This paper proposes a novel partitioning method, termed R*-Grove, which can partition very large spatial datasets into high quality partitions with excellent load balance and block utilization. These properties allow R*-Grove to outperform existing techniques in spatial query processing. R*-Grove can be easily integrated into any big data platform, such as Apache Spark or Apache Hadoop. Our experiments show that R*-Grove outperforms the existing partitioning techniques for big spatial data systems. With all the proposed work publicly available as open source, we envision that R*-Grove will be adopted by the community to better serve big spatial data research.
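To make the sample-based approach attributed to previous work more concrete, here is a minimal sketch in Python. It is not R*-Grove and is not taken from the paper: it tiles a random sample of points in an STR-like way, uses the resulting rectangles as partition boundaries, and then counts how many points of the full input land in each partition. The function names (`str_boundaries`, `assign`, `partition_sizes`) and the nearest-rectangle assignment rule are assumptions made for illustration only.

```python
import math
import random


def str_boundaries(sample, num_partitions):
    """Tile a sample of (x, y) points into about num_partitions rectangles.

    STR-like sketch: sort by x, cut into vertical strips, then sort each
    strip by y and cut it into tiles. Returns (xmin, ymin, xmax, ymax) boxes.
    """
    strips = math.ceil(math.sqrt(num_partitions))
    tiles_per_strip = math.ceil(num_partitions / strips)
    pts = sorted(sample)  # tuples sort by x first
    strip_size = math.ceil(len(pts) / strips)
    boxes = []
    for i in range(0, len(pts), strip_size):
        strip = sorted(pts[i:i + strip_size], key=lambda p: p[1])
        tile_size = math.ceil(len(strip) / tiles_per_strip)
        for j in range(0, len(strip), tile_size):
            tile = strip[j:j + tile_size]
            xs = [p[0] for p in tile]
            ys = [p[1] for p in tile]
            boxes.append((min(xs), min(ys), max(xs), max(ys)))
    return boxes


def assign(point, boxes):
    """Return the index of the box closest to the point (distance 0 if inside).

    Hypothetical rule for this sketch; real systems use other criteria.
    """
    x, y = point

    def squared_distance(box):
        dx = max(box[0] - x, 0.0, x - box[2])
        dy = max(box[1] - y, 0.0, y - box[3])
        return dx * dx + dy * dy

    return min(range(len(boxes)), key=lambda k: squared_distance(boxes[k]))


def partition_sizes(points, boxes):
    """Count how many points of the full input fall into each partition."""
    sizes = [0] * len(boxes)
    for p in points:
        sizes[assign(p, boxes)] += 1
    return sizes


if __name__ == "__main__":
    random.seed(0)
    points = [(random.random(), random.random()) for _ in range(10_000)]
    boxes = str_boundaries(random.sample(points, 1_000), 16)
    sizes = partition_sizes(points, boxes)
    # Ratio of the largest partition to the mean size; 1.0 is perfect balance.
    print(len(boxes), max(sizes) / (sum(sizes) / len(sizes)))
```

Because the boundaries come from a sample, the sizes of the final partitions are only approximately equal, which is one way to see why load balance has to be checked on the full input rather than on the sample.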

research
09/12/2023

Enhancing In-Memory Spatial Indexing with Learned Search

Spatial data is ubiquitous. Massive amounts of data are generated every ...
research
06/08/2023

Learned spatial data partitioning

Due to the significant increase in the size of spatial data, it is essen...
research
08/24/2020

Approximate Partition Selection for Big-Data Workloads using Summary Statistics

Many big-data clusters store data in large partitions that support acces...
research
08/24/2020

The Case for Learned Spatial Indexes

Spatial data is ubiquitous. Massive amounts of data are generated every ...
research
01/03/2022

Clustering-based Partitioning for Large Web Graphs

Graph partitioning plays a vital role in distributed large-scale web grap...
research
07/18/2023

Two-layer Space-oriented Partitioning for Non-point Data

Non-point spatial objects (e.g., polygons, linestrings, etc.) are ubiqui...
research
03/29/2021

Combining Aggregation and Sampling (Nearly) Optimally for Approximate Query Processing

Sample-based approximate query processing (AQP) suffers from many pitfal...
