In-memory Distributed Spatial Query Processing and Optimization

07/08/2019
by Mingjie Tang, et al.

Due to the ubiquity of spatial data applications and the large amounts of spatial data that these applications generate and process, there is a pressing need for scalable spatial query processing. In this paper, we present new techniques for spatial query processing and optimization in an in-memory, distributed setup to address scalability. More specifically, we introduce new techniques for handling query skew, which is common in practice, and optimize communication costs accordingly. We propose a distributed query scheduler that uses a new cost model to optimize the cost of spatial query processing. The scheduler generates query execution plans that minimize the effect of query skew. It employs new spatial indexing techniques based on bitmap filters to forward queries to the appropriate local nodes. Each local computation node is responsible for selecting and optimizing its best local query execution plan, based on the indexes available and the nature of the spatial queries at that node. All the proposed spatial query processing and optimization techniques are prototyped inside Spark, a distributed memory-based computation system. The experimental study is based on real datasets and demonstrates that the performance of distributed spatial query processing can be improved by up to an order of magnitude over existing in-memory and distributed spatial systems.
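To make the idea of forwarding queries through bitmap filters more concrete, here is a minimal Python sketch. It is not the paper's data structure or its Spark implementation. It assumes a hypothetical `GridBitmapFilter` that keeps one bit per cell of a uniform grid laid over a partition's bounding box, and a hypothetical `route` helper that forwards a rectangular query only to partitions whose filter reports a possible match. The grid resolution, class names, and partition ids are all illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


@dataclass
class GridBitmapFilter:
    """Hypothetical per-partition filter: one bit per grid cell that holds data.

    Assumption: every point added lies inside `bounds`.
    """
    bounds: Rect
    cells_per_side: int = 8
    bits: int = 0  # a Python int used as a bitset

    def _cell(self, x: float, y: float) -> Tuple[int, int]:
        # Map a coordinate to (row, col), clamping to the grid edges.
        xmin, ymin, xmax, ymax = self.bounds
        n = self.cells_per_side
        col = min(n - 1, max(0, int((x - xmin) / (xmax - xmin) * n)))
        row = min(n - 1, max(0, int((y - ymin) / (ymax - ymin) * n)))
        return row, col

    def add_point(self, x: float, y: float) -> None:
        row, col = self._cell(x, y)
        self.bits |= 1 << (row * self.cells_per_side + col)

    def may_contain(self, query: Rect) -> bool:
        """Return False only if no occupied cell overlaps the query rectangle."""
        qxmin, qymin, qxmax, qymax = query
        xmin, ymin, xmax, ymax = self.bounds
        if qxmax < xmin or qxmin > xmax or qymax < ymin or qymin > ymax:
            return False  # query misses the partition entirely
        r0, c0 = self._cell(qxmin, qymin)
        r1, c1 = self._cell(qxmax, qymax)
        n = self.cells_per_side
        for row in range(r0, r1 + 1):
            for col in range(c0, c1 + 1):
                if (self.bits >> (row * n + col)) & 1:
                    return True
        return False


def route(query: Rect, filters: Dict[str, GridBitmapFilter]) -> List[str]:
    """Forward the query only to partitions whose filter may hold a match."""
    return [pid for pid, f in filters.items() if f.may_contain(query)]


# Usage: two illustrative partitions with a few points each.
west = GridBitmapFilter(bounds=(0.0, 0.0, 10.0, 10.0))
east = GridBitmapFilter(bounds=(10.0, 0.0, 20.0, 10.0))
for x, y in [(1.0, 1.0), (2.0, 2.0)]:
    west.add_point(x, y)
east.add_point(18.0, 8.0)
filters = {"west": west, "east": east}
targets = route((11.0, 1.0, 12.0, 2.0), filters)
```

In this sketch the last query overlaps the bounding box of the `east` partition but touches no occupied cell, so it is forwarded nowhere. That pruning is where a filter of this kind can save communication: a check against partition bounding boxes alone would have sent the query to `east`.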


