COMPARE: Accelerating Groupwise Comparison in Relational Databases for Data Analytics

07/26/2021
by Tarique Siddiqui, et al.

Data analysis often involves comparing subsets of data across many dimensions to find unusual trends and patterns. Although such comparisons can be expressed in SQL, the resulting queries tend to be complex to write and perform poorly over large, high-dimensional datasets. In this paper, we propose a new logical operator, COMPARE, for relational databases that concisely captures the enumeration and comparison of subsets of data and greatly simplifies the expression of a large class of comparative queries. We extend the database engine with optimization techniques that exploit the semantics of COMPARE to significantly improve the performance of such queries. We have implemented these extensions inside Microsoft SQL Server, a commercial DBMS engine. Our extensive evaluation on synthetic and real-world datasets shows that COMPARE yields a significant speedup over existing approaches, including physical plans generated by today's database systems, user-defined functions (UDFs), and middleware solutions that compare subsets outside the database.
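To make the idea of a groupwise comparison concrete, the following is a minimal Python sketch of the kind of work a comparative query performs: enumerate subsets of the data (one per group), aggregate each into a trend, and score every subset against a reference subset. It does not reproduce the COMPARE operator or its syntax, and it corresponds most closely to the middleware style of comparing subsets outside the database. The toy table, the column names (`region`, `year`, `sales`), the helper names, and the choice of Euclidean distance as the scoring function are all assumptions made for illustration.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical toy table: (region, year, sales). Not taken from the paper.
rows = [
    ("north", 2019, 10.0), ("north", 2020, 12.0), ("north", 2021, 14.0),
    ("south", 2019, 10.0), ("south", 2020, 11.0), ("south", 2021, 15.0),
    ("east", 2019, 30.0), ("east", 2020, 20.0), ("east", 2021, 10.0),
]


def trends_by_group(rows):
    """Sum sales per (group, year), giving one trend (year -> total) per group."""
    totals = defaultdict(lambda: defaultdict(float))
    for group, year, value in rows:
        totals[group][year] += value
    return totals


def distance(a, b):
    """Euclidean distance between two trends over the years they share."""
    years = sorted(set(a) & set(b))
    return sqrt(sum((a[y] - b[y]) ** 2 for y in years))


def rank_against(rows, reference):
    """Score every other group against the reference group, closest first."""
    trends = trends_by_group(rows)
    ref = trends[reference]
    scores = [(g, distance(t, ref)) for g, t in trends.items() if g != reference]
    return sorted(scores, key=lambda s: s[1])


# Which regions have a sales trend most similar to "north"?
ranking = rank_against(rows, "north")
for group, score in ranking:
    print(f"{group}: {score:.3f}")
```

Even in this small form, the query needs a grouping pass, a pairing of each subset with the reference, and a per-pair aggregation, which suggests why the equivalent SQL becomes verbose and why the work grows quickly as more dimensions and candidate subsets are added.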

