A comparative analysis of state-of-the-art SQL-on-Hadoop systems for interactive analytics

03/31/2018
by Ashish Tapdiya, et al.

Hadoop is emerging as the primary data hub in enterprises, and SQL represents the de facto language for data analysis. This combination has led to the development of a variety of SQL-on-Hadoop systems in use today. While the various SQL-on-Hadoop systems target the same class of analytical workloads, their different architectures, design decisions and implementations impact query performance. In this work, we perform a comparative analysis of four state-of-the-art SQL-on-Hadoop systems (Impala, Drill, Spark SQL and Phoenix) using the Web Data Analytics micro benchmark and the TPC-H benchmark on the Amazon EC2 cloud platform. The TPC-H experiment results show that, although Impala outperforms the other systems (by 4.41x to 6.65x) in the text format, trade-offs exist in the Parquet format, with each system performing best on subsets of queries. A comprehensive analysis of execution profiles expands upon the performance results to provide insights into performance variations, performance bottlenecks and query execution characteristics.

research
07/09/2022

Serving Hybrid-Cloud SQL Interactive Queries at Twitter

The demand for data analytics has been consistently increasing in the pa...
research
07/26/2021

COMPARE: Accelerating Groupwise Comparison in Relational Databases for Data Analytics

Data analysis often involves comparing subsets of data across many dimen...
research
04/26/2021

Evaluating Query Languages and Systems for High-Energy Physics Data

In the domain of high-energy physics (HEP), query languages in general a...
research
04/13/2023

SIGNAL – The SAP Signavio Analytics Query Language

This paper provides an introduction to and discussion of SIGNAL, an indu...
research
08/24/2023

Lightweight Materialization for Fast Dashboards Over Joins

Dashboards are vital in modern business intelligence tools, providing no...
research
12/16/2021

Predictive Price-Performance Optimization for Serverless Query Processing

We present an efficient, parametric modeling framework for predictive re...
research
09/14/2022

SQL and NoSQL Databases Software architectures performance analysis and assessments – A Systematic Literature review

Context: The efficient processing of Big Data is a challenging task for ...
