Query Complexity Based Optimal Processing of Raw Data

05/12/2022
by Mayank Patel, et al.

The paper aims to find an efficient way to process large datasets under different types of workload queries with minimal replication. The work first identifies the complexity of queries best suited to a given data processing tool. It then proposes a Query Complexity Aware (QCA) partitioning technique with a lightweight query identification and partitioning algorithm. Different replication approaches are studied to cover more use cases for different application workloads. The technique is demonstrated on a scientific dataset, the Sloan Digital Sky Survey (SDSS). The results show that workload execution time (WET) is reduced by 94.6% compared to the original dataset. The QCA technique also reduces multi-node replication by 5.8x compared to state-of-the-art workload-aware (WA) techniques. The multi-node and multi-core execution of the workload using QCA-proposed partitions reduced WET by 42.66

research
07/22/2021

Load Balanced Semantic Aware Distributed RDF Graph

The modern day semantic applications store data as Resource Description ...
research
12/21/2022

Resource Utilization Monitoring for Raw Data Query Processing

Scientific experiments, simulations, and modern applications generate la...
research
11/17/2017

Loom: Query-aware Partitioning of Online Graphs

As with general graph processing systems, partitioning data over a clust...
research
05/10/2021

Skew-Oblivious Data Routing for Data-Intensive Applications on FPGAs with HLS

FPGAs have become emerging computing infrastructures for accelerating ap...
research
02/27/2020

SWARM: Adaptive Load Balancing in Distributed Streaming Systems for Big Spatial Data

The proliferation of GPS-enabled devices has led to the development of n...
research
02/03/2022

QueryER: A Framework for Fast Analysis-Aware Deduplication over Dirty Data

In this work, we explore the problem of correctly and efficiently answer...
research
03/16/2018

Distributed Caching for Complex Querying of Raw Arrays

As applications continue to generate multi-dimensional data at exponenti...
