A Survey of Semantics-Aware Performance Optimization for Data-Intensive Computing

07/24/2021
by Bingbing Rao, et al.

We are living in the era of Big Data and witnessing an explosion of data. Given the CPU and I/O limitations of a single computer, the mainstream approach to scalability is to distribute computations among a large number of processing nodes in a cluster or cloud. This paradigm gives rise to the term data-intensive computing, which denotes a data-parallel approach to processing massive volumes of data. Through the efforts of different disciplines, several promising programming models and platforms have been proposed for data-intensive computing, such as MapReduce, Hadoop, Apache Spark, and Dryad. Even though a large body of research has been proposed to improve the overall performance of these platforms, there is still a gap between actual performance demands and the capability of current commodity systems. This paper aims to provide a comprehensive understanding of current semantics-aware approaches to improving the performance of data-intensive computing. We first introduce common characteristics and paradigm shifts in the evolution of data-intensive computing, as well as contemporary programming models and technologies. We then propose four kinds of performance defects and survey the state-of-the-art semantics-aware techniques. Finally, we discuss the research challenges and opportunities in the field of semantics-aware performance optimization for data-intensive computing.
