Understanding the Challenges and Assisting Developers with Developing Spark Applications

03/25/2021
by Zehao Wang, et al.

To process data more efficiently, big data frameworks provide data abstractions to developers. However, these abstractions can make it hard for developers to understand and debug their data processing code. To uncover the challenges of using big data frameworks, we first conduct an empirical study of 1,000 Apache Spark-related questions on Stack Overflow. We find that most of the challenges relate to data transformation and API usage. To address these challenges, we design an approach that assists developers in understanding and debugging data processing in Spark. Our approach uses statistical sampling to minimize performance overhead, and it provides intermediate information and hint messages for each data processing step of a chained method pipeline. A preliminary evaluation shows that the approach has low performance overhead and that developers gave positive feedback.
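The idea of sampling the input and reporting intermediate information for each step of a chained pipeline can be illustrated with a small sketch. This is not the authors' tool and does not use Spark; it is plain Python over in-memory lists, and every name in it (`sample_records`, `trace_pipeline`, the `(name, kind, function)` step format, the hint wording) is a hypothetical chosen for illustration.

```python
import random
from typing import Any, Callable, Dict, List, Sequence, Tuple

# Hypothetical step format: (name, kind, function), where kind is "map" or "filter".
Step = Tuple[str, str, Callable[[Any], Any]]


def sample_records(records: Sequence[Any], fraction: float, seed: int = 0) -> List[Any]:
    """Keep each record independently with probability `fraction` (Bernoulli sampling)."""
    rng = random.Random(seed)
    return [r for r in records if rng.random() < fraction]


def trace_pipeline(
    records: Sequence[Any],
    steps: Sequence[Step],
    fraction: float = 0.1,
    seed: int = 0,
    preview: int = 3,
) -> List[Dict[str, Any]]:
    """Run the steps on a sample only and record what each step did to it."""
    current = sample_records(records, fraction, seed)
    report: List[Dict[str, Any]] = []
    for name, kind, fn in steps:
        before = len(current)
        if kind == "map":
            current = [fn(r) for r in current]
        elif kind == "filter":
            current = [r for r in current if fn(r)]
        else:
            raise ValueError(f"unknown step kind: {kind!r}")
        # Assumed hint rule: flag a step that drops every sampled record.
        hint = ""
        if before > 0 and not current:
            hint = "step removed every sampled record; check the predicate"
        report.append(
            {
                "step": name,
                "in": before,
                "out": len(current),
                "preview": current[:preview],
                "hint": hint,
            }
        )
    return report


if __name__ == "__main__":
    steps: List[Step] = [
        ("double", "map", lambda x: x * 2),
        ("keep_large", "filter", lambda x: x > 1000),
    ]
    for entry in trace_pipeline(range(100), steps, fraction=0.5):
        print(entry)
```

Because only the sample flows through the traced steps, the cost of tracing scales with the sample size rather than with the full dataset, which is the intuition behind using sampling to keep overhead low. A real Spark integration would need to hook into the framework's own transformations, which this sketch does not attempt.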


research
07/06/2017

A Survey on Geographically Distributed Big-Data Processing using MapReduce

Hadoop and Spark are widely used distributed processing frameworks for l...
research
12/28/2022

Does Big Data Require Complex Systems? A Performance Comparison Between Spark and Unicage Shell Scripts

The paradigm of big data is characterized by the need to collect and pro...
research
02/20/2023

Profiling and Optimizing Java Streams

The Stream API was added in Java 8 to allow the declarative expression o...
research
10/23/2017

Communication Efficient Checking of Big Data Operations

We propose fast probabilistic algorithms with low (i.e., sublinear in th...
research
01/16/2018

Debugging Framework Applications: Benefits and Challenges

Aspects of frameworks, such as inversion of control and the structure of...
research
12/06/2018

K-Pg: Shared State in Differential Dataflows

Many of the most popular scalable data-processing frameworks are fundame...
research
05/05/2023

Hearing the voice of experts: Unveiling Stack Exchange communities' knowledge of test smells

Refactorings are transformations to improve the code design without chan...
