TRANSMUT-SPARK: Transformation Mutation for Apache Spark

We propose TRANSMUT-Spark, a tool that automates mutation testing of Big Data processing code within Spark programs. Apache Spark is an engine for Big Data processing. It hides the complexity inherent in parallel and distributed Big Data programming behind built-in functions, underlying parallel processes, and data management strategies. Nonetheless, programmers must combine these functions carefully and guide the engine toward the right data management strategies in order to exploit the large number of computational resources that Big Data processing requires and to avoid substantial production losses. Many programming details in the data processing code of Spark programs are error-prone, so this code needs to be tested correctly and automatically. This paper explores the application to Spark programs of mutation testing, a fault-based testing technique that relies on fault simulation to evaluate and design test sets. TRANSMUT-Spark automates the most laborious steps of the process and executes the mutation testing process in full. The paper describes how the tool automates the mutant generation, test execution, and adequacy analysis phases of mutation testing. It also discusses the results of experiments carried out to validate the tool and to assess its scope and limitations.
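To make the three phases named above concrete, the following is a minimal sketch of mutation testing applied to a small filter-then-aggregate data flow. It is written in plain Python over lists rather than against Spark, and it does not reproduce TRANSMUT-Spark's interface or its mutation operators: the names `make_program`, `MUTANTS`, `surviving_mutants`, and `mutation_score`, as well as the three hand-written mutants, are assumptions introduced only for illustration.

```python
from typing import Callable, Dict, List, Tuple

Record = Tuple[str, int]
Program = Callable[[List[Record]], Dict[str, int]]
TestCase = Tuple[List[Record], Dict[str, int]]


def make_program(keep: Callable[[int], bool],
                 combine: Callable[[int, int], int]) -> Program:
    """Build a tiny data flow: filter records by amount, then aggregate per key."""
    def run(records: List[Record]) -> Dict[str, int]:
        totals: Dict[str, int] = {}
        for key, amount in records:
            if keep(amount):
                totals[key] = combine(totals[key], amount) if key in totals else amount
        return totals
    return run


# Program under test: sum the positive amounts per key.
original = make_program(lambda a: a > 0, lambda x, y: x + y)

# Phase 1, mutant generation (hand-written here; hypothetical mutants, each
# simulating one fault in a transformation of the data flow).
MUTANTS: Dict[str, Program] = {
    "negated_filter": make_program(lambda a: not (a > 0), lambda x, y: x + y),
    "removed_filter": make_program(lambda a: True, lambda x, y: x + y),
    "replaced_aggregation": make_program(lambda a: a > 0, lambda x, y: max(x, y)),
}


def surviving_mutants(mutants: Dict[str, Program], tests: List[TestCase]) -> List[str]:
    """Phase 2, test execution: a mutant survives if it passes every test case."""
    return sorted(
        name for name, mutant in mutants.items()
        if all(mutant(inputs) == expected for inputs, expected in tests)
    )


def mutation_score(mutants: Dict[str, Program], tests: List[TestCase]) -> float:
    """Phase 3, adequacy analysis: fraction of mutants killed by the test set."""
    survivors = surviving_mutants(mutants, tests)
    return (len(mutants) - len(survivors)) / len(mutants)


# A weak test set exercises neither a negative amount nor a repeated key.
WEAK_TESTS: List[TestCase] = [([("a", 1)], {"a": 1})]

# A stronger test set adds a case covering both situations.
STRONG_TESTS: List[TestCase] = WEAK_TESTS + [
    ([("a", 1), ("a", 2), ("b", -5)], {"a": 3}),
]
```

Under these assumptions, the weak test set kills only the negated-filter mutant, while the added test case kills all three. The surviving mutants are what guide test design: each one points at an input situation the test set does not yet cover.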


