Performance Evaluation of an Algorithm-based Asynchronous Checkpoint-Restart Fault Tolerant Application Using Mixed MPI/GPI-2

04/30/2018
by Adrian Bazaga, et al.

One of the hardest challenges of the current Big Data landscape is the inability to process huge volumes of information in an acceptable time. The goal of this work is to ascertain whether typical Big Data tools are useful for solving High Performance Computing problems, by exploring and comparing a distributed computing framework deployed on a commodity cluster architecture: the experiment measures the computational time required when using tools such as Apache Spark. This is compared against "equivalent, more traditional" approaches, namely a distributed memory model with MPI on top of a distributed file system such as HDFS (the Hadoop Distributed File System), accessed through native C libraries that encapsulate the file system's functionality, combined with the GPI-2 implementation of the GASPI standard and its in-memory checkpointing library to provide the application with fault tolerance. More precisely, we choose the K-means algorithm as the experiment, run it on datasets of varying size, and compare the computational run time and resilience of both approaches.
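
As an illustration of the MPI side of the comparison, the sketch below shows a distributed K-means iteration in C with MPI, where each rank holds a partition of the points and the updated centroids are checkpointed after every update step. This is only a minimal sketch under stated assumptions: the data are synthetic, the constants (K, DIM, NPTS, ITERS) are illustrative, and a file written by rank 0 stands in for the asynchronous in-memory checkpointing that the GPI-2 library provides in the actual application; none of it is taken from the paper's code.

/* Minimal sketch: distributed K-means with MPI and periodic checkpointing
 * of the centroids.  A file-based checkpoint is used here as a simplified
 * stand-in for the GPI-2 in-memory checkpointing library; all constants
 * and names are illustrative assumptions, not the paper's implementation. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <float.h>

#define K     4      /* number of clusters        */
#define DIM   2      /* point dimensionality      */
#define NPTS  10000  /* points held by each rank  */
#define ITERS 20     /* K-means iterations        */

static void checkpoint(const double *centroids, int iter, int rank)
{
    /* Only rank 0 persists the global state; with GPI-2 this would be an
     * asynchronous in-memory copy on a partner node instead of a file. */
    if (rank != 0) return;
    FILE *f = fopen("centroids.ckpt", "wb");
    if (!f) return;
    fwrite(&iter, sizeof iter, 1, f);
    fwrite(centroids, sizeof(double), K * DIM, f);
    fclose(f);
}

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Synthetic local partition of the dataset. */
    double *points = malloc(sizeof(double) * NPTS * DIM);
    srand(rank + 1);
    for (int i = 0; i < NPTS * DIM; i++)
        points[i] = (double)rand() / RAND_MAX;

    /* Rank 0 picks the initial centroids and broadcasts them. */
    double centroids[K * DIM];
    if (rank == 0)
        for (int i = 0; i < K * DIM; i++)
            centroids[i] = (double)rand() / RAND_MAX;
    MPI_Bcast(centroids, K * DIM, MPI_DOUBLE, 0, MPI_COMM_WORLD);

    for (int iter = 0; iter < ITERS; iter++) {
        double local_sum[K * DIM] = {0};
        long   local_cnt[K]       = {0};

        /* Assignment step: nearest centroid for each local point. */
        for (int p = 0; p < NPTS; p++) {
            int best = 0;
            double best_d = DBL_MAX;
            for (int c = 0; c < K; c++) {
                double d = 0.0;
                for (int j = 0; j < DIM; j++) {
                    double diff = points[p * DIM + j] - centroids[c * DIM + j];
                    d += diff * diff;
                }
                if (d < best_d) { best_d = d; best = c; }
            }
            for (int j = 0; j < DIM; j++)
                local_sum[best * DIM + j] += points[p * DIM + j];
            local_cnt[best]++;
        }

        /* Update step: combine partial sums from all ranks. */
        double global_sum[K * DIM];
        long   global_cnt[K];
        MPI_Allreduce(local_sum, global_sum, K * DIM, MPI_DOUBLE, MPI_SUM,
                      MPI_COMM_WORLD);
        MPI_Allreduce(local_cnt, global_cnt, K, MPI_LONG, MPI_SUM,
                      MPI_COMM_WORLD);
        for (int c = 0; c < K; c++)
            if (global_cnt[c] > 0)
                for (int j = 0; j < DIM; j++)
                    centroids[c * DIM + j] = global_sum[c * DIM + j] / global_cnt[c];

        /* Algorithm-based checkpoint: the centroids (plus the iteration
         * counter) fully describe the state needed to restart. */
        checkpoint(centroids, iter, rank);
    }

    if (rank == 0)
        printf("%d ranks, first centroid: (%f, %f)\n",
               size, centroids[0], centroids[1]);
    free(points);
    MPI_Finalize();
    return 0;
}

Because the centroids and the iteration counter fully describe the algorithm's state, a restart only has to reload them and resume the loop, which is what keeps an algorithm-based checkpoint this small compared to checkpointing the whole process image.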

