An MPI-Based Python Framework for Distributed Training with Keras

12/16/2017
by Dustin Anderson, et al.

We present a lightweight Python framework for distributed training of neural networks on multiple GPUs or CPUs. The framework is built on the popular Keras machine learning library. The Message Passing Interface (MPI) protocol is used to coordinate the training process, and the system is well suited for job submission at supercomputing sites. We detail the software's features, describe its use, and demonstrate its performance on systems of varying sizes, using a benchmark problem drawn from high-energy physics research.
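
The abstract's core idea is using MPI to coordinate Keras training across many processes. As a rough illustration only, the sketch below shows one common way such coordination can be wired up with mpi4py: each rank trains on its own data shard, and the model weights are averaged across ranks with an MPI allreduce after every epoch. The toy data, model, and synchronous per-epoch averaging scheme are all illustrative assumptions, not the paper's framework or its API.

# Illustrative sketch: MPI-coordinated data-parallel Keras training.
# This is NOT the paper's framework -- just one simple pattern for
# synchronizing Keras weights across MPI ranks with mpi4py.
import numpy as np
from mpi4py import MPI
from tensorflow import keras

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
nranks = comm.Get_size()

# Hypothetical toy data; a real job would have each rank load its own shard.
rng = np.random.default_rng(seed=rank)
x_shard = rng.random((256, 32), dtype=np.float32)
y_shard = rng.integers(0, 2, size=(256, 1)).astype(np.float32)

model = keras.Sequential([
    keras.Input(shape=(32,)),
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="sgd", loss="binary_crossentropy")

for epoch in range(5):
    # Each rank takes local optimization steps on its own shard...
    model.fit(x_shard, y_shard, batch_size=64, epochs=1, verbose=0)
    # ...then all ranks average their weight tensors via an allreduce.
    averaged = []
    for w in model.get_weights():
        buf = np.empty_like(w)
        comm.Allreduce(w, buf, op=MPI.SUM)
        averaged.append(buf / nranks)
    model.set_weights(averaged)

if rank == 0:
    print("final local loss:", model.evaluate(x_shard, y_shard, verbose=0))

A script like this would be launched with, e.g., mpirun -n 4 python train.py. Per-epoch weight averaging is deliberately crude; frameworks such as the one presented in the paper coordinate updates at finer granularity and with more sophisticated distributed training algorithms.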


