Unifying Data, Model and Hybrid Parallelism in Deep Learning via Tensor Tiling

05/10/2018
by Minjie Wang, et al.

Deep learning systems have become vital tools across many fields, but increasing model sizes mean that training must be accelerated for these systems to remain useful. Current systems such as TensorFlow and MXNet focus on a single parallelization strategy, data parallelism, which requires large training batch sizes in order to scale. We cast the problem of finding the best parallelization strategy as the problem of finding the tiling that partitions tensors with the least overall communication, and we propose an algorithm that finds the optimal tiling. The resulting parallelization is a hybrid of data parallelism and model parallelism. We present this automatic tiling in a new system, SoyBean, which can act as a backend for TensorFlow, MXNet, and other frameworks. SoyBean automatically transforms a serial dataflow graph captured by an existing deep learning frontend into a parallel dataflow graph based on the optimal tiling it has found. Our evaluations show that SoyBean is 1.5x-4x faster than pure data parallelism for AlexNet and VGG.
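To make the core idea concrete, the sketch below enumerates per-tensor tilings for a single matmul layer Y = X @ W and keeps the tiling with the least communication volume. This is only an illustration of the search problem the abstract describes, not SoyBean's algorithm: the tiling names ("row", "col", "replicate"), the toy cost model, the function names comm_cost and best_tiling, and all shapes are assumptions made for this example. SoyBean searches over the whole dataflow graph rather than one layer at a time.

```python
# A minimal sketch of the tiling idea from the abstract, not SoyBean's
# actual algorithm: enumerate per-tensor tilings for one matmul layer
# Y = X @ W and keep the tiling with the least communication volume.
# Tiling names, cost model, and shapes are assumptions for illustration.
from itertools import product

def comm_cost(tile_x, tile_w, x_shape, w_shape):
    """Toy communication-volume model (in elements) for Y = X @ W."""
    n, d = x_shape
    _, m = w_shape
    cost = 0
    if tile_x == "col" or tile_w == "row":
        # The contraction dimension d is split, so each worker holds a
        # partial sum of Y that must be all-reduced.
        cost += n * m
    if tile_w == "replicate":
        # Replicated weights (pure data parallelism): the weight
        # gradient must be all-reduced after the backward pass.
        cost += d * m
    if tile_x == "row" and tile_w == "col":
        # Misaligned output blocks: the smaller operand must be
        # all-gathered before the multiply.
        cost += min(n * d, d * m)
    return cost

def best_tiling(x_shape, w_shape):
    """Brute-force search over the tiling space of a single layer."""
    x_tilings = ["row", "col"]            # split batch or feature dim
    w_tilings = ["row", "col", "replicate"]
    return min(product(x_tilings, w_tilings),
               key=lambda t: comm_cost(t[0], t[1], x_shape, w_shape))

# A wide fully connected layer with a small batch: under this cost
# model, splitting the weight matrix (model parallelism) is cheaper
# than replicating it (data parallelism).
print(best_tiling((128, 4096), (4096, 4096)))  # -> ('row', 'row')
```

Even this toy version shows why the best strategy is often a hybrid: which tensor dimension is cheapest to split depends on the operand shapes, so large weight matrices favor model-parallel tilings while large batch dimensions favor data-parallel ones.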

Related research:

Beyond Data and Model Parallelism for Deep Neural Networks (07/14/2018)
The computational requirements for training deep neural networks (DNNs) ...

Optimizing Multi-GPU Parallelization Strategies for Deep Learning Training (07/30/2019)
Deploying deep learning (DL) models across multiple compute devices to t...

A Flexible Framework for Parallel Multi-Dimensional DFTs (04/23/2019)
Multi-dimensional discrete Fourier transforms (DFT) are typically decomp...

An Order-aware Dataflow Model for Extracting Shell Script Parallelism (12/31/2020)
We present a dataflow model for extracting data parallelism latent in Un...

Mesh-TensorFlow: Deep Learning for Supercomputers (11/05/2018)
Batch-splitting (data-parallelism) is the dominant distributed Deep Neur...

Dynamic Scheduling of MPI-based Distributed Deep Learning Training Jobs (08/21/2019)
There is a general trend towards solving problems suited to deep learnin...

An Accurate and Efficient Large-scale Regression Method through Best Friend Clustering (04/22/2021)
As the data size in Machine Learning fields grows exponentially, it is i...
