Heterogeneous CPU+GPU Stochastic Gradient Descent Algorithms

04/19/2020
by Yujing Ma, et al.

The widely-adopted practice is to train deep learning models with specialized hardware accelerators, e.g., GPUs or TPUs, due to their superior performance on linear algebra operations. However, this strategy does not effectively employ the extensive CPU and memory resources – used only for preprocessing, data transfer, and scheduling – that are available by default on accelerated servers. In this paper, we study training algorithms for deep learning on heterogeneous CPU+GPU architectures. Our two-fold objective – to maximize convergence rate and resource utilization simultaneously – makes the problem challenging. To allow a principled exploration of the design space, we first introduce a generic deep learning framework that exploits the difference in computational power and memory hierarchy between CPU and GPU through asynchronous message passing. Based on insights gained through experimentation with the framework, we design two heterogeneous asynchronous stochastic gradient descent (SGD) algorithms. The first algorithm – CPU+GPU Hogbatch – combines small batches on CPU with large batches on GPU in order to maximize the utilization of both resources. However, this generates an unbalanced model update distribution which hinders statistical convergence. The second algorithm – Adaptive Hogbatch – assigns batches with continuously evolving size based on the relative speed of CPU and GPU. This balances the model update ratio at the expense of a customizable decrease in utilization. We show that the implementation of these algorithms in the proposed CPU+GPU framework achieves both faster convergence and higher resource utilization than TensorFlow on several real datasets and on two computing architectures – an on-premises server and a cloud instance.
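The Adaptive Hogbatch idea lends itself to a short illustration. The sketch below is not the authors' implementation: the worker names, batch sizes, learning rate, and rebalancing threshold are illustrative assumptions, a linear least-squares model stands in for a deep network, and the "gpu" worker is simply a second CPU thread. It shows two asynchronous workers updating a shared model Hogwild-style, one with small batches and one with large batches, while a monitor periodically grows the batch of whichever worker contributes a disproportionate share of recent updates so that the update ratio between workers stays balanced.

```python
# Illustrative sketch only (not the paper's code): asynchronous heterogeneous
# SGD in the spirit of Adaptive Hogbatch, simulated with two CPU threads.
import threading
import time

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20_000, 50))
w_true = rng.normal(size=50)
y = X @ w_true + 0.01 * rng.normal(size=20_000)

w = np.zeros(50)                         # shared model, updated lock-free (Hogwild-style)
updates = {"cpu": 0, "gpu": 0}           # update counts per worker
batch_sizes = {"cpu": 16, "gpu": 256}    # small batches on CPU, large on the "GPU"
lr = 0.05

def worker(name, steps=3000, seed=1):
    local_rng = np.random.default_rng(seed)
    for _ in range(steps):
        b = batch_sizes[name]                       # current (possibly rebalanced) batch size
        idx = local_rng.integers(0, len(X), size=b)
        Xb, yb = X[idx], y[idx]
        grad = Xb.T @ (Xb @ w - yb) / b             # mini-batch least-squares gradient
        w[:] -= lr * grad                           # asynchronous in-place model update
        updates[name] += 1

threads = [threading.Thread(target=worker, args=(n,), kwargs={"seed": i})
           for i, n in enumerate(("cpu", "gpu"))]
for t in threads:
    t.start()

# Adaptive rebalancing: if one worker produces a disproportionate share of the
# recent updates, double its batch size so the update ratio evens out.
prev = dict(updates)
while any(t.is_alive() for t in threads):
    time.sleep(0.2)
    delta = {n: updates[n] - prev[n] for n in updates}
    prev = dict(updates)
    total = sum(delta.values()) or 1
    for name in batch_sizes:
        if delta[name] / total > 0.6:
            batch_sizes[name] = min(batch_sizes[name] * 2, 1024)

for t in threads:
    t.join()
print("final loss:", float(np.mean((X @ w - y) ** 2)), "updates:", updates)
```

Enlarging the faster worker's batch makes each of its updates cover more samples, which is the sketch's analogue of trading some utilization for a balanced update distribution, as described in the abstract.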

