FuncPipe: A Pipelined Serverless Framework for Fast and Cost-efficient Training of Deep Learning Models

04/28/2022
by Yunzhuo Liu, et al.

Training deep learning (DL) models has become a norm. With the emergence of serverless computing and its benefits of true pay-as-you-go pricing and scalability, systems researchers have recently started to provide support for serverless-based training. However, the ability to train DL models on serverless platforms is hindered by the inherent limitations of today's serverless infrastructure and the explosive memory and bandwidth requirements of DL models. For example, existing AWS serverless functions offer at most 10GB of memory and 70MB/s of bandwidth, while training an AmoebaNet-D model with batch size 64 can require 45GB of memory and transfer 900MB of data per iteration. This stark resource mismatch between serverless functions and DL models has stalled progress on serverless-based training. In this paper, we present FuncPipe, the first pipelined serverless framework that enables fast and low-cost training of DL models. FuncPipe is designed around the key insight that model partitioning can bridge both the memory and bandwidth gaps between the capacity of serverless functions and the requirements of DL training. Although conceptually simple, this approach raises several design questions: how to partition the model, how to configure each serverless function, and how to exploit each function's uplink and downlink bandwidth. We co-optimize model partitioning and resource allocation with a Mixed-Integer Quadratic Programming (MIQP) formulation and redesign storage-based communication for efficient bandwidth usage. We implement FuncPipe on AWS and AliCloud and show that it achieves up to 77% cost savings compared with state-of-the-art serverless-based frameworks.
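To make the mismatch concrete, the back-of-envelope sketch below replays the abstract's numbers: a 45GB AmoebaNet-D training footprint against 10GB functions, and 900MB of per-iteration traffic against 70MB/s links. The even split of memory and traffic across pipeline stages is a simplifying assumption for illustration only; real partitions follow layer boundaries and are uneven.

```python
# Back-of-envelope check of the serverless resource mismatch, using the
# figures quoted in the abstract. Assumes (for illustration only) that
# both memory and per-iteration traffic divide evenly across stages.
FUNC_MEM_GB = 10     # largest AWS Lambda memory size
FUNC_BW_MBPS = 70    # per-function bandwidth, MB/s
MODEL_MEM_GB = 45    # AmoebaNet-D training footprint at batch size 64
TRAFFIC_MB = 900     # data transferred per training iteration

for stages in (1, 2, 4, 8):
    mem_per_stage = MODEL_MEM_GB / stages
    comm_sec = (TRAFFIC_MB / stages) / FUNC_BW_MBPS
    verdict = "fits" if mem_per_stage <= FUNC_MEM_GB else "exceeds 10GB"
    print(f"{stages} stage(s): {mem_per_stage:.1f}GB per stage ({verdict}), "
          f"~{comm_sec:.1f}s of communication per iteration")
```

A single function neither fits the model nor hides the roughly 13 seconds of per-iteration communication; at eight stages, per-stage memory and per-stage traffic both drop well within the function's limits, which is the gap-bridging effect the partitioning insight relies on.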
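The abstract does not spell out the MIQP itself, so the following is only a sketch of the general shape such a co-optimization can take, with all symbols our own assumptions: x_{l,s} places layer l on pipeline stage s, y_{s,k} picks function size k (memory cap M_k, unit price c_k) for stage s, m_l is layer l's memory footprint, t_{l,k} is layer l's measured step time on size k, and T bounds the slowest stage.

```latex
\begin{align*}
\min_{x,\,y,\,T}\quad & T \cdot \sum_{s,k} y_{s,k}\, c_k
    && \text{(price-time cost of one iteration)} \\
\text{s.t.}\quad & \sum_{l,k} x_{l,s}\, y_{s,k}\, t_{l,k} \le T
    && \forall s \quad \text{(pipeline bound by slowest stage)} \\
& \sum_{s} x_{l,s} = 1 \quad \forall l; \qquad \sum_{k} y_{s,k} = 1 \quad \forall s \\
& \sum_{l} x_{l,s}\, m_l \le \sum_{k} y_{s,k}\, M_k
    && \forall s \quad \text{(per-function memory cap)} \\
& x_{l,s},\, y_{s,k} \in \{0,1\}, \qquad T \ge 0
\end{align*}
```

The products of decision variables (x times y in the stage-time constraint, T times y in the objective) are what make such a program quadratic rather than linear, hence mixed-integer quadratic.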
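Serverless functions cannot open direct connections to each other, so pipeline stages typically relay tensors through object storage. The sketch below shows the baseline put/poll/get pattern on S3 that a redesigned communication layer like FuncPipe's would improve upon (e.g., by exploiting uplink and downlink bandwidth concurrently); the bucket name, key scheme, and polling interval are all hypothetical choices of ours, not FuncPipe's protocol.

```python
# Minimal sketch of storage-relayed activation exchange between pipeline
# stages. The bucket, key layout, and polling loop are illustrative
# assumptions, not FuncPipe's actual communication design.
import io
import time

import boto3
import numpy as np
from botocore.exceptions import ClientError

s3 = boto3.client("s3")
BUCKET = "funcpipe-scratch"  # hypothetical scratch bucket

def send_activation(stage: int, step: int, tensor: np.ndarray) -> None:
    """Publish this stage's boundary activation for the next stage."""
    buf = io.BytesIO()
    np.save(buf, tensor)
    s3.put_object(Bucket=BUCKET, Key=f"act/{stage}/{step}.npy",
                  Body=buf.getvalue())

def recv_activation(stage: int, step: int) -> np.ndarray:
    """Poll until the upstream stage's activation for `step` appears."""
    key = f"act/{stage}/{step}.npy"
    while True:
        try:
            obj = s3.get_object(Bucket=BUCKET, Key=key)
            return np.load(io.BytesIO(obj["Body"].read()))
        except ClientError as err:
            if err.response["Error"]["Code"] == "NoSuchKey":
                time.sleep(0.05)  # upstream not finished yet; retry
            else:
                raise
```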


research
06/30/2020

Efficient Communication Acceleration for Next-Gen Scale-up Deep Learning Training Platforms

Deep Learning (DL) training platforms are built by interconnecting multi...
research
02/18/2019

Beyond the Memory Wall: A Case for Memory-centric HPC System for Deep Learning

As the models and the datasets to train deep learning (DL) models scale,...
research
09/01/2023

FaST-GShare: Enabling Efficient Spatio-Temporal GPU Sharing in Serverless Computing for Deep Learning Inference

Serverless computing (FaaS) has been extensively utilized for deep learn...
research
10/06/2018

Characterizing Deep-Learning I/O Workloads in TensorFlow

The performance of Deep-Learning (DL) computing frameworks relies on the p...
research
08/08/2019

TensorDIMM: A Practical Near-Memory Processing Architecture for Embeddings and Tensor Operations in Deep Learning

Recent studies from several hyperscalers pinpoint embedding layers as...
research
12/13/2020

Comparing the costs of abstraction for DL frameworks

High level abstractions for implementing, training, and testing Deep Lea...
