MLLess: Achieving Cost Efficiency in Serverless Machine Learning Training

06/12/2022
by Pablo Gimeno Sarroca, et al.

Function-as-a-Service (FaaS) has raised growing interest in how to "tame" serverless computing to enable domain-specific use cases such as data-intensive applications and machine learning (ML). Recently, several systems have been implemented for training ML models on FaaS platforms. These systems are significant steps in the right direction. However, they do not fully answer the nagging question of when serverless ML training is more cost-effective than traditional "serverful" computing. To help answer it, we propose MLLess, a FaaS-based ML training prototype built atop IBM Cloud Functions. To boost cost-efficiency, MLLess implements two optimizations tailored to the traits of serverless computing: a significance filter, which makes indirect communication more effective, and a scale-in auto-tuner, which reduces cost by exploiting the FaaS sub-second billing model (often per 100 ms). Our results show that MLLess can be 15X faster than serverful ML systems at a lower cost for sparse ML models that exhibit fast convergence, such as sparse logistic regression and matrix factorization. Furthermore, our results show that MLLess can easily scale out to increasingly large fleets of serverless workers.
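To make the billing argument concrete, the following is a minimal sketch of how per-100 ms billing turns a smaller worker fleet into a smaller bill. It is not MLLess's auto-tuner: the function names, the unit price, and the worker schedules are hypothetical, and it assumes every step takes the same time per worker regardless of fleet size, which a real system would have to verify.

```python
import math

def billed_cost(duration_s, memory_gb, price_per_gb_s, granularity_s=0.1):
    """Cost of one invocation when time is billed in fixed increments."""
    # Round the duration up to the next billing increment (100 ms by default).
    # The inner round() guards against floating-point noise such as 0.3 / 0.1.
    increments = math.ceil(round(duration_s / granularity_s, 9))
    return increments * granularity_s * memory_gb * price_per_gb_s

def training_cost(workers_per_step, step_duration_s, memory_gb, price_per_gb_s):
    """Total cost of a run, given how many workers are active at each step."""
    per_invocation = billed_cost(step_duration_s, memory_gb, price_per_gb_s)
    return sum(workers * per_invocation for workers in workers_per_step)

# Hypothetical numbers: 4 steps of 250 ms each, 1 GB workers, unit price 1.0.
fixed_fleet = training_cost([8, 8, 8, 8], 0.25, 1.0, 1.0)
scaled_in = training_cost([8, 6, 4, 2], 0.25, 1.0, 1.0)
print(fixed_fleet, scaled_in)
```

Under these assumptions, each 250 ms step is billed as 300 ms, and removing workers as training progresses lowers the total in direct proportion to the number of invocations saved. Whether that saving survives in practice depends on how much the smaller fleet slows convergence, which this sketch does not model.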

research
05/17/2021

Towards Demystifying Serverless Machine Learning Training

The appeal of serverless (FaaS) has triggered a growing interest on how ...
research
10/04/2021

TACC: A Full-stack Cloud Computing Infrastructure for Machine Learning Tasks

In Machine Learning (ML) system research, efficient resource scheduling ...
research
03/20/2018

MLtuner: System Support for Automatic Machine Learning Tuning

MLtuner automatically tunes settings for training tunables (such as the ...
research
12/07/2020

SpotTune: Leveraging Transient Resources for Cost-efficient Hyper-parameter Tuning in the Public Cloud

Hyper-parameter tuning (HPT) is crucial for many machine learning (ML) a...
research
10/18/2019

Machine Learning Systems for Highly-Distributed and Rapidly-Growing Data

The usability and practicality of any machine learning (ML) applications...
research
04/04/2023

TPU v4: An Optically Reconfigurable Supercomputer for Machine Learning with Hardware Support for Embeddings

In response to innovations in machine learning (ML) models, production w...
research
06/04/2022

Exploring the Potential of Feature Density in Estimating Machine Learning Classifier Performance with Application to Cyberbullying Detection

In this research, we analyze the potential of Feature Density (HD) as a ...
