A Predictive Autoscaler for Elastic Batch Jobs

10/10/2020
by Peng Gao, et al.

Large batch jobs such as Deep Learning, HPC, and Spark require far more computational resources, at higher cost, than conventional online services. Like other time series data, the workloads of these jobs exhibit characteristics such as trend, burst, and seasonality. Cloud providers offer short-term instances to achieve scalability, stability, and cost-efficiency. Because new instances take time to join the cluster and initialize, crowded workloads may lead to violations in the scheduling system. Assuming that infinite resources and ideal placements are available for users to request in the cloud environment, we propose a predictive autoscaler that provides an elastic interface for customers and overprovisions instances based on a trained regression model. We contribute a method that embeds heterogeneous resource requirements in continuous space into discrete resource buckets, and an autoscaler that makes predictive expansion plans on the time series of resource bucket counts. Our experimental evaluation on production resource usage data validates the solution; the results show that the predictive autoscaler relieves the burden of making scaling plans, avoids long launch times at lower cost, and outperforms other prediction methods with fine-tuned settings.
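To make the two contributions named in the abstract more concrete, the following is a minimal sketch, not the authors' implementation. It assumes that a resource request is a (vCPU, memory) pair, that bucketing rounds each dimension up to a fixed edge, and that a plain least-squares linear trend stands in for the trained regression model. The bucket edges, the function names, and the `margin` parameter are all hypothetical and do not come from the paper.

```python
import math
from collections import Counter

# Hypothetical bucket edges (assumptions for illustration, not from the paper).
CPU_EDGES = [1, 2, 4, 8, 16]        # vCPUs
MEM_EDGES = [2, 4, 8, 16, 32, 64]   # GiB


def round_up(value, edges):
    """Return the smallest edge >= value, or the largest edge if none fits."""
    for edge in edges:
        if value <= edge:
            return edge
    return edges[-1]


def to_bucket(cpu, mem_gib):
    """Map a continuous (cpu, memory) request onto a discrete bucket."""
    return (round_up(cpu, CPU_EDGES), round_up(mem_gib, MEM_EDGES))


def bucket_counts(requests):
    """Count how many requests in one time window fall into each bucket."""
    return Counter(to_bucket(cpu, mem) for cpu, mem in requests)


def forecast_next(series):
    """Fit a least-squares line to the series and extrapolate one step.

    This is a stand-in for the paper's regression model (an assumption).
    """
    n = len(series)
    if n < 2:
        return float(series[-1]) if series else 0.0
    mean_x = (n - 1) / 2
    mean_y = sum(series) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(series))
    slope = sxy / sxx
    return mean_y + slope * (n - mean_x)


def instances_to_launch(series, running, margin=0.2):
    """Overprovision: predicted count plus a safety margin, minus running."""
    predicted = max(0.0, forecast_next(series))
    target = math.ceil(predicted * (1 + margin))
    return max(0, target - running)
```

Under these assumptions, a request for 3 vCPUs and 5.5 GiB lands in the `(4, 8)` bucket, and each bucket's per-window counts form the time series passed to `instances_to_launch`. Rounding up rather than to the nearest edge is a deliberate choice in this sketch, so that a bucket never underestimates the request it represents.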


research
09/08/2021

An Optimal Resource Allocator of Elastic Training for Deep Learning Jobs on Cloud

Cloud training platforms, such as Amazon Web Services and Huawei Cloud p...
research
06/24/2020

Effective Elastic Scaling of Deep Learning Workloads

The increased use of deep learning (DL) in academia, government and indu...
research
10/20/2019

RLScheduler: Learn to Schedule HPC Batch Jobs Using Deep Reinforcement Learning

We present RLScheduler, a deep reinforcement learning based job schedule...
research
05/23/2022

An Elastic Ephemeral Datastore using Cheap, Transient Cloud Resources

Spot instances are virtual machines offered at 60-90% lower cost that can be reclaimed at any ti...
research
09/14/2022

Cost-efficient Auto-scaling of Container-based Elastic Processes

In business process landscapes, a common challenge is to provide the nec...
research
04/07/2022

Elastic Model Aggregation with Parameter Service

Model aggregation, the process that updates model parameters, is an impo...
research
03/12/2021

A Risk-taking Broker Model to Optimise User Requests placement on On-demand and Contract VMs

Cloud providers offer end-users various pricing schemes to allow them to...
