ReLeaSER: A Reinforcement Learning Strategy for Optimizing Utilization Of Ephemeral Cloud Resources

09/23/2020
by Mohamed Handaoui, et al.

Cloud data center capacities are over-provisioned to handle demand peaks and hardware failures, which leads to low resource utilization. One way to improve resource utilization, and thus reduce the total cost of ownership, is to offer unused resources (referred to as ephemeral resources) at a lower price. However, reselling resources must still meet customers' expectations in terms of Quality of Service. The goal is therefore to maximize the amount of reclaimed resources while avoiding SLA penalties. To achieve that, cloud providers (CPs) have to estimate their future utilization to provide availability guarantees. The prediction should include a safety margin so that resources remain available to absorb unpredictable workloads. The challenge is to find the safety margin that provides the best trade-off between the amount of resources to reclaim and the risk of SLA violations. Most state-of-the-art solutions consider a fixed safety margin for all types of metrics (e.g., CPU, RAM). However, a single fixed margin does not account for workload variations over time, which may lead to SLA violations and/or poor utilization. To tackle these challenges, we propose ReLeaSER, a Reinforcement Learning strategy for optimizing the utilization of ephemeral resources in the cloud. ReLeaSER dynamically tunes the safety margin at the host level for each resource metric. The strategy learns from past prediction errors (that caused SLA violations). Our solution significantly reduces the SLA violation penalties, by 2.7x on average and up to 3.4x. It also considerably improves the CPs' potential savings, by 27.6% up to 43.6%.
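The trade-off behind the safety margin can be made concrete with a small sketch. This is not ReLeaSER's learning algorithm. It only evaluates a few fixed margins on made-up numbers, under the assumption that the reclaimable amount is the host capacity minus the predicted usage minus the margin. The function names `reclaimable` and `evaluate_margin` and all values are hypothetical.

```python
def reclaimable(capacity, predicted_usage, margin):
    """Amount offered as ephemeral: capacity minus predicted usage and margin (assumed model)."""
    return max(0.0, capacity - predicted_usage - margin)


def evaluate_margin(capacity, predicted, actual, margin):
    """Return (total reclaimed, number of violations) for one fixed margin."""
    total_reclaimed = 0.0
    violations = 0
    for p, a in zip(predicted, actual):
        offered = reclaimable(capacity, p, margin)
        total_reclaimed += offered
        # A violation occurs when real usage plus the offered amount exceeds capacity.
        if a + offered > capacity:
            violations += 1
    return total_reclaimed, violations


# Illustrative values only (e.g., CPU units on one host over four time steps).
capacity = 100.0
predicted = [40.0, 50.0, 60.0, 45.0]
actual = [42.0, 65.0, 58.0, 45.0]

for margin in (0.0, 10.0, 20.0):
    reclaimed, violations = evaluate_margin(capacity, predicted, actual, margin)
    print(f"margin={margin}: reclaimed={reclaimed}, violations={violations}")
```

On these numbers, a larger margin removes violations but also reduces the reclaimed amount, which is the tension a dynamically tuned, per-metric margin aims to balance.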


research
04/28/2022

RISCLESS: A Reinforcement Learning Strategy to Exploit Unused Cloud Resources

One of the main objectives of Cloud Providers (CP) is to guarantee the S...
research
11/01/2018

Modeling Conceptual Characteristics of Virtual Machines for CPU Utilization Prediction

Cloud services have grown rapidly in recent years, which provide high fl...
research
11/21/2022

Learning Cooperative Oversubscription for Cloud by Chance-Constrained Multi-Agent Reinforcement Learning

Oversubscription is a common practice for improving cloud resource utili...
research
03/19/2018

Cloud Provider Capacity Augmentation Through Automated Resource Bartering

Growing interest in Cloud Computing places a heavy workload on cloud pro...
research
08/28/2021

Harvesting Idle Resources in Serverless Computing via Reinforcement Learning

Serverless computing has become a new cloud computing paradigm that prom...
research
12/02/2019

MORPHOSYS: Efficient Colocation of QoS-Constrained Workloads in the Cloud

In hosting environments such as IaaS clouds, desirable application perfo...
research
09/18/2020

Akita: A CPU scheduler for virtualized Clouds

Clouds inherit CPU scheduling policies of operating systems. These polic...
