Low-Level Augmented Bayesian Optimization for Finding the Best Cloud VM

12/28/2017
by Chin-Jung Hsu, et al.

Big data applications tend to have long execution times, so choosing the right cloud VM to run them has significant performance and economic implications. For example, in our large-scale empirical study of 107 different workloads on three popular big data systems, we found that a wrong choice can lead to a 20-fold slowdown or a 10-fold increase in cost. Bayesian optimization is a technique for optimizing expensive (black-box) functions. Previous attempts to apply it to this problem have used only instance-level information (such as the number of cores and memory size), which is not sufficient to represent the search space. In this work, we discover that this may lead to a fragility problem: the search either incurs a high cost or finds only a sub-optimal solution. The central insight of this paper is to use low-level performance information to augment the process of Bayesian optimization. Our novel low-level augmented Bayesian optimization is rarely worse than current practices and often performs much better (in 46 of 107 cases). Further, it significantly reduces the search cost in nearly half of our case studies. Based on this work, we conclude that it is often insufficient to use general-purpose off-the-shelf methods for configuring cloud instances without augmenting those methods with essential systems knowledge such as CPU utilization, working memory size, and I/O wait time.
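
To make the central idea concrete, here is a minimal sketch of a sequential VM search whose surrogate takes low-level metrics as extra inputs. It is not the paper's implementation. It deliberately swaps the Gaussian-process surrogate and acquisition function of Bayesian optimization for an inverse-distance predictor and a greedy pick, so it shows only where low-level metrics can enter the loop. The catalog, the `run_workload` simulator, the pairing of a measured VM with a target VM, and every number below are hypothetical.

```python
import math

# Hypothetical catalog: name -> (cores, memory_gb, price_per_hour). Values are made up.
CATALOG = {
    "small":  (2, 4, 0.10),
    "medium": (4, 8, 0.20),
    "large":  (8, 16, 0.40),
    "himem":  (4, 32, 0.30),
    "hicpu":  (16, 16, 0.60),
    "xlarge": (16, 64, 0.80),
}


def run_workload(vm, working_set_gb=20.0, cpu_hours=8.0):
    """Toy stand-in for a real benchmark run; returns (hours, low_level_metrics)."""
    cores, mem_gb, _ = CATALOG[vm]
    shortfall = max(0.0, working_set_gb - mem_gb) / working_set_gb
    hours = cpu_hours / cores * (1.0 + 3.0 * shortfall)  # spilling slows the run
    metrics = (
        1.0 - 0.6 * shortfall,              # CPU utilization
        min(1.0, working_set_gb / mem_gb),  # working set as a share of memory
        0.6 * shortfall,                    # I/O wait share
    )
    return hours, metrics


def instance_features(vm):
    """Instance-level information only: core count and memory size."""
    cores, mem_gb, _ = CATALOG[vm]
    return (math.log2(cores), math.log2(mem_gb))


def predict_hours(target, measured, use_low_level=True):
    """Inverse-distance-weighted guess at the run time on an unmeasured VM."""
    def row(source, dest):
        # Assumed encoding: features of a measured VM, optionally its low-level
        # metrics, then the features of the VM whose run time is being predicted.
        low = measured[source][1] if use_low_level else ()
        return instance_features(source) + low + instance_features(dest)

    # Training rows: every ordered pair of already measured VMs.
    train = [(row(a, b), measured[b][0])
             for a in measured for b in measured if a != b]
    num = den = 0.0
    for source in measured:
        query = row(source, target)
        for x, hours in train:
            weight = 1.0 / (1e-9 + math.dist(query, x))
            num += weight * hours
            den += weight
    return num / den


def search(budget=4, initial=("small", "xlarge"), use_low_level=True):
    """Measure `initial`, then greedily measure the VM with the lowest predicted cost."""
    measured = {vm: run_workload(vm) for vm in initial}  # needs at least two VMs
    while len(measured) < budget:
        candidates = [vm for vm in CATALOG if vm not in measured]
        if not candidates:
            break
        nxt = min(candidates, key=lambda vm:
                  predict_hours(vm, measured, use_low_level) * CATALOG[vm][2])
        measured[nxt] = run_workload(nxt)
    best = min(measured, key=lambda vm: measured[vm][0] * CATALOG[vm][2])
    return best, measured


if __name__ == "__main__":
    best, measured = search(budget=4)
    print("best measured VM:", best, "after", len(measured), "runs")
```

Calling `search(use_low_level=False)` runs the same loop on instance-level features alone, which gives a rough way to compare the two input encodings on this toy workload. A faithful reproduction would need a probabilistic surrogate and real measurements.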
