Bayesian Admission Policies for Cloud Computing Clusters
Cloud computing providers must handle customer workloads that scale their use of resources, such as virtual machines, up and down over time. Currently, providers often accommodate this with simple threshold policies that reserve large parts of each cluster, which leads to low average cluster utilization. In this paper, we propose more sophisticated Bayesian policies for controlling admission to a cluster and demonstrate that they significantly increase cluster utilization. We first introduce a model for the cluster admission problem and fit its parameters on a data trace from Microsoft Azure. We then design Bayesian cluster admission policies that estimate moments of each workload's distribution of future resource usage. Via simulations, we show that, while estimating the first moments of workloads leads to a substantial improvement over the simple threshold policy, also taking the second moments into account yields a further improvement in utilization. We then evaluate how much further utilization can be improved with learned or elicited prior information, and how to incentivize users to provide this information.
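To make the contrast between the three kinds of policy concrete, the following is a minimal illustrative sketch, not the paper's actual model or policies. All names (`Workload`, `threshold_admit`, `first_moment_admit`, `second_moment_admit`) and parameters (`threshold`, `headroom`, `z`) are hypothetical, the moment estimates are simply given as inputs rather than computed by Bayesian inference, and workloads are assumed independent so that their variances add.

```python
import math
from dataclasses import dataclass


@dataclass
class Workload:
    current: float  # resources in use right now (e.g. cores)
    mean: float     # assumed estimate of mean future usage
    var: float      # assumed estimate of variance of future usage


def threshold_admit(workloads, new, capacity, threshold=0.5):
    """Admit only if current usage stays below a fixed fraction of capacity."""
    used = sum(w.current for w in workloads) + new.current
    return used <= threshold * capacity


def first_moment_admit(workloads, new, capacity, headroom=0.8):
    """Admit if the expected future usage fits within a fraction of capacity."""
    expected = sum(w.mean for w in workloads) + new.mean
    return expected <= headroom * capacity


def second_moment_admit(workloads, new, capacity, z=3.0):
    """Admit if expected usage plus z standard deviations fits in capacity.

    Assumes independent workloads, so the total variance is the sum of the
    per-workload variances.
    """
    everyone = list(workloads) + [new]
    expected = sum(w.mean for w in everyone)
    std = math.sqrt(sum(w.var for w in everyone))
    return expected + z * std <= capacity


# Usage: three running workloads plus one arriving workload on a
# cluster with 100 units of capacity.
stable = [Workload(current=10, mean=15, var=4)] * 3
print(second_moment_admit(stable, Workload(10, 15, 4), capacity=100))
```

In this sketch the variance-aware rule can admit predictable workloads that a fixed reservation would turn away, while still rejecting highly variable ones, which is the intuition behind using second moments.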