Rightsizing Clusters for Time-Limited Tasks

In conventional public clouds, designing a suitable initial cluster for a given application workload is important for reducing the computational footprint at run time. In edge or on-premise clouds, rightsizing the cluster at installation time (cold-start rightsizing) is crucial for avoiding recurrent capital expenditure. In both cases, rightsizing has to balance the cost-performance trade-off for an application with multiple tasks, where each task can demand multiple resources and the cloud offers nodes with different capacities and costs. Multidimensional bin-packing can address this cold-start rightsizing problem, but it assumes that every task is always active. In contrast, real-world tasks (e.g., load bursts, and batch or deadline-bound tasks with time limits) may be active only during specific time periods or may have dynamic load profiles. The cluster cost can be reduced by reusing resources via time sharing and optimal packing. This motivates our generalized problem of cold-start rightsizing for time-limited tasks: given a timeline and the time periods and resource demands of the tasks, the objective is to place the tasks on a minimum-cost cluster of nodes without violating node capacities at any time instance. We design a baseline two-phase algorithm that first performs a penalty-based mapping of tasks to node types and then solves each node type independently. We prove that the algorithm has an approximation ratio of O(D · min(m, T)), where D, m and T are the number of resources, node types and timeslots, respectively. We then present an improved linear-programming-based mapping strategy, enhanced further with a cross-node-type filling mechanism. Our experiments on synthetic and real-world cluster traces show a significant cost reduction from LP-based mapping compared to the baseline, and the filling mechanism improves on this further, producing solutions within 20
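The two-phase structure described above (map tasks to node types, then pack each node type independently, respecting capacities at every timeslot) can be illustrated with a minimal sketch. Everything here is an assumption for illustration, not the authors' method: the class names `Task`, `NodeType` and `Node`, the hypothetical penalty `cost × max_d(demand_d / capacity_d)`, and the time-aware first-fit packing are stand-ins chosen only to show how time sharing lets tasks with non-overlapping active periods reuse the same node.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

INF = float("inf")


@dataclass
class Task:
    name: str
    start: int                 # first active timeslot (inclusive)
    end: int                   # last active timeslot (inclusive)
    demand: Tuple[float, ...]  # one entry per resource (D dimensions)


@dataclass
class NodeType:
    name: str
    capacity: Tuple[float, ...]
    cost: float


@dataclass
class Node:
    ntype: NodeType
    T: int                                    # number of timeslots in the timeline
    load: List[List[float]] = field(init=False)
    tasks: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # load[t][d] is the total demand for resource d at timeslot t.
        self.load = [[0.0] * len(self.ntype.capacity) for _ in range(self.T)]

    def fits(self, task: Task) -> bool:
        # A task fits only if no resource exceeds capacity at ANY slot where it is active.
        for t in range(task.start, task.end + 1):
            for d, cap in enumerate(self.ntype.capacity):
                if self.load[t][d] + task.demand[d] > cap:
                    return False
        return True

    def place(self, task: Task) -> None:
        for t in range(task.start, task.end + 1):
            for d in range(len(self.ntype.capacity)):
                self.load[t][d] += task.demand[d]
        self.tasks.append(task.name)


def penalty(task: Task, nt: NodeType) -> float:
    # Hypothetical penalty: node cost scaled by the task's largest capacity share.
    share = max(task.demand[d] / nt.capacity[d] for d in range(len(nt.capacity)))
    return INF if share > 1 else nt.cost * share


def two_phase(tasks: List[Task], node_types: List[NodeType], T: int) -> List[Node]:
    # Phase 1: map each task to the node type with the smallest penalty.
    groups: Dict[str, List[Task]] = {nt.name: [] for nt in node_types}
    by_name = {nt.name: nt for nt in node_types}
    for task in tasks:
        best = min(node_types, key=lambda nt: penalty(task, nt))
        if penalty(task, best) == INF:
            raise ValueError(f"task {task.name} fits no node type")
        groups[best.name].append(task)

    # Phase 2: pack each node type independently with time-aware first-fit.
    cluster: List[Node] = []
    for name, group in groups.items():
        nodes: List[Node] = []
        for task in sorted(group, key=lambda x: x.start):
            for node in nodes:
                if node.fits(task):
                    node.place(task)
                    break
            else:  # no existing node fits: open a new node of this type
                node = Node(by_name[name], T)
                node.place(task)
                nodes.append(node)
        cluster.extend(nodes)
    return cluster


def cluster_cost(cluster: List[Node]) -> float:
    return sum(node.ntype.cost for node in cluster)


if __name__ == "__main__":
    small = NodeType("small", (4, 8), 1.0)
    large = NodeType("large", (16, 32), 3.0)
    tasks = [Task("a", 0, 1, (3, 6)), Task("b", 2, 3, (3, 6)), Task("c", 0, 3, (12, 24))]
    cluster = two_phase(tasks, [small, large], T=4)
    print([(n.ntype.name, n.tasks) for n in cluster], cluster_cost(cluster))
```

In this toy instance the three tasks together demand (18, 36), more than one `large` node holds, yet they share a single `large` node because `a` and `b` are never active at the same time; the peak load at any timeslot is (15, 30). Plain multidimensional bin-packing, which treats every task as always active, would need a second node.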
