I Introduction and background
We consider a cloud provider that needs to run multiple software applications on its IT infrastructure. These applications may be distributed and are also called frameworks or workloads in the literature. The cloud provider’s infrastructure consists of multiple servers connected by a network. A server may be a physical machine or virtual machine (e.g., an instance or a container). A server is also referred to as a worker or a slave in some popular resource management solutions. Each framework desires multiple IT resources (CPU, memory, network bandwidth, etc.) for each of its “tasks.” A task is a framework-specific basic unit of work that must be placed within a single server at a given time (e.g., it is useless for a task to be allocated CPU from one server and memory from another). The provider’s challenge then is to determine who should get how many resources from which servers. Our interest is in a private cloud setting wherein notions of fairness have often been used as the basis for this resource allocation problem. In a public setting, on the other hand, the provider’s goal is typically to maximize its profit.
What are meaningful notions of fairness for such multi-resource and multi-server settings? This question has received much attention in the recent past. Proposed fair schedulers include Dominant Resource Fairness (DRF) extended to multiple servers, Task Share Fairness (TSF), and Per Server Dominant Share Fairness (PS-DSF) [18, 16, 17], among others. (DRF was originally defined for a single server [12]. The multiple-server version, called DRFH in [32, 11], is also commonly called just DRF, as is done in Apache Mesos [13] and herein.) DRF is resource based, whereas TSF and “containerized” DRF are task based. (Containerized DRF has a “sharing-incentive” property not possessed by DRF, and TSF possesses “strategy-proofness” and “envy-freeness” properties which are not possessed by containerized DRF. Unlike DRF and TSF, PS-DSF is not necessarily Pareto optimal but is “bottleneck” fair. These properties are not addressed herein.) In the following, we additionally consider variants of these schedulers that employ current residual (unreserved) capacities of the servers in the fairness criteria (somewhat similar to “best fit” variants).
Background on existing approaches and their assumptions: Typically static problem formulations are considered under a variety of simplifying assumptions on framework behavior that we discuss below:
It is assumed that frameworks congest all the available servers. That is, it is assumed that there is sufficient work to completely occupy at least one resource in every server.
Frameworks are assumed to have linearly elastic resource demands in the following sense. Each task of framework $n$ has a known requirement $d_{n,r}$ for the resource type $r$. Therefore, if $x_{n,i}$ were the number of tasks of framework $n$ placed on server $i$, the framework would consume the amount $x_{n,i} d_{n,r}$ of resource $r$ on server $i$.
The allocations $x_{n,i}$ may take on non-negative real values rather than being restricted to non-negative integer values. (With $x_{n,i}$ integer valued, such problems belong to the class of combinatorial-optimization multidimensional bin-packing problems, e.g., [4, 6, 8], which are NP-hard. They have been extensively studied, including relaxations to simplified problems that yield approximately optimal solutions, e.g., by Integer Linear Programs solved by iterated/online means.)
Note that in some settings, a goal is to minimize the number of servers needed to accommodate workloads with finite needs, again as in multidimensional bin-packing problems [4, 6, 8]. Such problem formulations are typically motivated by the desire to economize on energy. However, frequently cycling power to (booting up) servers may result in software errors, and there are energy spikes associated with boot-up that result in increased electricity costs [10]. We are not interested in such settings herein.
Typically in existing papers, max-min fairness with respect to a proposed fairness criterion is specified assuming the aforementioned congested regime under the following (linear) capacity constraints:
$$\sum_{n} x_{n,i}\, d_{n,r} \le c_{i,r} \quad \forall i, r, \tag{1}$$
where $c_{i,r}$ is the amount of available resource $r$ in server $i$ for the instances under consideration. (Note that if $d_{n,r} > c_{i,r}$ for some $r$, then workload type $n$ is not assigned to server $i$.) Additionally, there may be placement constraints, $\delta_{n,i} \in \{0,1\}$, whereby $\delta_{n,i} = 0 \Rightarrow x_{n,i} = 0$. Max-min fair allocation may be expressed as the solution of a constrained centralized optimization problem. Alternatively, max-min fairness with respect to the proposed fairness criterion may be approximated by a greedy, iterative “progressive filling” allocation. The latter approach is often preferred because of the benefits it offers for online implementations. Moreover, progressive filling arguments can be used to establish other potentially desirable fairness properties of schedulers defined for private clouds (again, Pareto optimality, sharing incentive, strategy proofness, bottleneck fairness, and envy freeness, properties that are not addressed herein).
Instead of max-min fairness, the cloud may admit and place instances so as to maximize, e.g., the total weighted tasking objective,
$$\sum_{n} \phi_n \sum_{i} x_{n,i}, \tag{2}$$
subject to (1), where $\phi_n$ is the priority of application framework $n$. In this paper, we relate this task efficiency objective to “proportional” fairness.
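As a minimal sketch of the capacity constraints (1) and the weighted tasking objective (2), the following Python helpers check feasibility of an allocation and evaluate the objective. The array names `x`, `d`, `c` and `phi` (tasks per framework and server, per-task demands, server capacities, and priorities) and the small instance at the end are illustrative assumptions, not values taken from the paper.

```python
def is_feasible(x, d, c, tol=1e-9):
    """Check sum_n x[n][i] * d[n][r] <= c[i][r] for every server i and resource r."""
    for i in range(len(c)):
        for r in range(len(c[i])):
            used = sum(x[n][i] * d[n][r] for n in range(len(d)))
            if used > c[i][r] + tol:
                return False
    return True


def weighted_tasks(x, phi):
    """Total weighted tasking: sum_n phi[n] * sum_i x[n][i]."""
    return sum(phi[n] * sum(x[n]) for n in range(len(x)))


# Hypothetical instance: two frameworks, two servers, two resource types.
d = [[5, 1], [1, 5]]      # d[n][r]: per-task demand of framework n for resource r
c = [[10, 2], [3, 15]]    # c[i][r]: capacity of resource r on server i
phi = [1, 2]              # phi[n]: priority of framework n
x = [[2, 0], [0, 3]]      # x[n][i]: tasks of framework n placed on server i
```

With these assumed numbers, the allocation exactly fills both resources on both servers, so adding any further task to either server violates (1).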
In Sections II and III, for generic fairness criteria, we generalize to multiple resources the static optimization problems of, e.g., [2, 23, 15] whose solutions correspond to max-min fairness and proportional fairness, respectively. In Section IV, a simple, greedy, iterative method intended to achieve max-min fairness, called progressive filling, is described. Progressive filling is important for online implementation. In Section V, the performance evaluation objectives are discussed: task efficiency (related to proportional fairness) and overall execution time. In Section VI, illustrative numerical examples are used to compare the task efficiencies of different schedulers, including variants using residual/unreserved server resource capacities specified herein. In [26], we give the results of an online experimental study using our implementations of different schedulers on Spark and Mesos [22, 27] for benchmark workloads considering an execution-time performance metric. The paper concludes with a summary in Section VII and a brief discussion of future work (regarding scheduling in public clouds).
Our mathematical notation is given in Table I.
|resource type index|
|index of the dominant resource|
|weight/priority of user|
|the number of tasks or workload intensity|
|per-task resource requirement|
|the total available resource amounts|
|server preference indicator|
|the set of users that can run on server|
|fully booked resources of server under|
II Max-Min Fairness
for scalars and priorities (specific examples of fairness criteria are given below). In addition, consider the service-preference sets
Relaxing the allocations to be real valued, consider strictly concave and increasing with , and define the optimization problem
such that (here restating (1))
Regarding fully booked resources in server under allocations , also let
For the following definition, assume that .
A feasible allocation satisfying (6) is said to be -Max-Min Fair (MMF) if:
implies that .
Note that if instead in this definition, then can be reduced and increased to reduce . Also, if is -MMF and for some server then . Quantization (containerization) issues associated with workload resource demands are considered in .
where . That is, under multi-server DRF,
The server-specific PS-DSF criterion can be written as
where is such that
So, under PS-DSF,
III Proportional Fairness
For weighted proportional fairness, consider the objective
i.e., without dividing by in the argument of . For parameter specifically take
i.e., , again see . Obviously, in the case of (), whether the factor is in the argument of is immaterial.
Proof: See Appendix B.
From the proof, is unique though may not be. We can normalize and when write (16) as
A possible definition of the efficiency of a feasible allocation is (2) corresponding to ,
i.e., the weighted total number of tasks scheduled. So, the optimization of Proposition 2 with gives an allocation that is related to a task efficient allocation. Clearly, satisfying (16) for all other allocations with does not necessarily maximize (17) (cf. approximating the mean of a ratio using the ratio of the means; see, e.g., p. 351 of [28] or (11) of [25]). For simplicity in the following, we use (17) instead of (16).
Note that the priority of framework could factor its resource footprint . Alternatively, the resource footprints of the frameworks can be explicitly incorporated into the main optimization objective via a fairness criterion. The proof of the following corollary is just as that of Proposition 2. Recall that the generic fairness criterion (3) is a linear combination of .
A solution of the optimization problem
is uniquely -proportional fair, i.e., for any other feasible ,
Again, optimal would be unique but may not be.
IV Progressive filling to approximate max-min fair allocation
In the following evaluation studies, resources are incrementally (taskwise) allocated to frameworks with the intention of approximating max-min fairness (with respect to the fairness criterion used). The approach is greedy: the framework with the smallest fairness criterion, based on existing allocations, is allocated a small resource increment (typically one task when allocations are measured in “tasks”). If a framework’s resource demands cannot be accommodated with available resources, the framework with the next smallest fairness criterion is allocated instead by this progressive filling approach [2, 12]. The choice of server from which to allocate can be random, e.g., as for the Mesos default task-level progressive filling for DRF [13]. Alternatively, the framework and server can be jointly chosen (e.g., using PS-DSF).
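The greedy procedure described above can be sketched in Python as follows. This is an illustrative sketch only: the function names are hypothetical, whole tasks are allocated, and `drf_share` assumes the commonly used multi-server DRF form (total tasks times the largest per-task share of the pooled capacity, divided by the priority), which may differ in detail from the criterion defined in Section II.

```python
import random


def drf_share(n, x, d, c, phi):
    """Assumed multi-server DRF criterion for framework n (hypothetical helper)."""
    pooled = [sum(c[i][r] for i in range(len(c))) for r in range(len(c[0]))]
    dominant = max(d[n][r] / pooled[r] for r in range(len(pooled)))
    return sum(x[n]) * dominant / phi[n]


def fits(n, i, x, d, c, tol=1e-9):
    """True if one more task of framework n fits in the residual capacity of server i."""
    for r in range(len(c[i])):
        used = sum(x[m][i] * d[m][r] for m in range(len(d)))
        if used + d[n][r] > c[i][r] + tol:
            return False
    return True


def progressive_fill(d, c, phi, criterion=drf_share, rng=None):
    """Allocate whole tasks one at a time to the framework with the smallest criterion."""
    rng = rng or random.Random(0)
    x = [[0] * len(c) for _ in range(len(d))]
    while True:
        # Visit frameworks in increasing order of the fairness criterion.
        order = sorted(range(len(d)), key=lambda n: criterion(n, x, d, c, phi))
        for n in order:
            servers = [i for i in range(len(c)) if fits(n, i, x, d, c)]
            if servers:
                x[n][rng.choice(servers)] += 1  # random server among those that fit
                break
        else:
            return x  # no framework can place another task anywhere
```

For instance, with one resource of capacity 4 on a single server and two frameworks that each need one unit per task, equal priorities give two tasks each, while priorities of 3 and 1 give three tasks and one task, respectively.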
Note how progressive filling can operate in the presence of churn in the set of active frameworks: asynchronously, new frameworks may be initiated, and a framework releases all of its resources once its computations are completed. In the following, we assess the efficiencies of max-min fair approximations by progressive filling according to different schedulers.
Because there is no resource revocation, a problem occurs when, say, servers are booked so that there are insufficient spare resources to allocate for a task of a just initiated framework (particularly a higher priority one). Thus, new frameworks may need to wait for sufficient resources to be released (by the termination of other frameworks). Alternatively, all existing frameworks could be reallocated whenever any new framework initiates or any existing framework terminates. Though within a server such reallocations are commonplace in a private setting, the effect of such “live” reallocations may be that tasks need to be terminated and reassigned to other servers (or live migrated). The following illustrative numerical examples allocate a single initial framework batch (without framework churn). In the following emulation study for equal priority workloads and framework churn, we work with the default progressive-filling mechanism in Mesos wherein existing frameworks are not adjusted upon framework churn.
V Evaluation objectives: Task efficiency of max-min fair allocations
In the following, though we aim for max-min fairness with progressive filling, we are also interested in the proportional fairness achieved. We compare the efficiency (17) of the allocations achieved by progressive filling for examples with heterogeneous workloads and servers. In the performance evaluation of our Mesos implementations, efficiency is defined by overall execution time.
Though PS-DSF allocations achieved by progressive filling may not be Pareto optimal, we show that they are more efficient, even in some of our Mesos experiments where servers are (at least initially) selected at random.
In the following, for brevity, we consider only cases with frameworks of equal priority () and without server-preference constraints (i.e., ).
VI Illustrative numerical study of fair scheduling by progressive filling
In this section, we consider the following typical example of our numerical study with two heterogeneous distributed application frameworks () having resource demands per unit workload:
and two heterogeneous servers () having two different resources with capacities:
For DRF and TSF, the servers are chosen in round-robin fashion, where the server order is randomly permuted in each round; DRF under such randomized round-robin (RRR) server selection is the default Mesos scheduler, cf. next section. One can also formulate PS-DSF under RRR wherein RRR selects the server and the PS-DSF criterion only selects the framework for that server. Frameworks are chosen by progressive filling with integer-valued tasking (), i.e., whole tasks are scheduled.
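A sketch of PS-DSF under randomized round-robin (RRR) server selection, as described above, might look as follows in Python. It assumes the per-server criterion takes the form of the framework's total number of tasks times its largest per-task share of the capacities of the server being visited, divided by its priority; this form, the function names, and the demand and capacity values used in the usage note are illustrative assumptions rather than the paper's definitions or data.

```python
import random


def ps_dsf(n, i, x, d, c, phi):
    """Assumed PS-DSF criterion of framework n with respect to server i."""
    return sum(x[n]) * max(d[n][r] / c[i][r] for r in range(len(c[i]))) / phi[n]


def fits(n, i, x, d, c, tol=1e-9):
    """True if one more task of framework n fits in the residual capacity of server i."""
    for r in range(len(c[i])):
        used = sum(x[m][i] * d[m][r] for m in range(len(d)))
        if used + d[n][r] > c[i][r] + tol:
            return False
    return True


def rrr_ps_dsf_fill(d, c, phi, rng):
    """RRR picks the server; the PS-DSF criterion picks the framework for that server."""
    x = [[0] * len(c) for _ in range(len(d))]
    progress = True
    while progress:
        progress = False
        order = list(range(len(c)))
        rng.shuffle(order)  # server order is randomly permuted in each round
        for i in order:
            candidates = [n for n in range(len(d)) if fits(n, i, x, d, c)]
            if candidates:
                n = min(candidates, key=lambda m: ps_dsf(m, i, x, d, c, phi))
                x[n][i] += 1  # schedule one whole task
                progress = True
    return x
```

Calling `rrr_ps_dsf_fill([[5, 1], [1, 5]], [[100, 30], [30, 100]], [1, 1], random.Random(1))` on such a hypothetical instance returns an integer allocation to which no further whole task can be added on any server.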
Numerical results for scheduled workloads for this illustrative example are given in Tables II and III, and unused resources are given in Tables IV and V. 200 trials were performed for DRF, TSF and PS-DSF under RRR server selection, so, using Table III, we can obtain confidence intervals for the averaged quantities given in Table II for schedulers under RRR; for example, a 95% confidence interval for the task allocation of the first framework on the second server under TSF.
Note how PS-DSF’s performance under RRR is comparable to when frameworks and servers are jointly selected, and with low variance in allocations. We also found that RRR-rPS-DSF performed just as rPS-DSF did over 200 trials.
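To illustrate how a confidence interval follows from an average and a sample standard deviation over 200 trials, here is a small Python sketch. It assumes a normal approximation with the usual 1.96 multiplier for a 95% interval, which may not be exactly the rule the authors applied, and it uses the first DRF entries reported below (average 6.55, sample standard deviation 2.31) purely as example inputs.

```python
import math


def confidence_interval(mean, sample_std, trials, z=1.96):
    """Normal-approximation interval: mean +/- z * sample_std / sqrt(trials)."""
    half_width = z * sample_std / math.sqrt(trials)
    return mean - half_width, mean + half_width


# Example inputs: first DRF entries of the reported averages and standard deviations.
low, high = confidence_interval(6.55, 2.31, 200)
```

Because the half-width shrinks as the inverse square root of the number of trials, quadrupling the number of trials would halve the width of the interval.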
|DRF [12, 32]||6.55||4.69||4.69||6.55||22.48|
|DRF [12, 32]||2.31||0.46||0.46||2.31|
Sample standard deviation of allocations for different schedulers under RRR server selection. Averaged values over 200 trials reported.
|DRF [12, 32]||11.09||0||0||11.09|
We found task efficiencies improve using residual forms of the fairness criterion. For example, the residual PS-DSF (rPS-DSF) criterion is
That is, this criterion makes scheduling decisions by progressive filling using current residual (unreserved) capacities based on the current allocations. From Table II, we see the improvement is modest for the case of PS-DSF.
Improvements are also obtained by best-fit server selection. For example, best-fit DRF (BF-DRF) first selects a framework by DRF and then selects the server whose residual capacity most closely matches that framework’s resource demands.
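The two refinements above can be sketched in Python under stated assumptions. The residual criterion is assumed to replace each server capacity in a PS-DSF-style criterion by the current unreserved capacity, and "most closely matches" is interpreted here as the largest cosine similarity between the demand vector and the residual-capacity vector; both choices, as well as the function names, are illustrative and may differ from the definitions used in the paper.

```python
import math


def residual(i, x, d, c):
    """Unreserved capacity of each resource on server i under allocation x."""
    return [c[i][r] - sum(x[m][i] * d[m][r] for m in range(len(d)))
            for r in range(len(c[i]))]


def rps_dsf(n, i, x, d, c, phi):
    """Assumed residual PS-DSF criterion (framework n is assumed to demand some resource)."""
    res = residual(i, x, d, c)
    ratios = []
    for r, demand in enumerate(d[n]):
        if demand > 0:
            if res[r] <= 0:
                return math.inf  # a demanded resource is fully booked on server i
            ratios.append(demand / res[r])
    return sum(x[n]) * max(ratios) / phi[n]


def best_fit_server(n, x, d, c):
    """Server whose residual vector is best aligned with d[n]; None if no server fits a task."""
    best, best_score = None, -1.0
    for i in range(len(c)):
        res = residual(i, x, d, c)
        if any(d[n][r] > res[r] for r in range(len(res))):
            continue  # one more task of framework n does not fit on server i
        dot = sum(a * b for a, b in zip(d[n], res))
        norm = math.sqrt(sum(a * a for a in d[n])) * math.sqrt(sum(b * b for b in res))
        score = dot / norm if norm > 0 else 0.0
        if score > best_score:
            best, best_score = i, score
    return best
```

With hypothetical demands `[[5, 1], [1, 5]]` and empty servers of capacities `[[100, 30], [30, 100]]`, this selection sends each framework to the server whose capacity profile resembles its own demand profile.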
VII Summary and Future Work
For a private-cloud setting, we considered scheduling a group of heterogeneous, distributed frameworks to a group of heterogeneous servers. We extended two general results on max-min fairness and proportional fairness to this case for a static problem under generic scheduling criteria. Subsequently, we assessed the efficiency of approximate max-min fair allocations by progressive filling according to different fairness criteria. Illustrative examples in heterogeneous settings show that max-min fair PS-DSF and rPS-DSF scheduling are superior to DRF in terms of task efficiency (a metric related to proportional fairness) and that the efficiency of these “server specific” schedulers did not significantly suffer from the use of randomized round-robin server selection. Task efficiency was also improved when either the “best fit” approach to selecting servers was used or the fairness criterion was modified to use current residual/unreserved resource capacities. We also implemented open-source online prototypes of these schedulers on Mesos [22, 27], both oblivious (“coarse grained”) and workload-characterized (with specified resource demands), with the Mesos default/baseline being oblivious DRF. Using two different Spark workloads and heterogeneous servers, we showed that the schedulers were similarly ranked using the total execution time as the performance measure. Moreover, execution times could be shortened with workload characterization.
In future work, we will consider scheduling (admission control and placement) problems in a public cloud setting. To this end, note that objectives similar to those considered herein for a private-cloud setting, particularly (2), may be reinterpreted as overall revenue based on bids for virtual machines or containers with fixed resource allocations. Also, as profit margins diminish in a maturing marketplace, one expects that public clouds will need to operate with greater resource efficiency. Note that notions of fair scheduling and desirable properties of schedulers as defined in, e.g., [12, 11, 30] may not be relevant to the public-cloud setting, where the expectation is that different customers/frameworks simply “get what they pay for.” Moreover, in a public cloud setting, what the customers do with their virtual machines/containers is arguably not the concern of the cloud operator so long as the customer complies with service level agreements. But, e.g., notions of strategy proofness are important considerations in the design of auction [29] and spot-pricing mechanisms (where under spot-price mechanisms, virtual machines or containers may be revoked).
- [1] T. F. Abdelzaher, K. G. Shin, and N. Bhatti. Performance guarantees for web server end-systems: A control-theoretical approach. IEEE Trans. Parallel Distrib. Syst., 13(1):80–96, 2002.
- [2] D. Bertsekas and R. Gallager. Data Networks, 2nd Ed. Prentice Hall, 1992.
- [3] A. Chandra, W. Gong, and P. Shenoy. Dynamic resource allocation for shared data centers using online measurements. In Proceedings of the 2003 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems, SIGMETRICS ’03, 2003.
- [4] C. Chekuri and S. Khanna. On multi-dimensional packing problems. SIAM Journal on Computing, 33(4):837–851, 2004.
- [5] M. Chowdhury, Z. Liu, A. Ghodsi, and I. Stoica. HUG: Multi-resource fairness for correlated and elastic demands. In Proc. USENIX NSDI, March 2016.
- [6] H. Christensen, A. Khan, S. Pokutta, and P. Tetali. Multidimensional Bin Packing and Other Related Problems: A Survey. https://people.math.gatech.edu/tetali/PUBLIS/CKPT.pdf, 2016.
- [7] I. Cohen, M. Goldszmidt, T. Kelly, J. Symons, and J. S. Chase. Correlating instrumentation data to system states: A building block for automated diagnosis and control. In Proceedings of the 6th Conference on Symposium on Operating Systems Design & Implementation - Volume 6, OSDI’04, 2004.
- [8] M. Cohen, V. Mirrokni, P. Keller, and M. Zadimoghaddam. Overcommitment in Cloud Services Bin packing with Chance Constraints. In Proc. ACM SIGMETRICS, Urbana-Champaign, IL, June 2017.
- [9] R. P. Doyle, J. S. Chase, O. M. Asad, W. Jin, and A. M. Vahdat. Model-based resource provisioning in a web service utility. In Proceedings of the 4th Conference on USENIX Symposium on Internet Technologies and Systems - Volume 4, USITS’03, 2003.
- [10] Duke utility bill tariff, 2012. http://www.considerthecarolinas.com/pdfs/scscheduleopt.pdf.
- [11] E. Friedman, A. Ghodsi, and C.-A. Psomas. Strategyproof allocation of discrete jobs on multiple machines. In Proc. ACM Conf. on Economics and Computation, 2014.
- [12] A. Ghodsi, M. Zaharia, B. Hindman, A. Konwinski, S. Shenker, and I. Stoica. Dominant resource fairness: Fair allocation of multiple resource types. In Proc. USENIX NSDI, 2011.
- [13] B. Hindman, A. Konwinski, M. Zaharia, A. Ghodsi, A. Joseph, R. Katz, S. Shenker, and I. Stoica. Mesos: A Platform for Fine-grained Resource Sharing in the Data Center. In Proc. USENIX NSDI, 2011.
- [14] C. Joe-Wong, S. Sen, T. Lan, and M. Chiang. Multi-resource allocation: Fairness-efficiency tradeoffs in a unifying framework. IEEE/ACM Trans. Networking, 21(6), Dec. 2013.
- [15] J. Khamse-Ashari, G. Kesidis, I. Lambadaris, B. Urgaonkar, and Y. Zhao. Constrained Max-Min Fair Scheduling of Variable-Length Packet-Flows to Multiple Servers. In Proc. IEEE GLOBECOM, Washington, DC, Dec. 2016.
- [16] J. Khamse-Ashari, I. Lambadaris, G. Kesidis, B. Urgaonkar, and Y. Zhao. An Efficient and Fair Multi-Resource Allocation Mechanism for Heterogeneous Servers. http://arxiv.org/abs/1712.10114, Dec. 2017.
- [17] J. Khamse-Ashari, I. Lambadaris, G. Kesidis, B. Urgaonkar, and Y. Zhao. An Efficient and Fair Multi-Resource Allocation Mechanism for Heterogeneous Servers. IEEE Trans. Parallel and Distributed Systems (TPDS), May 2018.
- [18] J. Khamse-Ashari, I. Lambadaris, G. Kesidis, B. Urgaonkar, and Y. Zhao. Per-Server Dominant-Share Fairness (PS-DSF): A Multi-Resource Fair Allocation Mechanism for Heterogeneous Servers. https://arxiv.org/abs/1611.00404, Nov. 1, 2016.
- [19] R. Levy, J. Nagarajarao, G. Pacifici, M. Spreitzer, A. Tantawi, and A. Youssef. Performance management for cluster based web services. In G. Goldszmidt and J. Schönwälder, editors, Integrated Network Management VIII: Managing It All, pages 247–261. Springer US, 2003.
- [20] C. Lu, T. F. Abdelzaher, J. A. Stankovic, and S. H. Son. A feedback control approach for guaranteeing relative delays in web servers. In Proceedings of the Seventh Real-Time Technology and Applications Symposium, RTAS ’01, 2001.
- [21] D. A. Menasce. Web server software architectures. IEEE Internet Computing, 7(6):78–81, 2003.
- [22] Mesos multi-scheduler. https://github.com/PSU-Cloud/mesos-ps/pull/1/files.
- [23] J. Mo and J. Walrand. Fair end-to-end window-based congestion control. IEEE/ACM Trans. Networking, 8(5):556–567, 2000.
- [24] M. N. Bennani and D. A. Menasce. Resource allocation for autonomic data centers using analytic performance models. In Proceedings of the Second International Conference on Autonomic Computing, ICAC ’05. IEEE Computer Society, 2005.
- [25] H. Seltman. Approximation of mean and variance of a ratio. http://www.stat.cmu.edu/ hseltman/files/ratio.pdf.
- [26] Y. Shan, A. Jain, G. Kesidis, B. Urgaonkar, J. Khamse-Ashari, and I. Lambadaris. Online Scheduling of Spark Workloads with Mesos using Different Fair Allocation Algorithms. https://arxiv.org/abs/1803.00922, March 2, 2018.
- [27] Spark with HeMT. https://github.com/PSU-Cloud/spark-hemt/pull/2/files.
- [28] A. Stuart and K. Ord. Kendall’s Advanced Theory of Statistics. Arnold, London, 6th edition, 1998.
- [29] Vickrey-Clarke-Groves auction. https://en.wikipedia.org/wiki/Vickrey-Clarke-Groves_auction.
- [30] W. Wang, B. Li, B. Liang, and J. Li. Towards multi-resource fair allocation with placement constraints. In Proc. ACM SIGMETRICS, Antibes, France, 2015.
- [31] W. Wang, B. Li, B. Liang, and J. Li. Multi-resource fair sharing for datacenter jobs with placement constraints. In Proc. Supercomputing, Salt Lake City, Utah, 2016.
- [32] W. Wang, B. Liang, and B. Li. Multi-resource fair allocation in heterogeneous cloud computing systems. IEEE Transactions on Parallel and Distributed Systems, 26(10):2822–2835, Oct. 2015.
- [33] W. Xu, P. Bodik, and D. Patterson. A flexible architecture for statistical learning and data mining from system log streams. In Proceedings of Workshop on Temporal Data Mining: Algorithms, Theory and Applications at the Fourth IEEE International Conference on Data Mining, Brighton, UK, 2004.
- [34] K.-K. Yap, T.-Y. Huang, Y. Yiakoumis, S. Chinchali, N. McKeown, and S. Katti. Scheduling packets over multiple interfaces while respecting user preferences. In Proc. ACM CoNEXT, Dec. 2013.
Appendix A: Proof of Proposition 1
Define the Lagrangian to be maximized over and over Lagrange multipliers :
The first-order optimality condition,
and strictly increasing imply
So, , s.t. . Thus, complementary slackness is
i.e., in every server , one resource (which may depend on ) is fully booked. So, the set of fully booked resources in server under allocations can be characterized by . Now by (20) and assumed strict concavity of , uniquely
Now consider two frameworks and and server such that and . So, complementary slackness
Appendix B: Proof of Proposition 2
The Lagrangian here is
where, again, the Lagrange multipliers . A first-order optimality condition is