I Introduction and background
We consider a cloud provider that needs to run multiple software applications on its IT infrastructure. These applications may be distributed and are also called frameworks or workloads in the literature. The cloud provider’s infrastructure consists of multiple servers connected by a network. A server may be a physical machine or virtual machine (e.g., an instance or a container). A server is also referred to as a worker or a slave in some popular resource management solutions. Each framework desires multiple IT resources (CPU, memory, network bandwidth, etc.) for each of its “tasks.” A task is a frameworkspecific basic unit of work that must be placed within a single server at a given time (e.g., it is useless for a task to be allocated CPU from one server and memory from another). The provider’s challenge then is to determine who should get how many resources from which servers. Our interest is in a private cloud setting wherein notions of fairness have often been used as the basis for this resource allocation problem. In a public setting, on the other hand, the provider’s goal is typically to maximize its profit.
What are meaningful notions of fairness for such multiresource and multiserver settings? This question has received much attention in the recent past. Proposed fair schedulers include Dominant Resource Fairness (DRF) [12] extended to multiple servers^{1}^{1}1DRF was originally defined for a single server in [12]. The multipleserver version, called DRFH in [32, 11], is also commonly called just DRF as done in Apache Mesos [13] and as we do herein also., Task Share Fairness (TSF) [31], Per Server Dominant Share Fairness (PSDSF) [18, 16, 17], among others, e.g., [5]. DRF is resource based, whereas TSF and “containerized” DRF [11] are task based^{2}^{2}2Containerized DRF has a “sharingincentive” property not possessed by DRF, and TSF possesses “strategyproofness” and “envyfreeness” properties which are not possessed by containerized DRF [31]. Unlike DRF and TSF, PSDSF is not necessarily Pareto optimal but is “bottleneck” fair. These properties are not addressed herein. In the following, we additionally consider variants of these schedulers that employ current residual (unreserved) capacities of the servers in the fairness criteria (somewhat similar to “best fit” variants [32]).
Background on existing approaches and their assumptions: Typically static problem formulations are considered under a variety of simplifying assumptions on framework behavior that we discuss below:

It is assumed that frameworks congest all the available servers. That is, it is assumed that there is sufficient work to completely occupy at least one resource in every server.

Frameworks are assumed to have linearly elastic resource demands in the following sense. Each task has a known requirement for the resource type . Therefore, if were the number of tasks of framework placed on server , the framework would consume amount of resource on server .

may take on nonnegative real values rather than being restricted to be nonnegative integer valued^{3}^{3}3With
integer valued, such problems belong to the class of combinatorialoptimization multidimensional binpacking problems,
e.g., [4, 6, 8], which are NPhard. They have been extensively studied, including relaxations to simplified problems that yield approximately optimal solutions, e.g., by Integer Linear Programs solved by iterated/online means.
.
Note that in some settings, a goal is to minimize the number of servers to accommodate workloads with finite needs, again as in multidimensional binpacking problems [4, 6, 8]. Such problem formulations are typically motivated by the desire to economize on energy. However, frequently cycling power to (booting up) servers may result in software errors and there are energy spikes associated with bootup resulting in increased electricity costs [10]. We are not interested in such settings herein.
Typically in existing papers, maxmin fairness with respect to a proposed fairness criteria is specified assuming the aforementioned congested regime under the following (linear) capacity constraints:
(1) 
where is the amount of available resource in server for the instances under consideration^{4}^{4}4Note that if then workload type is not assigned to server .. Additionally, there may be placement constraints, , whereby . Maxmin fair allocation may be expressed as the solution of a constrained centralized optimization problem. Alternatively, maxmin fairness with respect to the proposed fairness criteria may be approximated by a greedy, iterative “progressive filling” allocation. The latter approach is often preferred because of the benefits this offers for online implementations. Moreover, progressive filling arguments can be used to establish other potentially desirable fairness properties of schedulers defined for private clouds^{5}^{5}5Again, Pareto optimality, sharing incentive, strategy proofness, bottleneck fairness, and envy freeness [12]  properties that are not addressed herein..
Instead of maxmin fairness, the cloud may admit and place instances so as to maximize, e.g., total weighted tasking objective,
(2) 
subject to (1), where is the priority of application framework . In this paper, we relate this task efficiency objective to “proportional” fairness.
In Sections II and III, for generic fairness criteria, we generalize to multiple resources the static optimization problems of e.g., [2, 23, 15] whose solutions correspond to maxmin fairness and proportional fairness, respectively. In Section IV, a simple, greedy, iterative method intended to achieve maxmin fairness called progressive filling is described. Progressive filling is important for online implementation. In Section V, the performance evaluation objectives of the following two sections are discussed: task efficiency (related to proportional fairness) and overall execution time. In Section VI, illustrative numerical examples are used to compare the task efficiencies of different schedulers, including variants using residual/unreserved server resource capacities specified herein. In [26], we give the results of an online experimental study using our implementations of different schedulers on Spark and Mesos [22, 27] for benchmark workloads considering an executiontime performance metric. The paper concludes with a summary in Section VII and a brief discussion of future work (regarding scheduling in public clouds).
Our mathematical notation is given in Table I.
Symbol  Definition 

server index  
user/framework index  
resource type index  
index of the dominant resource  
weight/priority of user  
the number of tasks or workload intensity  
pertask resource requirement  
the total available resource amounts  
server preference indicator  
the set of users that can run on server  
fully booked resources of server under  
allocationfairness scores 
Ii MaxMin Fairness
To generalize previous results on maxmin fairness (e.g., [2, 12, 15]) to multiple resource types on multiple servers, consider the following generalpurpose fairness criterion for framework ,
(3) 
for scalars and priorities (specific examples of fairness criteria are given below). In addition, consider the servicepreference sets
where  (4) 
Relaxing the allocations to be real valued, consider strictly concave and increasing with , and define the optimization problem
(5) 
such that (here restating (1))
and  (6) 
where
(7) 
Note that the objective is continuous and strictly concave and the domain given by (6) (equivalently (1)) is compact. So, simply by Weierstrass’s Extreme Value Theorem, there exists a unique maximum.
Regarding fully booked resources in server under allocations , also let
For the following definition, assume that .
Definition 1
Note that if instead in this definition, then can be reduced and increased to reduce . Also, if is MMF and for some server then . Quantization (containerization) issues associated with workload resource demands are considered in [11].
Under multiserver DRF [12, 32], frameworks are selected using criterion
(8) 
where . That is, under multiserver DRF,
(9) 
The serverspecific PSDSF criterion can be written as
(10) 
where is such that
(11) 
Maxmin fairness according to the joint frameworkserver criterion is considered in [18, 16, 17]. Here define
(12)  
So, under PSDSF,
(13) 
Proposition 1
Iii Proportional Fairness
For weighted proportional fairness, consider the objective
(15) 
i.e., without dividing by in the argument of [23]. For parameter specifically take
i.e., , again see [23]. Obviously, in the case of (), whether the factor is in the argument of is immaterial.
The following generalizes Lemma 2 of [23] on Proportional Fairness. See also the proportionalfairness/efficiency tradeoff framework of [14] for a single server.
Proposition 2
Proof: See Appendix B.
From the proof, is unique though may not be. We can normalize and when write (16) as
A possible definition of the efficiency of a feasible allocation is (2) corresponding to ,
(17) 
i.e., the weighted total number of tasks scheduled. So, the optimization of Proposition 2 with gives an allocation that is related to a task efficient allocation. Clearly, satisfying (16) for all other allocations with does not necessarily maximize (17
). This issue is analogous to estimating the mean of the ratio of positive random variables
using the ratio of the means , see e.g. p. 351 of [28] or (11) of [25] . For simplicity in the following, we use (17) instead of (16).Note that the priority of framework could factor its resource footprint . Alternatively, the resource footprints of the frameworks can be explicitly incorporated into the main optimization objective via a fairness criterion. The proof of the following corollary is just as that of Proposition 2. Recall that the generic fairness criterion (3) is a linear combination of .
Corollary 1
A solution of the optimization problem
s.t. 
is uniquely proportional fair, i.e., for any other feasible ,
Again, optimal would be unique but may not be.
Iv Progressive filling to approximate maxmin fair allocation
In the following evaluation studies, resources are incrementally (taskwise) allocated to frameworks with the intention to approximate maxmin fairness (with respect to the fairness criterion used). The approach is greedy: simply, the framework with smallest fairness criterion (or ), based on existing allocations , will be allocated a resource increment for small^{6}^{6}6Typically when allocations are measured in “tasks”. . If a framework’s resource demands cannot be accommodated with available resources, the framework with the next smallest fairness criterion will be allocated by this progressive filling approach [2, 12]. The choice of server from which to allocate can be random, e.g., as for the Mesos default tasklevel progressive filling for DRF, see [26]. Alternatively, the framework and server can be jointly chosen (e.g., using PSDSF).
Note how progressive filling can operate in the presence of churn in the set of active frameworks, where in asynchronous fashion, new frameworks could be initiated or a framework would release all of its resources once its computations are completed, see [26]. In the following we assess the efficiencies of maxmin fair approximations by progressive filling according to different schedulers.
Because there is no resource revocation, a problem occurs when, say, servers are booked so that there are insufficient spare resources to allocate for a task of a just initiated framework (particularly a higher priority one). Thus, new frameworks may need to wait for sufficient resources to be released (by the termination of other frameworks). Alternatively, all existing frameworks could be reallocated whenever any new framework initiates or any existing framework terminates. Though within a server such reallocations are commonplace in a private setting, the effect of such “live” reallocations may be that tasks need to be terminated and reassigned to other servers (or live migrated). The following illustrative numerical examples allocate a single initial framework batch (without framework churn). In the following emulation study for equal priority workloads and framework churn, we work with the default progressivefilling mechanism in Mesos wherein existing frameworks are not adjusted upon framework churn.
V Evaluation objectives: Task efficiency of maxmin fair allocations
In the following, though we aim for maxmin fairness with progressive filling, we are also interested in the proportional fairness achieved. We compare the efficiency (17) of the allocations achieved by progressive filling for examples with heterogeneous workloads and servers. In the performance evaluation of our Mesos implementations, efficiency is defined by overall execution time.
Though PSDSF allocations achieved by progressive filling may not be Pareto optimal, we show that they are more efficient, even in some of our Mesos experiments where servers are (at least initially) selected at random.
In the following, for brevity, we consider only cases with frameworks of equal priority () and without serverpreference constraints (i.e., ).
Vi Illustrative numerical study of fair scheduling by progressive filling
In this section, we consider the following typical example of our numerical study with two heterogeneous distributed application frameworks () having resource demands per unit workload:
(18) 
and two heterogeneous servers () having two different resources with capacities:
(19) 
For DRF and TSF, the servers are chosen in roundrobin fashion, where the server order is randomly permuted in each round; DRF under such randomized roundrobin (RRR) server selection is the default Mesos scheduler, cf. next section. One can also formulate PSDSF under RRR wherein RRR selects the server and the PSDSF criterion only selects the framework for that server. Frameworks are chosen by progressive filling with integervalued tasking (), i.e., whole tasks are scheduled.
Numerical results for scheduled workloads for this illustrative example are given in Tables II & III, and unused resources are given in Tables IV and V. 200 trials were performed for DRF, TSF and PSDSF under RRR server selection, so using Table III
we can obtain confidence intervals for the averaged quantities given in Table
II for schedulers under RRR. For example, the 95% confidence interval for task allocation of the first framework on the second server (i.e., ) under TSF isNote how PSDSF’s performance under RRR is comparable to when frameworks and servers are jointly selected [17]
, and with low variance in allocations. We also found that RRRrPSDSF performed just as rPSDSF over 200 trials.
sched.  (1,1)  (1,2)  (2,1)  (2,2)  total 
DRF [12, 32]  6.55  4.69  4.69  6.55  22.48 
TSF [31]  6.5  4.7  4.7  6.5  22.4 
RRRPSDSF  19.44  1.15  1.07  19.42  41.08 
BFDRF [32]  20  2  0  19  41 
PSDSF [17]  19  0  2  20  41 
rPSDSF  19  2  2  19  42 
sched.  (1,1)  (1,2)  (2,1)  (2,2) 

DRF [12, 32]  2.31  0.46  0.46  2.31 
TSF [31]  2.29  0.46  0.46  2.29 
RRRPSDSF  0.59  0.99  1  0.49 
Sample standard deviation of allocations
for different schedulers under RRR server selection with. Averaged values over 200 trials reported.sched.  (1,1)  (1,2)  (2,1)  (2,2) 

DRF [32]  62.56  0  0  62.56 
TSF [31]  62.8  0  0  62.8 
RRRPSDSF  1.8  4.6  4.86  1.92 
BFDRF [32]  0  10  1  3 
PSDSF [17]  3  1  10  0 
rPSDSF  3  1  1  3 
sched.  (1,1)  (1,2)  (2,1)  (2,2) 

DRF [12, 32]  11.09  0  0  11.09 
TSF [31]  10.99  0  0  10.99 
RRRPSDSF  0.59  0.99  1  0.49 
We found task efficiencies improve using residual forms of the fairness criterion. For example, the residual PSDSF (rPSDSF) criterion is
That is, this criterion makes scheduling decisions by progressive filling using current residual (unreserved) capacities based on the current allocations . From Table II, we see the improvement is modest for the case of PSDSF.
Improvements are also obtained by bestfit server selection. For example, bestfit DRF (BFDRF) first selects framework by DRF and then selects the server whose residual capacity most closely matches their resource demands [32].
Vii Summary and Future Work
For a privatecloud setting, we considered scheduling a group of heterogeneous, distributed frameworks to a group of heterogeneous servers. We extended two general results on maxmin fairness and proportional fairness to this case for a static problem under generic scheduling criteria. Subsequently, we assessed the efficiency of approximate maxmin fair allocations by progressive filling according to different fairness criteria. Illustrative examples in heterogeneous settings show that maxmin fair PSDSF and rPSDSF scheduling, are superior to DRF in terms of task efficiency performance (a metric related to proportional fairness) and that the efficiency of these “server specific” schedulers did not significantly suffer from the use of randomized roundrobin server selection. Task efficiency was also improved when either the “best fit” approach to selecting servers was used or the fairness criteria was modified to use current residual/unreserved resource capacities. We also opensource implemented oblivious (“coarse grained”) and workloadcharacterized (specified resource demands ) online prototypes of these schedulers on Mesos [22, 27], with the Mesos default/baseline being oblivious DRF. Using two different Spark workloads and heterogeneous servers, we showed that the schedulers were similarly ranked using the total execution time as the performance measure. Moreover, execution times could be shortened with workload characterization.
In future work, we will consider scheduling (admission control and placement) problems in a public cloud setting. To this end, note that similar objectives to those considered herein for a privatecloud setting, particularly (2), may be reinterpreted as overall revenue based on bids for virtual machines or containers with fixed resource allocations . Also, as profit margins diminish in a maturing marketplace, one expects that public clouds will need to operate with greater resource efficiency. Note that notions of fair scheduling and desirable properties of schedulers as defined in, e.g., [12, 11, 30] may not be relevant to the publiccloud setting, where the expectation is that different customers/frameworks simply “get what they pay for.” Moreover, in a public cloud setting, what the customers do with their virtual machines/containers is arguably not the concern of the cloud operator so long as the customer complies with service level agreements. But, e.g., notions of strategy proofness are important considerations in the design of auction [29] and spotpricing mechanisms (where under spot price mechanisms, virtual machines or containers may be revoked).
References
 [1] T. F. Abdelzaher, K. G. Shin, and N. Bhatti. Performance guarantees for web server endsystems: A controltheoretical approach. IEEE Trans. Parallel Distrib. Syst., 13(1):80–96, 2002.
 [2] D. Bertsekas and R. Gallager. Data Networks, 2nd Ed. Prentice Hall, 1992.
 [3] A. Chandra, W. Gong, and P. Shenoy. Dynamic resource allocation for shared data centers using online measurements. In Proceedings of the 2003 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems, SIGMETRICS ’03, 2003.
 [4] C. Chekuri and S. Khanna. On multidimensional packing problems. SIAM Journal of Computing, 33(4):837–851, 2004.
 [5] M. Chowdhury, Z. Liu, A. Ghodsi, and I. Stoica. HUG: Multiresource fairness for correlated and elastic demands. In Proc. USENIX NSDI, March 2016.
 [6] H. Christensen, A. Khan, S. Pokutta, and P. Tetali. Multidimensional Bin Packing and Other Related Problems: A Survey. https://people.math.gatech.edu/tetali/PUBLIS/CKPT.pdf, 2016.
 [7] I. Cohen, M. Goldszmidt, T. Kelly, J. Symons, and J. S. Chase. Correlating instrumentation data to system states: A building block for automated diagnosis and control. In Proceedings of the 6th Conference on Symposium on Opearting Systems Design & Implementation  Volume 6, OSDI’04, 2004.
 [8] M. Cohen, V.Mirrokni, P. Keller, and M. Zadimoghaddam. Overcommitment in Cloud Services Bin packing with Chance Constraints. In Proc. ACM SIGMETRICS, UrbanaCampaign, IL, June 2017.
 [9] R. P. Doyle, J. S. Chase, O. M. Asad, W. Jin, and A. M. Vahdat. Modelbased resource provisioning in a web service utility. In Proceedings of the 4th Conference on USENIX Symposium on Internet Technologies and Systems  Volume 4, USITS’03, 2003.
 [10] Duke utility bill tariff, 2012. http://www.considerthecarolinas.com/pdfs/scscheduleopt.pdf.
 [11] E. Friedman, A. Ghodsi, and C.A. Psomas. Strategyproof allocation of discrete jobs on multiple machines. In Proc. ACM Conf. on Economics and Computation, 2014.
 [12] A. Ghodsi, M. Zaharia, B. Hindman, A. Konwinski, S. Shenker, and I. Stoica. Dominant resource fairness: Fair allocation of multiple resource types. In Proc. USENIX NSDI, 2011.
 [13] B. Hindman, A. Konwinski, M. Zaharia, A. Ghodsi, A. Joseph, R. Katz, S. Shenker, and I. Stoica. Mesos: A Platform for Finegrained Resource Sharing in the Data Center. In Proc. USENIX NSDI, 2011.
 [14] C. JoeWong, S. Sen, T. Lan, and M. Chiang. Multiresource allocation: Fairnessefficiency tradeoffs in a unifying framework. IEEE/ACM Trans. Networking, 21(6), Dec. 2013.
 [15] J. KhamseAshari, G. Kesidis, I. Lambadaris, B. Urgaonkar, and Y. Zhao. Constrained MaxMin Fair Scheduling of VariableLength PacketFlows to Multiple Servers. In Proc. IEEE GLOBECOM, Washington, DC, Dec. 2016.
 [16] J. KhamseAshari, I. Lambadaris, G. Kesidis, B. Urgaonkar, and Y. Zhao. An Efficient and Fair MultiResource Allocation Mechanism for Heterogeneous Servers. http://arxiv.org/abs/1712.10114, Dec. 2017.
 [17] J. KhamseAshari, I. Lambadaris, G. Kesidis, B. Urgaonkar, and Y. Zhao. An Efficient and Fair MultiResource Allocation Mechanism for Heterogeneous Servers. IEEE Trans. Parallel and Distributed Systems (TPDS), May 2018.
 [18] J. KhamseAshari, I. Lambadaris, G. Kesidis, B. Urgaonkar, and Y. Zhao. PerServer DominantShare Fairness (PSDSF): A MultiResource Fair Allocation Mechanism for Heterogeneous Servers. https://arxiv.org/abs/1611.00404, Nov. 1, 2016.
 [19] R. Levy, J. Nagarajarao, G. Pacifici, M. Spreitzer, A. Tantawi, and A. Youssef. Performance management for cluster based web services. In G. Goldszmidt and J. Schönwälder, editors, Integrated Network Management VIII: Managing It All, pages 247–261. Springer US, 2003.
 [20] C. Lu, T. F. Abdelzaher, J. A. Stankovic, and S. H. Son. A feedback control approach for guaranteeing relative delays in web servers. In Proceedings of the Seventh RealTime Technology and Applications Symposium, RTAS ’01, 2001.
 [21] D. A. Menasce. Web server software architectures. IEEE Internet Computing, 7(6):78–81, 2003.
 [22] Mesos multischeduler. https://github.com/PSUCloud/mesosps/pull/1/files.
 [23] J. Mo and J. Walrand. Fair endtoend windowbased congestion control. IEEE/ACM Trans. Networking, Vol. 8, No. 5:pp. 556–567, 2000.
 [24] M. N. Bennani and D. A. Menasce. Resource allocation for autonomic data centers using analytic performance models. In Proceedings of the Second International Conference on Automatic Computing, ICAC ’05. IEEE Computer Society, 2005.
 [25] H. Seltman. Approximation of mean and variance of a ratio. http://www.stat.cmu.edu/ hseltman/files/ratio.pdf.
 [26] Y. Shan, A. Jain, G. Kesidis, B. Urgaonkar, J. KhamseAshari, and I. Lambadaris. Online Scheduling of Spark Workloads with Mesos using Different Fair Allocation Algorithms. https://arxiv.org/abs/1803.00922, March 2, 2018.
 [27] Spark with HeMT. https://github.com/PSUCloud/sparkhemt/pull/2/files.
 [28] A. Stuart and K. Ord. Kendall’s Advanced Theory of Statistics. Arnold, London, 6th edition, 1998.
 [29] VickreyClarkeGroves auction. https://en.wikipedia.org/wiki/VickreyClarkeGroves_auction.
 [30] W. Wang, B. Li, B. Liang, and J. Li. Towards multiresource fair allocation with placement constraints. In Proc. ACM SIGMETRICS, Antibes, France, 2015.
 [31] W. Wang, B. Li, B. Liang, and J. Li. Multiresource fair sharing for datacenter jobs with placement constraints. In Proc. Supercomputing, Salt Lake City, Utah, 2016.
 [32] W. Wang, B. Liang, and B. Li. Multiresource fair allocation in heterogeneous cloud computing systems. IEEE Transactions on Parallel and Distributed Systems, 26(10):2822–2835, Oct. 2015.
 [33] W. Xu, P. Bodik, and D. Patterson. A flexible architecture for statistical learning and data mining from system log streams. In Proceedings of Workshop on Temporal Data Mining: Algorithms, Theory and Applications at the Fourth IEEE International Conference on Data Mining, Brighton, UK, 2004.
 [34] K.K. Yap, T.Y. Huang, Y. Yiakoumis, S. Chinchali, N. McKeown, and S. Katti. Scheduling packets over multiple interfaces while respecting user preferences. In Proc. ACM CoNEXT, Dec. 2013.
Appendix A: Proof of Proposition 1
Define the Lagrangian to be maximized over and over Lagrange multipliers :
The firstorder optimality condition,
(20) 
and strictly increasing imply
(21) 
So, , s.t. . Thus, complementary slackness is
(22)  
(23) 
i.e., in every server , one resource (which may depend on ) is fully booked. So, the set of fully booked resources in server under allocations can be characterized by . Now by (20) and assumed strict concavity of , uniquely
Now consider two frameworks and and server such that and . So, complementary slackness
(24) 
implies .
Appendix B: Proof of Proposition 2
The Lagrangian here is
where, again, the Lagrange multipliers . A firstorder optimality condition is
(25)  
Comments
There are no comments yet.