The Supermarket Model with Known and Predicted Service Times

05/23/2019 ∙ by Michael Mitzenmacher, et al. ∙ Harvard University

The supermarket model typically refers to a system with a large number of queues, where arriving customers choose d queues at random and join the queue with fewest customers. The supermarket model demonstrates the power of even small amounts of choice, as compared to simply joining a queue chosen uniformly at random, for load balancing systems. In this work we perform simulation-based studies to consider variations where service times for a customer are predicted, as might be done in modern settings using machine learning techniques or related mechanisms. To begin, we start by considering the baseline where service times are known. We find that this allows for significant improvements. In particular, not only can the queue being joined be chosen based on the total work at the queue instead of the number of jobs, but also the jobs in the queue can be served using strategies that take advantage of the service times such as shortest job first or shortest remaining processing time. Such strategies greatly improve performance under high load. We then examine the impact of using predictions in place of true service times. Our main takeaway is that using even seemingly weak predictions of service times can yield significant benefits over blind First In First Out queueing in this context. However, some care must be taken when using predicted service time information to both choose a queue and order elements for service within a queue; while in many cases using the information for both choosing and ordering is beneficial, in many of our simulation settings we find that simply using the number of jobs to choose a queue is better when using predicted service times to order jobs in a queue. Our study leaves many natural open questions for further work.

I Introduction

The success of machine learning has opened up new opportunities in terms of improving the efficiency of a wide array of processes. In this paper, we consider opportunities for using machine learning predictions in a specific setting: queueing in large distributed systems using “the power of two choices”. This also leads us to consider, as a starting point, variants of these systems that do not use predictions and that appear not to have been previously studied. Our study here is simulation-based, but leads to several new open theoretical questions.

We start with key background. The supermarket model (also described as the power of two choices, or balanced allocations) in queueing settings is typically described in the following way. Suppose we have a system of $n$ First In, First Out (FIFO) queues. Jobs (we use jobs instead of the more specific term customers, as the model applies to a variety of load-balancing settings) arrive to the system as a Poisson process of rate $\lambda n$, and service times are independent and exponentially distributed with mean 1. If each job selects a random queue on arrival, then via Poisson splitting [19, Section 8.4.2] each queue acts as a standard M/M/1 queue, and in equilibrium the fraction of queues with at least $i$ jobs is $\lambda^i$. Note that we consider here the tails of the queue length distribution, as it makes for easier comparisons. If each job selects two random queues on arrival, and chooses to wait at the queue with fewer jobs (breaking ties randomly), then in the limiting system as $n$ grows to infinity, in equilibrium the fraction of queues with at least $i$ jobs is $\lambda^{2^i-1}$. That is, the tails decrease doubly exponentially in $i$, instead of singly exponentially. In practice, even for moderate values of $n$ (say in the large hundreds), one obtains performance close to this mean field limit; this also follows from theoretical considerations based on concentration bounds. More generally, for $d$ choices where $d$ is an integer constant greater than 1, the fraction of queues with at least $i$ jobs falls like $\lambda^{(d^i-1)/(d-1)}$ [16, 24].
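To make the gap between singly and doubly exponential tails concrete, the following minimal sketch evaluates the limiting tail expressions for one random choice and for d choices. The function name `tail_fraction` and the load value 0.9 are illustrative choices and do not come from the paper.

```python
def tail_fraction(lam: float, i: int, d: int = 2) -> float:
    """Limiting fraction of queues with at least i jobs when each job samples d queues.

    d = 1 corresponds to independent M/M/1 queues (tail lam**i);
    d >= 2 uses the exponent (d**i - 1) / (d - 1), which is always an integer.
    """
    if d == 1:
        return lam ** i
    return lam ** ((d ** i - 1) // (d - 1))


if __name__ == "__main__":
    lam = 0.9  # illustrative load, not a value singled out by the paper
    for i in range(1, 6):
        print(i, tail_fraction(lam, i, d=1), tail_fraction(lam, i, d=2))
```

Even at this fairly high load, the two-choice tail shrinks much faster with each additional job than the single-choice tail.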

In this paper, we provide a simulation-based study to consider variations of the supermarket model where service times are predicted, as might be done in modern settings using machine learning or related mechanisms. To describe our work and goals, we start by considering the baseline where service times are known.

The analysis for the basic supermarket model described above assumes that service times are exponentially distributed but specific job service times are not known. (Extensions to other distributions are known [1, 4].) As such, an incoming job uses only the number of jobs at each chosen queue to decide which queue to join. As both a theoretical question and for possible practical implementations, it seems worthwhile to know what further improvement is possible if service times of the jobs were known.

Recently, Hellemans and Van Houdt worked on this problem, in the supermarket model setting where job reservations are made at $d$ randomly chosen queues, and once the first reservation reaches the point of obtaining service, the other reservations are cancelled. This corresponds to choosing the least loaded of $d$ queues (in terms of total remaining service time) using FIFO queues. (We use service time and processing time interchangeably in this paper; both terms have been used historically.) Their work applies to general service distributions; for the class of phase-type service distributions, they are able to express the limiting behavior of the system in terms of delayed differential equations [8]. Their results, including theorems regarding the system behavior as well as simulations, show that using service time information can lead to significant improvements in the average time a job spends in the system.

However, when the service times are known, there are two possible ways to improve performance. First, one can use the service times when selecting a queue, by choosing the least loaded queue. Second, one can order the jobs using a strategy other than FIFO; the natural strategies to minimize the average time in the system are shortest job first (SJF), preemptive shortest job first (PSJF), and shortest remaining processing time (SRPT). Here shortest job first assumes no preemption and always schedules the job with the smallest service time when a job completes; preemptive shortest job first allows preemption so that a job with a smaller service time can preempt the running job; and shortest remaining processing time allows preemption but is based on the remaining processing time instead of the total service time for a job. Note that here we assume a preempted job does not need to start from the beginning and can later continue service where it left off.
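The distinction between the three policies comes down to which quantity is used as the priority and whether an arrival may interrupt the running job. A minimal sketch of that rule follows; the `Job` record and the function names are hypothetical, introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Job:
    size: float       # total service time, known on arrival
    remaining: float  # service still required


def priority(job: Job, policy: str) -> float:
    """Smaller value means served earlier."""
    if policy in ("SJF", "PSJF"):
        return job.size        # based on the total service time
    if policy == "SRPT":
        return job.remaining   # based on the remaining processing time
    raise ValueError(f"unknown policy: {policy}")


def should_preempt(running: Job, arriving: Job, policy: str) -> bool:
    """Whether the arriving job interrupts the job currently in service."""
    if policy == "SJF":
        return False           # SJF only reorders when a job completes
    return priority(arriving, policy) < priority(running, policy)
```

For example, a running job of size 10 with 1 unit left is preempted by an arriving job of size 2 under PSJF (2 < 10) but not under SRPT (2 > 1).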

In the setting of a single queue, Mitzenmacher has recently considered the setting where service times are predicted rather than known exactly [17]. In this model, the jobs have a joint service-predicted service density function $g(x,y)$, where $x$ is the true service time and $y$ is the predicted service time. He provides formulae for the average time in the system using corresponding strategies shortest predicted job first (SPJF), preemptive shortest predicted job first (PSPJF), and shortest predicted remaining processing time (SPRPT). Simulation results suggest that in the single queue setting even weak predictors can greatly improve performance over FIFO queues. However, using the power of two choices already provides great improvements in systems with multiple queues. It is natural to consider whether predictions would still provide significant performance gains in the supermarket model.

The contributions of this paper include the following.

  • For the case of known service times, we provide a simulation study showing the potential gains when using SJF, PSJF, and SRPT queues in the supermarket model.

  • We similarly examine, through simulations, the benefits when only predicted information is available, using FIFO, SPJF, PSPJF, and SPRPT queues.

  • We provide a number of open questions related to the analysis and use of these systems.

II Additional Related Work

The power of two choices was first analyzed in the discrete settings of hashing, modelled as balls and bins processes [2, 10, 14]. It was subsequently analyzed in the setting of queueing systems, in particular in the mean field limit (also referred to as the fluid limit) as the number of queues grows to infinity [16].

Ordering jobs by service time has been studied extensively in single queues. The text [7] provides a good introduction to the analysis of standard approaches such as SJF and SRPT in the single queue setting.

Our work falls into a recent line of work that aims to use machine learning predictions to improve traditional algorithms. For example, Lykouris and Vassilvitskii [12] show how to use prediction advice from machine learning algorithms to improve online algorithms for caching in a way that provides provable performance guarantees, using the framework of competitive analysis. Other recent works with this theme include the development of learned Bloom filters [11, 18] and heavy hitter algorithms that use predictions [9]. One prior work in this vein has specifically looked at scheduling with predictions in the setting of a fixed collection of jobs, and considers variants of shortest predicted processing time that yield good performance in terms of the competitive ratio, with the performance depending on the accuracy of the predictions [21].

In scheduling of queues, some works have looked at the effects of using imprecise information, usually for load balancing in multiple queue settings. For example, Mitzenmacher considers using old load information to place jobs (in the context of the power of two choices) [15]. A strategy called TAGS studies an approach to utilizing multiple queues when no information exists about the service time; jobs that run more than some threshold in the first queue are cancelled and passed to the second queue, and so on [6]. For single queues, recent work by Scully and Harchol-Balter has considered scheduling policies that are based on the amount of service received, where the scheduler only knows the service received approximately, subject to adversarial noise, and the goal is to develop robust policies [22]. Also, for single queues, prediction-based policies appear to fit within the more general framework of SOAP policies presented by Scully et al. [23]. Our work differs from these past works in providing a model specifically geared toward studying performance with machine-learning based predictions in the context of the supermarket model.

II-A Review: The Supermarket Model

In the supermarket model, as originally described [16], customers arrive as a Poisson stream of rate $\lambda n$, where $\lambda < 1$ is a constant, to a collection of $n$ servers. Each customer chooses $d$ servers independently and uniformly at random from the $n$ servers for some fixed constant $d$. (The choices may be made with or without replacement; as $n$ grows large the difference vanishes asymptotically.) The customer waits for service at the server from these $d$ choices with the fewest customers, ties being broken arbitrarily. (For more complex variations in how queues might be chosen, see for example [20].) Customers are served according to the first-in first-out (FIFO) protocol. In the original description the analysis was done for the case where the service time for a customer is exponentially distributed with mean 1, but other service time distributions can be considered. Typically the aim is to find the equilibrium distribution of the queue lengths, or some corresponding property of the equilibrium distribution (such as the expected time an arriving customer will spend in the system) in the limit as $n$ goes to infinity. In the basic setting, as well as in many related cases, the system can be modelled by, and the equilibrium distribution can be determined by, a corresponding set of differential equations [16, 24].

III Known Service Times

III-A Scheduling Beyond FIFO

While the work of [8] shows that the equilibrium distribution when service times are known can be determined when choosing the queue according to the least loaded and using FIFO scheduling at each queue, if the service times are known, there are other possibilities. In particular, one can use shortest job first (SJF) or shortest remaining processing time (SRPT) at each individual queue. Here SRPT allows preemption of the job being served by an incoming job at no cost, while SJF does not. We also consider preemptive shortest job first (PSJF), where preemption is based on the service time of the job rather than the remaining service time of the job. Although somewhat less natural, preemptive shortest job first allows job priorities to be assigned on arrival to a queue without the need for updates.

While in this paper we are primarily interested in the performance of the supermarket model with predicted service times, these variations do not appear to have been studied, so we provide results for them as a baseline for our later results.

In all of the simulation experiments we present, we simulate 1000 initially empty queues over 10000 units of time, and take the average time in the system for all jobs that terminate after time 1000 and before time 10000. We then take the average of this value over 100 simulations. Waiting for the first 1000 time units allows the system to approach equilibrium. Variations of the supermarket model have a limiting equilibrium as the number of queues goes to infinity [4], and in practice we find 1000 queues provides an accurate estimate of the limiting behavior. In the experiments we focus on two example service distributions: exponential with mean 1, and a Weibull distribution with cumulative distribution function $1 - e^{-\sqrt{2x}}$. (The Weibull distribution is more heavy-tailed, but also has mean 1.) Arrivals are Poisson with arrival rate $\lambda$ per queue; we focus on results with $\lambda \geq 0.5$, as for smaller arrival rates all our proposed schemes perform very well and it becomes difficult to see performance differences. Unless otherwise noted, in the simulations each job chooses $d = 2$ queues at random. Studying the detailed effects of larger $d$ across the many variations we study is left for future work.
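As a rough illustration of this setup, the sketch below simulates only the FIFO baseline, choosing either the shortest queue ("SQ") or the least loaded queue ("LL") among d sampled queues. It is not the authors' simulator: the function names, the lazy bookkeeping of departure times, and the Weibull form with CDF 1 - exp(-sqrt(2x)) (a mean-1 Weibull with shape 1/2) are assumptions made for this example. The other scheduling policies would need a full event-driven simulation.

```python
import math
import random
from collections import deque


def exponential_service(rng: random.Random) -> float:
    return rng.expovariate(1.0)  # mean 1


def weibull_service(rng: random.Random) -> float:
    """Inverse-transform sample for the assumed CDF 1 - exp(-sqrt(2x)) (mean 1)."""
    u = rng.random()                       # u in [0, 1)
    return (math.log(1.0 - u) ** 2) / 2.0


def simulate_fifo(n, lam, horizon, warmup, d=2, rule="SQ",
                  service=exponential_service, seed=0):
    """Average time in system for jobs finishing in (warmup, horizon)."""
    rng = random.Random(seed)
    # For FIFO, each queue is fully described by the departure times of its jobs.
    queues = [deque() for _ in range(n)]
    t, total, count = 0.0, 0.0, 0
    while True:
        t += rng.expovariate(lam * n)      # Poisson arrivals of rate lam * n
        if t >= horizon:
            break
        choices = [rng.randrange(n) for _ in range(d)]  # sampled with replacement
        for q in set(choices):             # drop jobs that have already departed
            while queues[q] and queues[q][0] <= t:
                queues[q].popleft()
        if rule == "SQ":                   # fewest jobs, random tie-break
            key = lambda q: (len(queues[q]), rng.random())
        else:                              # "LL": least remaining work
            key = lambda q: (queues[q][-1] - t if queues[q] else 0.0, rng.random())
        best = min(choices, key=key)
        start = queues[best][-1] if queues[best] else t
        depart = start + service(rng)
        queues[best].append(depart)
        if warmup < depart < horizon:
            total += depart - t
            count += 1
    return total / count
```

A call such as `simulate_fifo(1000, 0.9, 10000.0, 1000.0, rule="LL")` mirrors the parameters described above for a single run; averaging over many seeds would be needed to reproduce the reported level of precision.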

(a) Exponential service times, queue chosen by shortest queue
(b) Exponential service times, queue chosen by least loaded
Fig. 1: Exponential service times, two choice supermarket model, with various queue scheduling policies.
(a) Weibull service times, queue chosen by shortest queue
(b) Weibull service times, queue chosen by least loaded
Fig. 2: Weibull service times, two choice supermarket model, with various queue scheduling policies.

Figure 1a shows the results where the shortest queue is chosen (ties broken randomly), while Figure 1b shows the results where the least loaded queue is chosen, for exponential service times. Figures 2a and 2b present the results for the Weibull distributed service times. Generally, we see that using the known service times to order jobs at the queue is very powerful; indeed, the gain from using SRPT appears larger than the gain from moving from shortest queue to least loaded, and similarly the gain from using SJF and PSJF is larger under high enough loads.

As the charts make it difficult to see some important details, we present numerical results for exponential distributions in Table I to mark some key points. While generally the benefits from using the service times to both choose the queue and order the queue are complementary, this is not always the case. We see that using least loaded rather than shortest queue when using PSJF can increase the average time in the system under suitably high load. (This also occurs with the Weibull distribution under sufficiently high loads.) We also see that using PSJF can give worse performance than using SJF; however, this does not happen in our experiments with the Weibull distribution, where the ability of preemption to help avoid waiting for long-running jobs appears to be more helpful. While it is known that PSJF can behave worse than SJF, these examples highlight that the interactions when using service time information in multiple choice systems must be treated carefully.

λ      FIFO    SJF     PSJF    SRPT    FIFO    SJF     PSJF    SRPT
       SQ      SQ      SQ      SQ      LL      LL      LL      LL
0.5    1.2658  1.2585  1.1669  1.1337  1.1510  1.1460  1.1462  1.0973
0.6    1.4078  1.3857  1.2527  1.2020  1.2401  1.2280  1.2289  1.1518
0.7    1.6148  1.5567  1.3726  1.2962  1.3749  1.3467  1.3490  1.2307
0.8    1.9485  1.7997  1.5542  1.4367  1.5975  1.5297  1.5371  1.3533
0.9    2.6168  2.2054  1.8850  1.6873  2.0534  1.8634  1.8915  1.5783
0.95   3.3923  2.5903  2.2248  1.9408  2.5852  2.1999  2.2685  1.8096
0.98   4.5384  3.0618  2.6721  2.2614  3.3798  2.6197  2.7807  2.1038
0.99   5.4855  3.3856  2.9959  2.4903  4.0451  2.9137  3.1696  2.3176
TABLE I: Average time in the system with exponential service times, by arrival rate λ, when choosing the shortest queue (SQ) compared with choosing the least loaded (LL).

III-B Choosing a Queue

Given the improvements possible using known service times in the supermarket model, we now consider methods for choosing a queue beyond the queue with the least load. Given full information about the service times of jobs at each queue, a job could be placed so that it minimizes the additional waiting time. The additional waiting time when placing an arriving job is the sum of the remaining service times of all jobs in the queue that will remain ahead of the arriving job, summed with the product of the service time of the arriving job and the number of jobs it will be placed ahead of. Equivalently, we can consider the total waiting time for each queue before and after the arriving job would be placed (ignoring the possibility of future jobs), and place the job in the queue that leads to the smallest increase.

Alternatively, if control is not centralized, we might consider selfish jobs, that seek only to minimize their own waiting time when choosing a queue. In this case the arriving job will consider the sum of the remaining service times of all jobs that will be ahead of it for each available queue choice.
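The two placement rules just described can be written as cost functions over the remaining service times in a candidate queue. The sketch below assumes SRPT ordering within the queue, ignores future arrivals, and treats a job whose remaining time equals the arriving job's service time as staying ahead; the function names are illustrative.

```python
from typing import Callable, List, Sequence


def selfish_wait(remaining: Sequence[float], s: float) -> float:
    """Waiting time of the arriving job itself: work of the jobs that stay ahead of it."""
    return sum(r for r in remaining if r <= s)


def additional_wait(remaining: Sequence[float], s: float) -> float:
    """Total waiting time added to the queue by placing a job of service time s.

    The arriving job waits for every job ahead of it, and every job it is
    placed ahead of waits an extra s.
    """
    ahead = sum(r for r in remaining if r <= s)
    jobs_behind = sum(1 for r in remaining if r > s)
    return ahead + s * jobs_behind


def choose_queue(queues: List[Sequence[float]], s: float,
                 cost: Callable[[Sequence[float], float], float]) -> int:
    """Index of the candidate queue with the smallest cost (first wins ties)."""
    return min(range(len(queues)), key=lambda i: cost(queues[i], s))
```

For instance, with candidate queues `[[0.5], [4.0, 4.0, 4.0]]` and an arriving job of service time 1, a selfish job joins the second queue (it would be served immediately there), while minimizing the additional waiting time selects the first queue, since jumping ahead of three jobs delays each of them by 1.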

Our results, given in Figures 3a and 3b, show that choosing a queue to minimize the additional waiting time in these situations does yield a small improvement over least loaded SRPT, as might be expected. Because the additional improvement is small, we expect in many systems it may not be worthwhile to implement this modification, even if expected waiting time is the primary performance metric. Our results also show that while selfish jobs have a significant negative effect, the overall average time in the system still remains smaller than in the standard supermarket model when choosing the shorter of two FIFO queues.

(a) Exponential service times, queue choice methods
(b) Weibull service times, queue choice methods
Fig. 3: Comparing methods of choosing a queue. All queues use SRPT within the queue; in the figure, SRPT means each job chooses the queue with smallest remaining work, SELFISH means each job chooses the queue that minimizes its waiting time, and MIN-ADD means each job chooses the queue that minimizes the additional waiting time added.

III-C Toward Developing Equations for Limiting Behavior

Previous work has shown that, in the limiting supermarket model where the number of queues goes to infinity, individual queues can be treated as independent, both when choosing the shortest queue and when choosing the least loaded [5]. The analysis of choosing the least loaded queue under FIFO of [8] therefore yields the equilibrium distribution for the queue load. Further, we can conclude that in equilibrium, at each queue considered in isolation, the least loaded variant of the supermarket model has a load-dependent arrival process, given by a Poisson process of rate $\lambda(x)$ when the queue has service load $x$. (See, for example, [3, 13] for more on queues with load-dependent arrival processes; note here the arrival rate depends on the workload, not the total number of jobs in the queue.) The least loaded variant of the supermarket model when using other scheduling schemes, such as SJF, PSJF, and SRPT, would have the same load-dependent arrival process, as in equilibrium the workload distribution would be the same regardless of the scheduling scheme. Hence we could develop formulae for quantities such as the expected time in the system in equilibrium in the supermarket model using the least loaded queue and SRPT, if we can develop an analysis of a single queue using SRPT with a load-dependent arrival process (and similarly for other scheduling schemes). We are not aware of any such analysis in the literature; this is a natural and tantalizing open question.

We note that the supermarket model when jobs choose the shortest queue also, as far as we know, has not been analyzed for SJF, PSJF, and SRPT. Here the arrival process at a queue in equilibrium can be given by a Poisson process of rate $\lambda_i$ when the queue has $i$ jobs waiting. Again, if we can develop an analysis of a single queue using SRPT with a queue-length-dependent arrival process, we can use this to analyze the supermarket model using SRPT (and similarly for other scheduling schemes).

IV Predicted Service Times

In many settings, it may be unreasonable to expect to obtain exact service times, but predicted service times may be available. Indeed, with advances in machine learning techniques, we expect that in many settings some type of prediction will be available. As discussed in [17], in the context of scheduling within a single queue, one would expect that even weak predictors may be very useful, since ordering jobs correctly most of the time will produce significant gains. As we have seen, however, even without predictions the question of whether using load information for both choosing a queue and for ordering within a queue provides complementary gains is not always clear. Naturally, the same question will arise again when using predicted service times.

IV-A The Prediction Model

We note that in discussions below, we may assume as a simple model (used in [17]) that there is a continuous joint distribution $g(x,y)$ for the actual service time $x$ and predicted service time $y$. (One can modify the model to account for discrete joint distributions; we choose this model for ease of exposition.)

To begin, we note that there is an issue of how to describe the predicted remaining service time. Suppose that the original predicted service time for a job is $y$, but the actual service time is $x$. If the amount of service is being tracked, and the service received has been $t$, then as the remaining service time is $x - t$, it is natural to use $y - t$ as the predicted remaining service time. Of course, at some point we may have $t > y$, and the predicted remaining service time will be negative, which seems unsuitable.

Here we simply use $\max(y - t, 0)$ as the predicted remaining service time. We recognize that this remains problematic; clearly the predicted remaining service time should be positive, and ideally would be a function of the initial prediction $y$ and the time served thus far $t$. However, determining the appropriate function would appear to require some knowledge of the joint distribution $g(x,y)$; our aim here is to explore simple, general approaches (such as choosing the shortest of two queues and using SRPT) that are agnostic to the underlying distribution $g$. In many situations, it may be computationally undesirable to utilize knowledge of $g$, or $g$ may be unknown or changing over time. We therefore leave the question of how to optimize the estimate of the predicted remaining time to achieve the best performance in this context as future work.

We consider various models for predictions (some of which were used in [17]). The models are intended to be exemplary; they do not match a specific real-world setting, and indeed it would be difficult to consider the range of possible real-world predictions. Rather, they are meant to show generally that even moderately accurate predictions can yield strong performance, and to show that a variety of interesting behaviors can occur under this framework.

In one model, which we refer to as exponential predictions, a job with actual service time $x$ has a predicted service time that is exponentially distributed with mean $x$. This model is not meant to accurately represent a specific situation, but is potentially useful for theoretical analysis in that the corresponding density equation is easy to write down, and it highlights how even noisy predictions can perform well. Also, exponential service times are a standard first consideration in queueing theory. In another model, which we refer to as $\alpha$-predictions, a job with service time $x$ has a predicted service time that is uniform over $[(1-\alpha)x, (1+\alpha)x]$, for a scale parameter $\alpha \leq 1$. Again, this is a simple model that captures inaccurate estimates naturally. Finally, we introduce a model that we dub $(\alpha,\beta)$-predictions, which makes use of the following notion of a reversal. For a service distribution with cumulative distribution function $F$, the reversal of $x$ is $F^{-1}(1 - F(x))$. For example, if $x$ is the value that is at the 70th percentile of the distribution, the reversal is the value at the 30th percentile of the distribution. For an $(\alpha,\beta)$-prediction, when the service time is $x$, with probability $\beta$ we return the reversal of $x$, and with all remaining probability the predicted service time is uniform over $[(1-\alpha)x, (1+\alpha)x]$. We use this model to represent cases where severe mispredictions are possible, so that jobs with very large service times might be mistakenly predicted as having very small service times (and vice versa). We would expect such mispredictions could be potentially very problematic when scheduling jobs according to their predicted service times.
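The three prediction models can be sketched as small sampling functions. In the sketch below, the parameter names `alpha` (scale of the uniform noise) and `beta` (probability of a reversal), the function names, and the use of the mean-1 exponential distribution as the service distribution are assumptions made for illustration; any service distribution can be used by supplying its CDF and inverse CDF.

```python
import math
import random


def exp_cdf(x: float) -> float:
    """CDF of the exponential distribution with mean 1."""
    return 1.0 - math.exp(-x)


def exp_inv_cdf(p: float) -> float:
    """Inverse CDF of the exponential distribution with mean 1 (requires p < 1)."""
    return -math.log1p(-p)


def reversal(x: float, cdf, inv_cdf) -> float:
    """Value at percentile 1 - F(x); e.g. the 70th percentile maps to the 30th."""
    return inv_cdf(1.0 - cdf(x))


def exponential_prediction(x: float, rng: random.Random) -> float:
    """Prediction drawn from an exponential distribution with mean x (x > 0)."""
    return rng.expovariate(1.0 / x)


def alpha_prediction(x: float, alpha: float, rng: random.Random) -> float:
    """Prediction uniform over [(1 - alpha) x, (1 + alpha) x]."""
    return rng.uniform((1.0 - alpha) * x, (1.0 + alpha) * x)


def alpha_beta_prediction(x: float, alpha: float, beta: float,
                          cdf, inv_cdf, rng: random.Random) -> float:
    """With probability beta return the reversal of x, else an alpha-prediction."""
    if rng.random() < beta:
        return reversal(x, cdf, inv_cdf)
    return alpha_prediction(x, alpha, rng)
```

Note that the reversal of a very long job is a very short predicted time, which is exactly the kind of severe misprediction the last model is meant to capture.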

There further remains the question of how to account for the predicted workload at a queue. We discuss several variations.

  1. Least Loaded Total: One could simply treat the predicted service times as actual service times, and track the total predicted service time remaining at a queue. That is, when a new job arrives at a queue, the predicted service time for the job is added to the total, and the total predicted service time reduces at a rate of one unit per unit time when a job is in the system; when the queue empties, the total predicted service time is reset to 0. An advantage of this approach is that in implementation the queue state can be represented by a single number. The disadvantage is that when a job’s predicted service time differs greatly from the real service time, this approach does not correspondingly update when that job completes. (A theoretical advantage of this approach is that the queue state can be represented by a pair of values, corresponding to the total actual service time for jobs in the queue and the total predicted service times in the queue. The approach of [8] can then be used to obtain equations describing the process, using a two-dimensional state, assuming FIFO service, allowing equations giving the equilibrium distribution. We leave further theoretical considerations for later work.)

  2. Least Loaded Updated: Here one updates the queue state both on a job arrival and a job completion; when a job completes, the predicted service time at the queue is recomputed as the sum of the predicted service times of the remaining jobs. With small additional complexity, the accuracy of the predicted work at the queue improves substantially.

  3. Shortest Queue: One can always simply use the number of jobs rather than the predicted service time to choose the queue.
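The difference between the first two variations can be seen in a small bookkeeping sketch. The class and method names are hypothetical. Two details are assumptions not settled by the description above: the running total is clamped at zero, and the updated variant sums the original predictions of the jobs still present without subtracting service already received.

```python
class LeastLoadedTotal:
    """Single-number state: total predicted work, drained at rate 1 while busy."""

    def __init__(self):
        self.total = 0.0
        self.jobs = 0
        self.last = 0.0  # time of the last update

    def _drain(self, now: float) -> None:
        if self.jobs > 0:
            # Assumption: never let the predicted total go below zero.
            self.total = max(0.0, self.total - (now - self.last))
        self.last = now

    def arrive(self, now: float, predicted: float) -> None:
        self._drain(now)
        self.total += predicted
        self.jobs += 1

    def complete(self, now: float) -> None:
        self._drain(now)
        self.jobs -= 1
        if self.jobs == 0:
            self.total = 0.0  # reset when the queue empties

    def load(self, now: float) -> float:
        self._drain(now)
        return self.total


class LeastLoadedUpdated:
    """Recomputes predicted work from the jobs actually remaining."""

    def __init__(self):
        self.predictions = {}

    def arrive(self, job_id, predicted: float) -> None:
        self.predictions[job_id] = predicted

    def complete(self, job_id) -> None:
        del self.predictions[job_id]

    def load(self) -> float:
        return sum(self.predictions.values())
```

Suppose job A (predicted 5, actual 1) and job B (predicted 2) both arrive at time 0. When A completes at time 1, the total variation still reports 6 units of predicted work, while the updated variation reports 2.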

IV-B Scheduling with Predictions

We begin as before by first considering the effect of the choice of scheduling procedure within a queue, by examining results for FIFO, shortest predicted job first (SPJF), preemptive shortest predicted job first (PSPJF), and shortest predicted remaining processing time (SPRPT) in various settings. Our figures consider the least loaded updated and shortest queue variations described above (as the least loaded total variation generally performs significantly worse, as we see in the next subsection). We again consider exponential and Weibull distributed service times as previously.

Our results, shown in Figures 4a, 4b, 5a, and 5b, already show two key points: predicted service times can work quite well, but there are interesting behaviors that may be surprising. First, choosing the shortest queue generally performs better than choosing the least loaded according to the predicted service times of jobs in the queue; for this set of experiments, only with Weibull distributed service times and FIFO service does using the predicted load in the queue perform better than using just the number of jobs. That is, the predicted load is often a worse guide to performance than the number of jobs when using strategies within the queue that utilize the predicted information. Hence, even in this very simple case, we see that using predicted information for multiple subtasks (choosing a queue, and ordering jobs within a queue) can lead to worse performance than simply using the information for one of the subtasks.

Second, PSPJF performs better than SPRPT on the Weibull distribution. On reflection, this seems reasonable from first principles; a long job that is incorrectly predicted to have a small remaining processing time can lead to increased waiting times for many jobs under SPRPT, but preempting based on the initial prediction of the job time ameliorates this effect.
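A small sketch of the preemption test shows why an underpredicted long job is more harmful under SPRPT. It assumes the predicted remaining time is the initial prediction minus the service received, clamped at zero; the function names and the numbers in the example are hypothetical.

```python
def predicted_remaining(y: float, t: float) -> float:
    """Predicted remaining time for initial prediction y after t units of service.

    Assumption: clamped at zero once the job has run longer than predicted.
    """
    return max(y - t, 0.0)


def preempts(policy: str, arriving_y: float, running_y: float, running_t: float) -> bool:
    """Whether an arriving job with prediction arriving_y preempts the running job."""
    if policy == "PSPJF":
        # Priority is fixed at the initial prediction.
        return arriving_y < running_y
    if policy == "SPRPT":
        # Priority improves as the running job receives service.
        return arriving_y < predicted_remaining(running_y, running_t)
    raise ValueError(f"unknown policy: {policy}")
```

Consider a job with true service time 10 but prediction 1 that has already received 0.8 units of service. An arriving job with prediction 0.5 preempts it under PSPJF (0.5 < 1) but not under SPRPT (its predicted remaining time is about 0.2), and once the running job passes 1 unit of service no arrival can preempt it under SPRPT for the remaining 9 units.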

(a) Exponential service times, queue chosen by least loaded update
(b) Exponential service times, queue chosen by shortest queue
Fig. 4: Exponential predictions with exponential service times, two choice supermarket model, with various queue scheduling policies.
(a) Weibull service times, queue chosen by least loaded update
(b) Weibull service times, queue chosen by shortest queue
Fig. 5: Exponential predictions with Weibull service times, two choice supermarket model, with various queue scheduling policies.

We now examine results in the setting of $\alpha$-predictions. We first look at the case of SPRPT; results for other schemes have similar characteristics. We compare SRPT (no prediction) with SPRPT for several values of $\alpha$, both using the least loaded update and shortest queue policies. The results appear in Figures 6a, 6b, 7a, and 7b. The primary takeaway is that again using predictions offers what is arguably surprisingly little loss in performance, even at large values of $\alpha$. Here, we find that least loaded does better than shortest queue for small values of $\alpha$, but for large $\alpha$ and high arrival rates shortest queue can perform slightly better. This is consistent with our results for the exponential model.

(a) Exponential service times, queue chosen by least loaded update
(b) Exponential service times, queue chosen by shortest queue
Fig. 6: $\alpha$-predictions with exponential service times, two choice supermarket model, using SPRPT.
(a) Weibull service times, queue chosen by least loaded update
(b) Weibull service times, queue chosen by shortest queue
Fig. 7: $\alpha$-predictions with Weibull service times, two choice supermarket model, using SPRPT.

We also look at the case of PSPJF in Figures 8a, 8b, 9a, and 9b. Performance is somewhat worse than for SPRPT, and the effect of increasing $\alpha$ is somewhat larger. Here, we find that joining the shortest queue generally does better than joining the least loaded queue. Again, this is consistent with our results for the exponential model. Overall the picture remains very similar.

(a) Exponential service times, queue chosen by least loaded update
(b) Exponential service times, queue chosen by shortest queue
Fig. 8: $\alpha$-predictions with exponential service times, two choice supermarket model, using PSPJF.
(a) Weibull service times, queue chosen by least loaded update
(b) Weibull service times, queue chosen by shortest queue
Fig. 9: $\alpha$-predictions with Weibull service times, two choice supermarket model, using PSPJF.

For completeness we also provide results using SPJF in Figures 10a, 10b, 11a, and 11b. Here SPJF generally performs worse than SPRPT and PSPJF; however, the effect on performance as increases rises even more slowly with . In these experiments, using least loaded always performed better than choosing the shortest queue.

(a) Exponential service times, queue chosen by least loaded update
(b) Exponential service times, queue chosen by shortest queue
Fig. 10: -predictions with exponential service times, two choice supermarket model, using SPJF.
(a) Weibull service times, queue chosen by least loaded update
(b) Weibull service times, queue chosen by shortest queue
Fig. 11: -predictions with Weibull service times, two choice supermarket model, using SPJF.

Finally, we consider the case of -predictions. Here we present an example of with , and , comparing also with the results from -prediction when = 0.5. Recall that in this setting with probability a job’s service time is replaced by its reversal in the cumulative distribution function, so that jobs with very large service times might be mistakenly predicted as having very small service times (and vice versa). The remaining jobs have predictions uniform over when the true service time is . The results are given in Figures 12a, 12b, 13a, and 13b. The primary takeaway is again that performance is quite robust to mispredictions. Even when , performance in all cases is significantly better than for the standard approach of choosing the shorter of two queues and using FIFO queueing without knowledge of service times. We also see now-familiar trends. The effects of misprediction are more significant for the heavy-tailed service times, and when mispredictions are sufficiently frequent, it becomes better to choose a queue according to the shortest queue rather than according to the least loaded update policy.
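To make the reversal step concrete, the sketch below works it out for exponential service times, where a service time x is mapped to the x' satisfying F(x') = 1 - F(x). The function names, the `rate` parameter, and in particular the `lo_factor` and `hi_factor` arguments that bound the uniform interval are placeholders for this example. The caller must supply the interval used in the prediction model; nothing here fixes it.

```python
import math
import random


def reverse_in_cdf_exponential(x, rate=1.0):
    """Return x' with F(x') = 1 - F(x) for an exponential(rate) distribution.

    F(x) = 1 - exp(-rate * x), so exp(-rate * x') = 1 - exp(-rate * x).
    """
    if x <= 0.0:
        return math.inf  # F(0) = 0 reverses to the top of the distribution
    return -math.log1p(-math.exp(-rate * x)) / rate


def predict(x, flip_prob, lo_factor, hi_factor, rng, rate=1.0):
    """Predicted service time for a job whose true service time is x.

    With probability flip_prob the prediction is the CDF reversal of x;
    otherwise it is uniform over [lo_factor * x, hi_factor * x].
    The interval factors are hypothetical inputs, not values from the text.
    """
    if rng.random() < flip_prob:
        return reverse_in_cdf_exponential(x, rate)
    return rng.uniform(lo_factor * x, hi_factor * x)


rng = random.Random(0)
median = math.log(2.0)
reversed_median = reverse_in_cdf_exponential(median)
reversed_large = reverse_in_cdf_exponential(3.0)
```

The median is a fixed point of the reversal, and a long job (x = 3 at rate 1) is mapped to a very short one, which is the kind of severe misprediction this model is meant to capture.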

(a) Exponential service times, queue chosen by least loaded update
(b) Exponential service times, queue chosen by shortest queue
Fig. 12: -predictions with exponential service times, two choice supermarket model, using SPRPT.
(a) Weibull service times, queue chosen by least loaded update
(b) Weibull service times, queue chosen by shortest queue
Fig. 13: -predictions with Weibull service times, two choice supermarket model, using SPRPT.

IV-C Choosing a Queue with Predictions

As before, we consider methods for choosing a queue beyond the queue with the (predicted) least load. We consider placing a job so that it minimizes the additional predicted waiting time, based on the predicted waiting times for all jobs. Alternatively, if control is not centralized, we might consider selfish jobs that seek only to minimize their own waiting time when choosing a queue.
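The difference between the two rules can be illustrated with a small sketch. It is one plausible reading rather than the exact computation used in the simulations, and it makes three assumptions. Each queue serves jobs in order of smallest predicted remaining time. Predictions are treated as exact when estimating waits. No future arrivals are considered. The helper names `predicted_costs` and `pick_queue` are hypothetical.

```python
def predicted_costs(queue, s):
    """Estimate the cost of adding a job of predicted size s to a queue.

    queue: predicted remaining service times of the jobs already present.
    Returns (own_wait, total_added):
      own_wait    - predicted wait of the new job, i.e. the work of jobs
                    that would be served before it (remaining time <= s);
      total_added - own_wait plus the delay of s imposed on every job
                    that the new job would be served ahead of.
    """
    own_wait = sum(r for r in queue if r <= s)
    delayed_jobs = sum(1 for r in queue if r > s)
    return own_wait, own_wait + s * delayed_jobs


def pick_queue(queues, s, rule):
    """Return the index of the queue preferred under the given rule."""
    if rule == "selfish":
        cost_index = 0  # minimize the job's own predicted wait
    elif rule == "min_add":
        cost_index = 1  # minimize total additional predicted waiting time
    else:
        raise ValueError(f"unknown rule: {rule}")
    return min(range(len(queues)),
               key=lambda i: predicted_costs(queues[i], s)[cost_index])


queues = [[10.0, 10.0, 10.0], [1.5]]
selfish_choice = pick_queue(queues, 2.0, "selfish")
min_add_choice = pick_queue(queues, 2.0, "min_add")
```

Here a job of predicted size 2 would selfishly join the first queue, where it is served immediately but delays three long jobs by 2 each. The rule that minimizes total added waiting time sends it to the second queue, where it waits 1.5 and delays nobody.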

Our results, in Figures 14 and 15 below, focus on two representative examples: -predictions with , and -predictions with . Again, choosing a queue to minimize the additional predicted waiting time in these situations does yield a small improvement over least loaded update with SPRPT, and selfish jobs have a significant negative effect.

(a) Exponential service times, queue choice methods
(b) Weibull service times, queue choice methods
Fig. 14: Comparing methods of choosing a queue when using predicted service times, for -predictions with . All queues use SPRPT within the queue; in the figure, SPRPT means each job chooses the queue with smallest predicted remaining work, SELFISH-P means each job chooses the queue that minimizes its own waiting time according to predictions, and MIN-ADD-P means each job chooses the queue that minimizes the additional waiting time added according to predictions.
(a) Exponential service times, queue choice methods
(b) Weibull service times, queue choice methods
Fig. 15: Comparing methods of choosing a queue when using predicted service times, for -predictions with . All queues use SPRPT within the queue; in the figure, SPRPT means each job chooses the queue with smallest predicted remaining work, SELFISH-P means each job chooses the queue that minimizes its own waiting time according to predictions, and MIN-ADD-P means each job chooses the queue that minimizes the additional waiting time added according to predictions.

Finally, as discussed earlier, we note that there is a significant difference between Least Loaded Updated and Least Loaded Total policies. Up to this point, we have used “least loaded” to refer to Least Loaded Updated, where the predicted service time at the queue is recomputed after each departure and arrival. In contrast, Least Loaded Total tracks a single predicted service time for the queue that is updated on arrival but not at departure (unless a queue empties, in which case the service is reset to 0). While theoretically appealing (as it reduces the state space for the system), Least Loaded Total generally performs significantly worse than Least Loaded Updated. Figure 16 below provides a representative example, in the setting of -predictions when . We see FIFO, in particular, does quite poorly under Least Loaded Total, and in all cases, the gap in performance increases with the load. Our other experiments show that the gap in performance also increases significantly as the predictions become more inaccurate; with exponential predictions, -predictions with higher , or -predictions with and , our simulations show even larger gains from using Least Loaded Updated.
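The bookkeeping difference between the two policies can be sketched as below. This is a simplified illustration under stated assumptions: the class name `QueueLoad` and its methods are hypothetical, and the sketch ignores how work completed on the job in service reduces either quantity over time.

```python
class QueueLoad:
    """Tracks the predicted load of one queue under both policies."""

    def __init__(self):
        self.jobs = []    # predicted service times of jobs currently present
        self.total = 0.0  # single counter used by Least Loaded Total

    def arrive(self, predicted):
        self.jobs.append(predicted)
        self.total += predicted  # Total is updated on arrival

    def depart(self, index=0):
        self.jobs.pop(index)
        # Total is NOT corrected on departure, except for the reset
        # to 0 when the queue becomes empty.
        if not self.jobs:
            self.total = 0.0

    def load_updated(self):
        # Least Loaded Updated: recomputed from the jobs actually present.
        return sum(self.jobs)

    def load_total(self):
        return self.total


q = QueueLoad()
q.arrive(3.0)
q.arrive(2.0)
q.depart()            # the job predicted at 3.0 leaves
updated_after_one = q.load_updated()
total_after_one = q.load_total()
q.depart()            # queue empties, so the Total counter resets
```

After the first departure the two views diverge: the recomputed load reflects only the remaining job, while the single counter still includes the departed job's prediction until the queue empties.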

(a) Exponential service times, queue choice methods
(b) Weibull service times, queue choice methods
Fig. 16: Comparing variations of Least Loaded, for -predictions with .

V Conclusion

We have considered (through simulation) the supermarket model in the setting where service times are predicted or known. Where service times are known, our results show that in the “standard” supermarket model (exponential service times, Poisson arrivals) as well as more generally, even though the power of two choices provides tremendous gains on its own, substantial further performance gains can be achieved when one can make use of known service times. This raises theoretical questions, such as deriving equations for the supermarket model using least loaded server selection with shortest job first or shortest remaining processing time, which would extend the recent work of [8].

Even more importantly, we introduce the idea of using predicted service times in this setting. Our simulation-based study suggests that the power of two choices maintains most of its power even when using predictions. We also find some interesting effects; for example, under sufficiently inaccurate predictions it can be better to use queue lengths rather than the predicted load when choosing a queue. We view these results as quite promising regarding the use of predicted service times in large-scale distributed systems. Our work highlights many open practical questions on how to optimize these kinds of systems when using predictions, as well as many open theoretical questions regarding how to analyze these kinds of systems.

References

  • [1] Reza Aghajani, Xingjie Li, and Kavita Ramanan. Mean-field dynamics of load-balancing networks with general service distributions. arXiv preprint arXiv:1512.05056, 2015.
  • [2] Yossi Azar, Andrei Z. Broder, Anna R. Karlin, and Eli Upfal. Balanced allocations. SIAM J. Comput., 29(1):180–200, 1999.
  • [3] René Bekker, Sem C Borst, Onno J Boxma, and Offer Kella. Queues with workload-dependent arrival and service rates. Queueing Systems, 46(3-4):537–556, 2004.
  • [4] Maury Bramson, Yi Lu, and Balaji Prabhakar. Randomized load balancing with general service time distributions. In ACM SIGMETRICS Performance Evaluation Review, volume 38, pages 275–286. ACM, 2010.
  • [5] Maury Bramson, Yi Lu, and Balaji Prabhakar. Asymptotic independence of queues under randomized load balancing. Queueing Systems, 71(3):247–292, 2012.
  • [6] Mor Harchol-Balter. Task assignment with unknown duration. J. ACM, 49(2):260–288, 2002.
  • [7] Mor Harchol-Balter. Performance modeling and design of computer systems: queueing theory in action. Cambridge University Press, 2013.
  • [8] Tim Hellemans and Benny Van Houdt. On the power-of-d-choices with least loaded server selection. POMACS, 2(2):27:1–27:22, 2018.
  • [9] Chen-Yu Hsu, Piotr Indyk, Dina Katabi, and Ali Vakilian. Learning-based frequency estimation algorithms. International Conference on Learning Representations, 2019.
  • [10] Richard M. Karp, Michael Luby, and Friedhelm Meyer auf der Heide. Efficient PRAM simulation on a distributed memory machine. Algorithmica, 16(4/5):517–542, 1996.
  • [11] Tim Kraska, Alex Beutel, Ed H Chi, Jeffrey Dean, and Neoklis Polyzotis. The case for learned index structures. In Proceedings of the 2018 International Conference on Management of Data, pages 489–504. ACM, 2018.
  • [12] Thodoris Lykouris and Sergei Vassilvitskii. Competitive caching with machine learned advice. In Jennifer G. Dy and Andreas Krause, editors, Proceedings of the 35th International Conference on Machine Learning, ICML 2018, Stockholmsmässan, Stockholm, Sweden, July 10-15, 2018, volume 80 of JMLR Workshop and Conference Proceedings, pages 3302–3311. JMLR.org, 2018.
  • [13] Raymond Marie. Calculating equilibrium probabilities for λ(n)/C_k/1/N queues. ACM Sigmetrics Performance Evaluation Review, 9(2):117–125, 1980.
  • [14] Michael Mitzenmacher. Studying balanced allocations with differential equations. Combinatorics, Probability & Computing, 8(5):473–482, 1999.
  • [15] Michael Mitzenmacher. How useful is old information? IEEE Trans. Parallel Distrib. Syst., 11(1):6–20, 2000.
  • [16] Michael Mitzenmacher. The power of two choices in randomized load balancing. IEEE Trans. Parallel Distrib. Syst., 12(10):1094–1104, 2001.
  • [17] Michael Mitzenmacher. Scheduling with predictions and the price of misprediction. arXiv preprint arXiv:1902.00732, 2019.
  • [18] Michael Mitzenmacher. A model for learned Bloom filters and optimizing by sandwiching. In Advances in Neural Information Processing Systems, pages 462–471, 2018.
  • [19] Michael Mitzenmacher and Eli Upfal. Probability and computing - randomized algorithms and probabilistic analysis. Cambridge University Press, 2005.
  • [20] Michael Mitzenmacher and Berthold Vöcking. The asymptotics of selecting the shortest of two, improved. In Proceedings of the 37th Annual Allerton Conference on Communication, Control, and Computing, 1999.
  • [21] Manish Purohit, Zoya Svitkina, and Ravi Kumar. Improving online algorithms via ML predictions. In Advances in Neural Information Processing Systems, pages 9684–9693, 2018.
  • [22] Ziv Scully and Mor Harchol-Balter. SOAP bubbles: robust scheduling under adversarial noise. In Proceedings of the 56th Annual Allerton Conference on Communication, Control, and Computing, 2018.
  • [23] Ziv Scully, Mor Harchol-Balter, and Allen Scheller-Wolf. SOAP: One clean analysis of all age-based scheduling policies. In Proceedings of the ACM on Measurement and Analysis of Computing Systems, 2018.
  • [24] Nikita Dmitrievna Vvedenskaya, Roland L’vovich Dobrushin, and Fridrikh Izrailevich Karpelevich. Queueing system with selection of the shortest of two queues: An asymptotic approach. Problemy Peredachi Informatsii, 32(1):20–34, 1996.