Performance of the smallest-variance-first rule in appointment sequencing

12/04/2018, by Madelon A. de Kemp et al.

A classical problem in appointment scheduling, with applications in health care, concerns the determination of the patients' arrival times that minimize a cost function that is a weighted sum of mean waiting times and mean idle times. Part of this problem is the sequencing problem, which focuses on ordering the patients. We assess the performance of the smallest-variance-first (SVF) rule, which sequences patients in order of increasing variance of their service durations. While it was known that SVF is not always optimal, many papers have found that it performs well in practice and simulation. We give theoretical justification for these observations by proving quantitative worst-case bounds on the ratio between the cost incurred by the SVF rule and the minimum attainable cost, in a number of settings. We also show that under quite general conditions, this ratio approaches 1 as the number of patients grows large, showing that the SVF rule is asymptotically optimal. While this viewpoint in terms of approximation ratio is a standard approach in many algorithmic settings, our results appear to be the first of this type in the appointment scheduling literature.


1. Introduction

Setting up appointment schedules plays an important role in health care and various other domains. The main challenge lies in running the system efficiently while at the same time providing customers with an acceptable level of service. The service level can be expressed in terms of the waiting times the customers are facing, and the system efficiency in terms of the service provider's idle time. The problem of generating an optimal schedule is generally formulated as minimizing a cost function (or simply "cost") that is a weighted average of the expected idle time and the expected waiting times. As most literature on this topic focuses on applications in health care, we refer throughout this paper to customers as patients, and to the server as the doctor.

The problem of scheduling appointments can be split into two parts: one needs to determine the amount of time scheduled for each appointment, and one needs to determine in which order the patients should arrive. These problems are usually referred to as the scheduling problem and the sequencing problem, respectively. This paper will focus on the sequencing problem (and later, the combined sequencing and scheduling problem), in a context with a single doctor seeing a sequence of patients. We impose the common assumptions that the service times of the patients form a sequence of independent random variables, while the patients arrive punctually at the scheduled times (which we will refer to as epochs). In this setting, a variety of techniques is available that determine, for a given order, the optimal arrival epochs; see, e.g., [3, 25] and references therein. However, much less is known about the efficient computation of "good" sequences. Already for a relatively modest number of patients, the number of possible sequences is huge, thus seriously complicating the search for an optimal order. An appointment scheduling review paper from 2017 [2] states that the optimal sequencing problem is one of the main open problems in the area:

“[…] one of the biggest challenges for future research is to find optimal (or near-optimal) solutions to more realistic appointment sequencing problems.”

A number of papers consider the sequencing (or combined sequencing and scheduling) problem and develop various stochastic programming models for it [4, 10, 27, 29]. However, the resulting optimization problems are very difficult to solve. Variants of the problem have been shown to be NP-hard [24, 29], indicating that this difficulty is inherent.

A more popular approach has been to consider simple heuristics for the sequencing problem. The most frequently used heuristic is to order the patients by the variance of their service times, from smallest to largest. Throughout this paper we refer to this sequence as the svf (smallest-variance-first) sequence. The intuition for using the svf sequence is that an unusually long service time early in the day could cause many later patients to have to wait, and the svf sequence aims to reduce the risk of this occurring. This is a very simple and appealing rule, which only requires the variances of the service times. It has been observed by simulation that the svf rule typically performs very well, often even optimally. It has been proven optimal for two patients under some distributional assumptions [14, 37]. Recently, however, Kong et al. [24] provided instances showing that it need not be optimal, even for simple cases with uniform or lognormal service times and a substantial number of patients.
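The rule itself is trivial to implement; a minimal Python sketch of ours (the variance estimates below are hypothetical):

import numpy as np

variances = np.array([4.0, 1.0, 2.5, 0.5])   # estimated service-time variances per patient
svf_order = np.argsort(variances)            # patient indices, smallest variance first
print(svf_order)                             # -> [3 1 2 0]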

Despite the svf sequence appearing promising in simulations, little is known about its theoretical performance, or that of any other simple heuristic for that matter. In this paper, we propose a new direction of research for the sequencing problem: finding sequences that provably perform well. Instead of finding an optimal sequence, such research aims at finding performance bounds for easily computed sequences. Considering previous research, the svf sequence is the obvious candidate for such an easily computed and well-performing sequence, and it will therefore be our focus. The precise quantity of interest to us will be the ratio between the cost of the schedule arising from the svf sequence and the cost of the schedule arising from the optimal sequence.

Our main goal in this paper is to prove upper bounds on this ratio – known as the approximation ratio – in various settings. This direction of study is very standard in the algorithmic community when considering intractable (NP-hard) problems, for example in machine scheduling (see [16, 17, 35] and references therein). However, it has not been studied in the appointment sequencing context. Note that for typical problem instances the svf sequence could perform significantly better than suggested by an upper bound on the approximation ratio, as the bound must also hold for worst-case instances.

1.1. Main contributions

We first concentrate exclusively on the effect of the sequence, using the simplest choice of schedule: each patient is assigned a slot of length equal to its mean service time. In other words, the arrival time of any patient is set equal to the sum of the mean service times of all preceding patients. This is certainly not the optimal solution to the scheduling problem, but it has the advantage of being very simple and easily applicable, and also completely independent of the choice of tradeoff in the cost function between doctor idle time and patient waiting time. As was stated in, e.g., the survey paper [2] and in [12], this “mean-based” type of schedule is a commonly used approach in practice.
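For concreteness, a two-line sketch of the mean-based schedule (our illustration; the mean service times are made up): the arrival epoch of the patient in slot j is the sum of the mean service times of the patients scheduled before it.

import numpy as np

means_in_order = np.array([10.0, 15.0, 12.0, 20.0])   # hypothetical mean service times, in arrival order
arrival_epochs = np.concatenate(([0.0], np.cumsum(means_in_order)[:-1]))
print(arrival_epochs)   # -> [ 0. 10. 25. 37.]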

Under the mean-based scheduling rule, we prove a number of results. Under an assumption (namely that the service-time distributions are comparable according to a certain ordering), we prove in Section 3 that the approximation ratio of svf is at most 2 for symmetric service-time distributions, and at most 4 in general. In other words, we show that for all instances (i.e., for all numbers of patients and all service-time distributions satisfying the assumption imposed) the svf cost is at most four times the optimal cost. We also consider two special cases:

  • Service times are evidently nonnegative, but one could consider the situation that normal distributions are used as an approximation of the actual distributions of service times. In Section 3.2, we prove a smaller upper bound on the approximation ratio in this case. While we do not believe that our result here is sharp, it indicates that the performance of svf for well-behaved service-time distributions is most likely substantially better than suggested by the bounds 2 and 4 mentioned above.

  • In Section 3.3 we bridge the gap between the upper bound of 2 for symmetric distributions and the general upper bound of 4, by developing a method that isolates the effect of asymmetry. For the lognormal distributions fitted to real data in Çayırlı et al. [7], this method results in an approximation ratio of at most 3.43.

In Section 4, we consider the combined sequencing and scheduling problem. Here, we wish to compare a heuristic for this combined problem to the overall optimal schedule, over all possible sequences and schedules. Observe that the simple mean-based scheduling rule may lead to high cost, because waiting times can easily propagate. We therefore consider a simple alternative scheduling rule, suggested by Charnetski [9]: the slot assigned to a patient is equal to its mean service time, plus some multiple of the standard deviation of its service time (where this multiple is optimized). Again under some assumptions, we show that this scheduling rule, combined with the svf sequencing rule, yields a cost that is off from the optimal cost by at most a constant factor.

We also consider the special case of lognormally distributed service times, as these are often seen in practice [7, 22]. Using a slightly different scheduling heuristic (the interarrival time being a multiple of the mean service time), we find an upper bound on the approximation ratio. Applying this result to the data in Çayırlı et al. [7], we find an upper bound of 2.90 in the case where waiting and idle times are equally important in the cost function.

In Section 5, we return to the mean-based setting of Section 3. We show that as the number of patients grows large, the approximation ratio tends to 1. This result requires only a very weak assumption on the service-time distributions. The important practical implication of this result is that svf is close to optimal in settings where the number of patients is substantial.

Finally, in Section 6, we give an example demonstrating that the assumptions made in previous sections are necessary: without any restrictions on the service-time distributions, no bound on the approximation ratio of svf is possible. This holds true also when optimal rather than mean-based schedules are used. This example involves only two patients. We then give a (still relatively straightforward) example showing that it is impossible to obtain a better bound than 1.28 on the approximation ratio in the mean-based setting of Section 3 (i.e., the setting for which we found the upper bound of 4).

1.2. Further related work

Here we will mention some of the most relevant literature for this paper. For more extensive reviews on the appointment scheduling and sequencing literature, we refer the reader to, e.g., Ahmadi-Javid et al. [2], Çayırlı and Veral [6], and Gupta and Denton [15].

As already noted, Kong et al. [24] showed that svf is not in general optimal. In some very specific cases, optimality of svf has been demonstrated. For only two patients, the svf sequence is optimal when the service times are both exponentially distributed or both uniformly distributed [37], or more generally, when the two service times are comparable according to a certain convex ordering [14]. For three patients, Kong et al. [24] find sufficient conditions for the svf sequence to be optimal, when the time scheduled for each appointment is equal to the mean service time. (We have verified that this result can be extended to four patients using the same methods.)

Kemper et al. [19] analyze a sequential optimization approach, meaning that the arrival time of a patient is optimized without taking into account its impact on later patients. They show that under this rather different notion of optimality, and if the service times come from the same scale-family, then svf does provide the best ordering.

One line of research focuses on comparing various sequencing heuristics (including svf) through simulation. Denton, Viapiano and Vogl [11] consider a model similar to ours, and discuss the effectiveness of a number of simple sequencing heuristics using simulation, based on real surgery data. The svf heuristic performed best of all the heuristics they considered. Mak, Rong and Zhang [27] consider a model where waiting time costs may be different for different patients; by studying some more tractable approximations, they also find that svf performs well.

Klassen and Rohleder [22] and Rohleder and Klassen [33] consider an appointment scheduling model where not all patient information is known in advance; rather, patients must be scheduled as they call in to make an appointment (and so without information about patients who call later). Once again, it was empirically found that it worked best to put patients with low-variance service times early in the schedule.

A number of works model variants of the combined sequencing and scheduling problem as stochastic integer or linear programming problems. Solving these programs is very challenging, however, and exact results have generally been obtained only for small instances. Works along these lines include Denton and Gupta [10], Mancilla and Storer [29] and Berg et al. [4]. For larger instances, it was necessary to resort to heuristics such as svf for the sequencing problem. We mention Vanden Bosch and Dietz [36], who instead propose a local search heuristic that iteratively improves the sequence by finding pairs of patients who can be swapped to improve the solution.

There are also a number of papers which take a robust optimization approach [23, 28, 30]. Here, instead of working with explicitly given service-time distributions, the goal is to find a schedule minimizing the worst-case expected cost given only that the distributions meet certain constraints (such as given moments). Most relevant to us, Mak et al. [28] discuss one such robust model, and are able to prove that under mild assumptions svf is optimal in this context. In their model, the joint distribution of the service times may be any distribution matching known moments for the individual service times (e.g., the means and variances). However, the worst-case distributions corresponding to the optimal schedule are typically highly correlated; these results do not carry over to a model where independence is assumed. Mittal et al. [30] discuss another robust model, in which each service time can take any value in a certain interval. They find an approximation algorithm with a provable guarantee for the combined scheduling and sequencing problem.

Finally, we would like to point out the relation with machine scheduling (see the book by Pinedo [31] for more background). The main difference between machine scheduling and appointment scheduling is that in the former the arrival times of jobs/patients are given, while in the latter these are decision variables. The machine scheduling problem most closely related to our problem can be found in Guda et al. [13]. In that paper, the due dates and the sequence of jobs must be chosen so as to minimize a weighted average of expected earliness and tardiness around the due dates. The svf rule is optimal in the model of Guda et al., under some assumption on the service times of the jobs. However, in their model all jobs are present from the start, so that there is no idle time. Compared to our model, this greatly simplifies the expression for the cost function, which facilitates finding an optimal solution.

2. Model and preliminaries

Consider a problem instance with n patients, numbered 1 up to n. We denote the service time of patient i in this problem instance by B_i, which has mean μ_i and variance σ_i². As pointed out in the introduction, one should distinguish between the scheduling problem and the sequencing problem. The sequencing problem, on which we primarily focus, is to decide which patient is assigned which appointment slot. The sequence is denoted by a permutation π ∈ Π_n (where Π_n denotes the set of all permutations on {1, …, n}). The value π(j) will denote the index of the patient that is assigned to appointment slot j. The scheduling problem is to decide the interarrival times between patients, given the sequence in which they arrive. We use x_i to denote the interarrival time between patient i and the next patient, i.e., the length of the appointment slot reserved for patient i. The vector x = (x_1, …, x_n) will be referred to as the schedule.

Let W_j denote the waiting time of the patient in appointment slot j. Let I_j be the idle time before the start of appointment slot j, after the previous patient has been served. Given a sequence π and interarrival times x, the waiting times and idle times can be computed using the Lindley recursions [26], which read

W_1 = I_1 = 0,   W_j = (W_{j−1} + B_{π(j−1)} − x_{π(j−1)})^+,   I_j = (W_{j−1} + B_{π(j−1)} − x_{π(j−1)})^−,   j = 2, …, n,     (1)

using the notation y^+ = max(y, 0) and y^− = max(−y, 0).
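As an illustration of how (1) is evaluated (a minimal sketch of ours; the numbers are arbitrary), the following Python function computes the realized waiting and idle times for one realized vector of service times and a given schedule, with the patients already listed in the chosen order.

def waiting_and_idle_times(service, slots):
    # Lindley recursion (1): service[j] and slots[j] are the realized service time
    # and the reserved slot length of the patient in appointment slot j (0-indexed).
    n = len(service)
    W = [0.0] * n   # waiting time of the patient in slot j
    I = [0.0] * n   # idle time before the start of slot j
    for j in range(1, n):
        d = W[j - 1] + service[j - 1] - slots[j - 1]
        W[j] = max(d, 0.0)
        I[j] = max(-d, 0.0)
    return W, I

# example: three patients, mean-based slot lengths
print(waiting_and_idle_times([12.0, 9.0, 17.0], [10.0, 10.0, 15.0]))
# -> ([0.0, 2.0, 1.0], [0.0, 0.0, 0.0])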

We use a parameter α ∈ (0, 1) to indicate the relative importance of idle time and waiting time. As a cost function, we seek to minimize

C(π, x) = α Σ_{j=1}^{n} E[I_j] + (1 − α) Σ_{j=1}^{n} E[W_j],     (2)

a weighted average of the expected total idle time and the expected total waiting time. Observe that this cost function still depends on the sequence π, on the schedule x, and on the service-time distributions of the patients. We generally suppress the dependence on the service-time distributions, but we may include them explicitly in the notation if we wish to be explicit. As an aside, we mention that an approach to estimate α in a practical context can be found in [32].

Throughout this paper, we assume the patients are indexed such that σ_1² ≤ σ_2² ≤ ⋯ ≤ σ_n². The svf sequence is then the sequence given by the identity permutation id, with id(j) = j for all j. The waiting times and idle times under this sequence are denoted by W_j^{svf} and I_j^{svf} respectively. We compare this sequence with the sequence that minimizes (2). The waiting times and idle times under this optimal sequence are denoted by W_j^{opt} and I_j^{opt} respectively.

To compare these sequences, we study the ratio between the cost functions under the svf sequence and under the optimal sequence. If this ratio is small, then this is evidence that the svf sequence performs well. We do so in two settings. In the first setting, the schedule is restricted to be the mean-based schedule, given by x_i = μ_i for all i. We then consider the approximation ratio defined as the cost of the svf sequence divided by the minimum cost over all sequences, both evaluated under the mean-based schedule. We suppress the dependence of this ratio on the service-time distributions when they are unambiguous.

In the second setting, we compare the svf sequence combined with a given schedule against the optimal combination of sequence and schedule, i.e., the combination of sequence and schedule that minimizes (2). The corresponding approximation ratio is the cost of the svf sequence under the given schedule divided by this minimum cost. Once again, we omit the dependence on the schedule and the service-time distributions from the notation when their choice is unambiguous.

In this paper we prove, under some assumptions, upper bounds on both approximation ratios. Such an upper bound guarantees that the cost under the svf sequence is at most that factor times the optimal cost. We also show, under a mild condition, that the first (mean-based) approximation ratio converges to 1 as the number of patients tends to infinity, thus proving that the svf sequence is asymptotically optimal when mean-based schedules are used.
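To make these ratios concrete, the following self-contained Python sketch (our illustration; the lognormal parameters and the choice α = 0.5 are made up) estimates the cost (2) under the mean-based schedule for every sequence of a tiny instance and reports the ratio between the svf cost and the best cost found.

import numpy as np
from itertools import permutations

rng = np.random.default_rng(1)
alpha = 0.5                              # weight of idle time versus waiting time in (2)
mu_log = np.array([2.0, 2.2, 2.4])       # hypothetical lognormal parameters per patient
sd_log = np.array([0.3, 0.5, 0.7])
means = np.exp(mu_log + sd_log**2 / 2)   # mean service times

def mean_based_cost(seq, n_rep=200_000):
    # Monte Carlo estimate of alpha*E[total idle] + (1-alpha)*E[total wait]
    # under the mean-based schedule x_i = mu_i, for the given patient order.
    idx = list(seq)
    B = rng.lognormal(mu_log[idx], sd_log[idx], size=(n_rep, len(idx)))
    x = means[idx]
    W = np.zeros(n_rep)
    tot_wait = np.zeros(n_rep)
    tot_idle = np.zeros(n_rep)
    for j in range(1, len(idx)):         # Lindley recursion (1)
        d = W + B[:, j - 1] - x[j - 1]
        W = np.maximum(d, 0.0)
        tot_wait += W
        tot_idle += np.maximum(-d, 0.0)
    return alpha * tot_idle.mean() + (1 - alpha) * tot_wait.mean()

variances = (np.exp(sd_log**2) - 1) * np.exp(2 * mu_log + sd_log**2)
svf = tuple(int(i) for i in np.argsort(variances))
costs = {p: mean_based_cost(p) for p in permutations(range(3))}
print(costs[svf] / min(costs.values()))  # estimated ratio of svf cost to the best cost found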

Remark 2.1.

Service times are inherently nonnegative, but our framework (based on the Lindley recursions (1)) carries over to situations where the B_i are allowed to take negative values. This might be useful if the true distributions of the service times can be approximated by distributions that can take negative values (with some small probability), for example normal distributions. If such distributions form a good fit to the data in some application, the theoretical performance of the svf rule for these distributions gives some indication of its performance in that application.

2.1. Preliminaries

We need the following well-known results concerning the waiting and idle times. It follows by iterating the Lindley recursion (1) that the waiting time W_j is the maximum of a random walk with steps B_{π(k)} − x_{π(k)}, that is,

W_j = max_{0 ≤ m ≤ j−1} Σ_{k=j−m}^{j−1} (B_{π(k)} − x_{π(k)}),     (3)

with the convention that the empty sum (m = 0) equals zero. In the setting of mean-based schedules x_i = μ_i, we introduce the notation Y_i = B_i − μ_i and, for a fixed slot j, the random walk

S_m = Σ_{k=1}^{m} Y_{π(j−k)},   m = 0, 1, …, j − 1,

whose steps are the centered service times of the patients in slots j − 1, j − 2, …, taken in reverse order. We then find for the mean-based schedule that

W_j = max_{0 ≤ m ≤ j−1} S_m.     (4)

Computing the total time until all patients have been served in two separate ways, we find the identity

Σ_{j=1}^{n−1} x_{π(j)} + W_n + B_{π(n)} = Σ_{j=1}^{n} B_{π(j)} + Σ_{j=1}^{n} I_j.     (5)

For a given schedule, this relation can be used to express the expected total idle time in terms of the expected waiting time of the last patient. Therefore, we can focus on the waiting times, and derive results for the idle times from (5).
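For instance, under the mean-based schedule x_i = μ_i, taking expectations in (5) and cancelling the means gives the useful special case (a one-line check using the notation introduced above)

E[ Σ_{j=1}^{n} I_j ] = Σ_{j=1}^{n−1} μ_{π(j)} + E[W_n] + μ_{π(n)} − Σ_{j=1}^{n} μ_{π(j)} = E[W_n],

i.e., the expected total idle time equals the expected waiting time of the patient in the last slot.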

We also need the concept of a convex ordering on random variables. More information on the convex ordering and related concepts can be found in Shaked and Shanthikumar [34].

Definition 2.2.

The random variable X is said to be smaller in the convex order than the random variable Y if E[φ(X)] ≤ E[φ(Y)] for all convex functions φ for which the expectations exist. This will be denoted by X ≤_cx Y. If X − E[X] ≤_cx Y − E[Y], then X is said to be smaller than Y in the dilation order, denoted as X ≤_dil Y.

Note that X ≤_cx Y implies X ≤_dil Y, and X ≤_dil Y implies Var(X) ≤ Var(Y). The following lemma [34] is useful when checking whether given random variables satisfy a convex order.

Lemma 2.3.

The random variables X and Y satisfy X ≤_cx Y if and only if there exist random variables X̂ and Ŷ, defined on a common probability space and distributed as X and Y respectively, such that E[Ŷ | X̂] = X̂ almost surely.
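As a quick illustration of how Lemma 2.3 is typically applied (this example is ours, not taken from the paper): if Y can be written as X plus independent zero-mean noise, the coupling condition is immediate. Indeed, if Ŷ = X̂ + Z with Z independent of X̂ and E[Z] = 0, then E[Ŷ | X̂] = X̂, so X ≤_cx Y. In particular, N(μ, σ_1²) ≤_cx N(μ, σ_2²) whenever σ_1 ≤ σ_2, since the larger-variance variable can be realized as the smaller-variance one plus independent N(0, σ_2² − σ_1²) noise.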

3. Bounds on performance under mean-based schedules

In this section, we provide bounds on the approximation ratio under the mean-based schedule given by x_i = μ_i. This amounts to giving an upper bound on the cost function when using the svf sequence, and a lower bound on the cost function that is valid for any sequence, hence also for the optimal sequence.

This section is structured as follows. In Section 3.1 we prove the main results: Theorem 3.3 and Theorem 3.4. These theorems give bounds on the approximation ratio when we assume that the service times are symmetrically distributed and follow a dilation order (Theorem 3.3), and when we only assume that they follow a dilation order (Theorem 3.4). In Section 3.2 we consider the special case of normally distributed service times; Theorem 3.8 gives an improved bound in this case. In Section 3.3 we discuss a method for improving numerically upon the bound of Theorem 3.4; informally, the more symmetric the service-time distributions, the closer the resulting bound is to the value stated in Theorem 3.3.

3.1. Main results

We impose the following assumption.

Assumption 3.1 (ordering).

We have B_1 ≤_dil B_2 ≤_dil ⋯ ≤_dil B_n.

We remark that this is the condition under which Gupta [14] proves optimality of svf for two patients. Note also that this assumption implies σ_1² ≤ σ_2² ≤ ⋯ ≤ σ_n². Examples of instances satisfying this assumption include all B_i having exponential distributions (by Theorem 3.A.18 in [34]), and all B_i having lognormal distributions with both the means and the variances increasing in i, as proved in Appendix D. In Section 6 it will be shown that this assumption is necessary.

For one of the bounds we prove, we also make the following assumption.

Assumption 3.2 (symmetry).

The B_i have symmetric distributions around their means.

Examples of instances satisfying both the ordering and symmetry assumptions include all B_i having normal distributions, all B_i having uniform distributions, and all B_i having Laplace distributions. For all three examples, the ordering assumption follows from Theorem 3.A.18 in [34].

In this section, we prove the following theorems.

Theorem 3.3.

Under the ordering and symmetry assumptions, the approximation ratio under the mean-based schedule is at most 2.

Theorem 3.4.

Under the ordering assumption, the approximation ratio under the mean-based schedule is at most 4.

A first key point is that, to prove these theorems, it suffices to bound, for each slot, the expected waiting time under the svf sequence in terms of the best achievable expected waiting time when the first slots are constrained to contain the lowest-variance patients. This is made explicit in the next lemma, proved in Appendix A.

Lemma 3.5.

Let W_j^* denote the expected waiting time of the patient in appointment slot j, under the sequence that minimizes this expected waiting time, subject to the constraint that π(k) = k for all k < j, i.e., the first j − 1 patients are assigned to the first j − 1 slots. Suppose that, for some constant c, E[W_j^{svf}] ≤ c · W_j^* for all j. Then, under the ordering assumption, the cost of the svf sequence is at most c times the minimum cost over all sequences.

The following lemma is another key ingredient.

Lemma 3.6.

Under the symmetry assumption, the random variable max_{0 ≤ m ≤ j−1} S_m is stochastically dominated by |S_{j−1}|, and thus E[W_j] ≤ E|S_{j−1}|.

Proof.

Recall that we have W_j = max_{0 ≤ m ≤ j−1} S_m from (4). Under the symmetry assumption, the steps of the random walk have a symmetric distribution around zero, and hence the same is true for the partial sums S_m.

Let u > 0, and let τ = min{m : S_m > u}; note that P(max_{0 ≤ m ≤ j−1} S_m > u) = P(τ ≤ j − 1). To bound this probability, we look at the random walk reflected in the level S_τ after time τ. This reflected process is defined by

R_m = S_m for m ≤ τ, and R_m = 2 S_τ − S_m for m > τ.     (6)

We have S_τ > u, so on the event {τ ≤ j − 1, S_{j−1} ≤ u} we get R_{j−1} = 2 S_τ − S_{j−1} > u. As the steps have symmetric distributions, the increments of R and S after time τ have the same distribution, and therefore R_{j−1} has the same distribution as S_{j−1}. We conclude that P(τ ≤ j − 1, S_{j−1} ≤ u) ≤ P(S_{j−1} > u).

Now note that τ ≤ j − 1 implies that either S_{j−1} > u, or both τ ≤ j − 1 and S_{j−1} ≤ u hold. As these are disjoint events we now have

P(max_{0 ≤ m ≤ j−1} S_m > u) ≤ 2 P(S_{j−1} > u) = P(|S_{j−1}| > u).

This holds for any u > 0, so max_{0 ≤ m ≤ j−1} S_m is stochastically dominated by |S_{j−1}|, as was claimed. ∎

Proof of Theorem 3.3.

As W_j ≥ S_{j−1} and W_j ≥ 0, we have E[W_j] ≥ E[(S_{j−1})^+]. Note that, for all sequences considered in Lemma 3.5, the first j − 1 slots contain patients 1, …, j − 1, so that S_{j−1} = Σ_{i=1}^{j−1} (B_i − μ_i) regardless of their order. Since S_{j−1} has mean zero, E[(S_{j−1})^+] = E|S_{j−1}|/2, so

W_j^* ≥ (1/2) E|S_{j−1}|.     (7)

On the other hand, by Lemma 3.6,

E[W_j^{svf}] ≤ E|S_{j−1}|.

As the ratio of these two quantities is now bounded by 2, Theorem 3.3 follows from Lemma 3.5. ∎

Proof of Theorem 3.4.

Note that Lemma 3.5 and the lower bound (7) are valid without the symmetry assumption being needed. We therefore only need an upper bound on E[W_j^{svf}].

Let B_1', …, B_{j−1}' have the same distributions as B_1, …, B_{j−1} respectively, such that all these random variables are independent. Let W̃_j be the maximum of the random walk with steps B_i − B_i' (taken in the same order as before). As

B_i − μ_i = E[B_i − B_i' | B_i],

we see using Lemma 2.3 that B_i − μ_i ≤_cx B_i − B_i'. Note that the maximum of the random walk is a convex function in its steps, as it is the maximum of functions linear in the steps. Therefore, each time we replace a step B_i − μ_i with a step B_i − B_i' the expected maximum of the random walk increases, so E[W_j^{svf}] ≤ E[W̃_j].

Now note that the steps B_i − B_i' all have a symmetric distribution, so we can apply Lemma 3.6 to find

E[W̃_j] ≤ E| Σ_{i=1}^{j−1} (B_i − B_i') | ≤ E| Σ_{i=1}^{j−1} (B_i − μ_i) | + E| Σ_{i=1}^{j−1} (μ_i − B_i') | = 2 E|S_{j−1}|.

As the ratio with the lower bound (7) is now bounded by 4, the result follows from Lemma 3.5. ∎

Remark 3.7.

In case the scheduled session end time equals the expected total service time, the overtime under the mean-based schedule reads

(W_n + B_{π(n)} − μ_{π(n)})^+,

which can also be included in the cost function. As such, overtime is handled similarly to waiting time, and consequently the results of Theorems 3.3 and 3.4 remain valid when an extra term proportional to the expected overtime is added to the cost function.

3.2. Normally distributed service times

The results of Theorems 3.3 and 3.4 can be strengthened for specific service-time distributions. One such result is the following.

Theorem 3.8.

When the B_i are all normally distributed, the upper bound of Theorem 3.3 on the approximation ratio can be improved; the improved constant follows from the proof below.

In order to prove Theorem 3.8, we need the following two lemmas, giving stronger upper and lower bounds on the expected waiting times. The proofs of these lemmas, which hold for any symmetrically distributed service times, can be found in Appendix A.

Lemma 3.9.

Under the symmetry assumption,

Lemma 3.10.

Under the symmetry assumption, for any ,

Proof of Theorem 3.8.

Note that normal distributions satisfy both the ordering and symmetry assumptions. Now the sum S_{j−1} again has a normal distribution, with mean zero and variance σ_1² + ⋯ + σ_{j−1}². For the svf sequence we now have, using Lemma 3.9, that

(8)

Now we still need an expression for a lower bound on . Let be the variance of . From Lemma 3.10 it then follows that

Recall that was the optimal expected waiting time when whenever . Therefore, we have and . Now note that

is largest when is as close to as possible. As is largest of the with , we can always choose such that

This choice of provides us with the lower bound

valid for any sequence. Comparing with (8), we obtain

As , this fraction only depends on the relative size of compared to . Suppose that , for some . Then , and the fraction becomes

It can easily be seen that is increasing, and that as .

We now know that . By Lemma 3.5 the same is then also true for the cost function. This proves Theorem 3.8. ∎

3.3. Numerically improving the bound of Theorem 3.4

Under the ordering assumption, we have proved that the approximation ratio is at most 4, and we have also proved that it is at most 2 when the service times have symmetric distributions. This suggests that an upper bound between 2 and 4 can be found for service-time distributions that have some degree of symmetry, but are not completely symmetric.

Here we introduce a method to split the service-time distributions into a symmetric and a nonsymmetric part, thus isolating the effect of the asymmetry on the upper bound. This can be used to numerically compute an upper bound on the approximation ratio for given problem instances. We do so for lognormal service-time distributions fitted to real data in Çayırlı et al. [7]. We still impose the ordering assumption.

We introduce the method for continuously distributed service times to simplify the exposition, noting that extending the method to non-continuous distributions is straightforward. Suppose B_i − μ_i has density f_i. We set s_i(y) = min(f_i(y), f_i(−y)), and p_i = ∫ s_i(y) dy. Then we let V_i be a random variable with density s_i / p_i. We let A_i be a random variable, independent of V_i, with density (f_i − s_i) / (1 − p_i). Let Z_i be a Bernoulli variable taking the value one with probability p_i, independent of V_i and A_i. We thus have that B_i − μ_i has the same distribution as

Z_i V_i + (1 − Z_i) A_i.

Now V_i has a symmetric distribution around zero, so Z_i V_i corresponds to the symmetric part of B_i − μ_i, and (1 − Z_i) A_i to the nonsymmetric part. Note that E[B_i − μ_i] = 0 and E[V_i] = 0, so we must have E[A_i] = 0. Let A_i' have the same distribution as A_i, independent of all the other random variables. Since E[A_i'] = 0, we have

E[ Z_i V_i + (1 − Z_i)(A_i − A_i') | Z_i, V_i, A_i ] = Z_i V_i + (1 − Z_i) A_i.

By Lemma 2.3 we conclude

B_i − μ_i ≤_cx Z_i V_i + (1 − Z_i)(A_i − A_i'),

and the right-hand side has a symmetric distribution around zero. As the expected maximum of the random walk is a convex function in each of the steps, we can replace each step by this upper bound in convex order to get an upper bound on the expected waiting time under the svf sequence. Using Lemma 3.6, we then find an upper bound in terms of the symmetrized steps. This upper bound can now be compared numerically to the lower bound (7), valid for each slot under the ordering assumption. Combining the above with Lemma 3.5, this leads to the bound given in the next theorem.

Theorem 3.11.

Under the ordering assumption, we have

(9)

The more symmetric the service times, the smaller the nonsymmetric parts, and hence also the upper bound in (9). When the service times are completely symmetric, the nonsymmetric parts vanish, and we recover the upper bound of 2 of Theorem 3.3.

Note that the upper bound in Theorem 3.11 is much easier to compute or simulate numerically than the approximation ratio itself, as for the latter one needs to go over all possible sequences to find the optimal one. Also, this method can be used to find an upper bound on the approximation ratio for any problem instance where the service times come from a finite set of distributions and an upper bound on the number of patients is given, as illustrated in the next example.
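For illustration, the quantities appearing in these bounds, such as E|S_{j−1}| or the expected maximum of the centered random walk along the svf order, are straightforward to estimate by simulation. A minimal sketch of ours (the lognormal parameters below are made up and are not the fitted values of [7]):

import numpy as np

rng = np.random.default_rng(2)
mu_log = np.array([1.8, 2.0, 2.1, 2.3])   # hypothetical lognormal parameters, already in svf order
sd_log = np.array([0.2, 0.3, 0.4, 0.6])
means = np.exp(mu_log + sd_log**2 / 2)

n_rep = 500_000
B = rng.lognormal(mu_log, sd_log, size=(n_rep, len(mu_log)))
S = np.cumsum(B - means, axis=1)                   # centered random walk along the svf order
print(np.abs(S[:, -1]).mean())                     # estimate of E|S_{n-1}|
print(np.maximum(S, 0.0).max(axis=1).mean())       # estimate of E[max(0, max_k S_k)]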

Table 1. Parameters (mean and standard deviation) of the lognormal distributions fitted by Çayırlı et al. [7] to the two patient groups (new and return patients).

In Çayırlı et al. [7], patients were divided into two groups: new and return patients. For both groups, lognormal distributions were found to be a good fit to the data used in that paper, with parameters as shown in Table 1. We checked that problem instances built from these two distributions satisfy the conditions of the ordering assumption. It was also mentioned that the doctor who provided the data sees 10 patients per session.

We now consider 11 problem instances: each problem instance consists of 10 patients, with 0 up to 10 of them being new patients. Computing the upper bound in (9) for each of these instances through simulation, we find an approximation ratio of at most 3.43 for any problem instance consisting of at most 10 patients with service times following one of the two lognormal distributions, and thus for any problem instance the doctor in [7] might face.

4. Bounds on performance for optimal interarrival times

In the previous section we assumed mean-based schedules. We relax this assumption here, in that we consider the performance of the svf sequence compared to the optimal combination of sequence and schedule. We will again drop the extra subscript in the notation that we introduced in Section 2.

The goal of this section is to prove bounds on the approximation ratio for the combined sequencing and scheduling problem. In Section 4.1, we will do so when the service-time distributions are from the same location-scale family, leading to Theorem 4.2. Then, in Section 4.2, we give some examples in which the upper bound of Theorem 4.2 can be explicitly computed. In Section 4.3, we consider the special case of lognormally distributed service times, which are often seen in practice [7, 22]. These lognormal distributions do not come from one location-scale family. A bound on the approximation ratio is then given in Theorem 4.5.

4.1. Location-scale family of service times

We impose the following assumption.

Assumption 4.1.

The B_i are from the same location-scale family. In other words, there exists a random variable Y having mean zero and variance one such that B_i has the same distribution as μ_i + σ_i Y, for every i.

Note that this assumption implies Assumption 3.1 (by Theorem 3.A.18 in [34]). In Section 6 it is shown that, without any assumption on the service time distributions, no bound on the approximation ratio can be found.

To obtain an upper bound on the cost function under the svf sequence, we also need to specify the schedule we are using. For the upper bound under Assumption 4.1, we use a schedule of the form x_i = μ_i + c σ_i for some constant c > 0. This means that we plan an amount of time for each appointment equal to the expected time the appointment will take, plus an extra amount of time proportional to its standard deviation, so as to be able to absorb delays. The constant c will be set to the value given in (10), chosen to minimize the upper bound. Let Q denote the quantile function of Y, i.e., Q(p) = inf{y : P(Y ≤ y) ≥ p}. Define q_α = Q(1 − α), together with a constant, depending only on the distribution of Y and on α, that serves as the bound in the theorem below.

The main result of this section is the following.

Theorem 4.2.

Suppose that, for the svf sequence, we use the schedule x_i = μ_i + c σ_i, with c given by (10). Under Assumption 4.1, the approximation ratio for the combined sequencing and scheduling problem is bounded above by the constant just defined, which depends only on the distribution of Y and on α.

This result follows immediately from the bounds on the cost function given in the following two propositions, that are proved in Appendix B.

Proposition 4.3.

Suppose is given by . Under Assumption 4.1,

Proposition 4.4.

Under Assumption 4.1, for any sequence and schedule,

The idea behind proving Proposition 4.3 is as follows. We use that the waiting time can be expressed as the maximum of a random walk, as per equation (3). An upper bound for this maximum can be found by comparing with another random walk that has i.i.d. steps, each distributed as the step of the original random walk with the largest variance. This upper bound is obtained by noting that if (i) one splits each step into two parts, and (ii) multiplies the second part by some constant larger than one (leaving the first part unchanged), then the maximum increases. For the maximum of the new i.i.d. random walk, the classical Kingman bound can be applied. After thus finding an upper bound on the expected waiting time, the expected idle time can also be bounded using (5).
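For reference, one standard form of Kingman's bound used in arguments of this type (our paraphrase, not a quotation from the paper): for a random walk with i.i.d. steps U_1, U_2, … satisfying E[U_1] < 0,

E[ max_{m ≥ 0} Σ_{k=1}^{m} U_k ] ≤ Var(U_1) / (2 |E[U_1]|).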

The idea behind the proof of Proposition 4.4 is to write the cost contribution of each appointment slot as a function of its own slot length, and to minimize this over the slot length. This minimization problem is the classical newsvendor problem, which has a known solution. This results in a lower bound on the cost function that is independent of the schedule. This lower bound can also be easily minimized over the sequences, resulting in Proposition 4.4.
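A sketch of this newsvendor step (our reconstruction; the convention that α weights the idle time and 1 − α the waiting time is an assumption): for a single slot of length x reserved for a service time B with distribution function F, the per-slot cost

α E[(x − B)^+] + (1 − α) E[(B − x)^+]

has derivative α F(x) − (1 − α)(1 − F(x)) in x, so it is minimized at any x* with F(x*) = 1 − α. For α = 1/2 this is the median of B, consistent with the remark below.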

When α = 1/2, i.e. when waiting time and idle time are equally important, the quantile q_α is simply the median of Y, and the bound of Theorem 4.2 can be expressed in terms of this median.
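A small Python sketch of such a schedule (our illustration; the means and standard deviations are made up, and taking c equal to a quantile of the standardized service time is one natural choice rather than the exact optimizer in (10)):

import numpy as np
from scipy.stats import norm

alpha = 0.3                                  # relative weight of idle time in the cost function
means = np.array([10.0, 15.0, 12.0, 20.0])   # hypothetical mean service times
stds = np.array([2.0, 6.0, 3.0, 8.0])        # hypothetical standard deviations

svf_order = np.argsort(stds**2)              # smallest-variance-first sequence
c = norm.ppf(1 - alpha)                      # quantile of the standardized distribution (normal here)
slots = means[svf_order] + c * stds[svf_order]   # slot lengths: mean plus c standard deviations
print(svf_order, slots)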

4.2. Examples

Now we present examples of location-scale families for which the constants of Theorem 4.2 can be computed explicitly. This way we obtain some insight into the magnitude of the resulting bound. For the location-scale families of normal, uniform, shifted exponential and Laplace distributions, the results are shown in Table 2. For normal distributions one of the expressions is not shown, as it does not simplify (with respect to the one presented in Theorem 4.2).

Table 2. The values of the bounding constants of Theorem 4.2 for the normal, uniform, shifted exponential and Laplace location-scale families.

Now consider the case of Pareto distributions (of type II, that is). A random variable X has such a distribution if

P(X > x) = (1 + x/λ)^{−a},   x ≥ 0,

for certain parameters λ, a > 0. The Pareto distributions with fixed shape parameter a form a location-scale family. Suppose that the B_i have Pareto distributions with fixed shape parameter a > 2, so that the variances are finite; the constants of Theorem 4.2 can then be computed in terms of a and α.

For most typical location-scale families, the value of the bounding constant is between 2 and 3. However, in the Pareto case the value becomes much larger when a approaches two. Also, for α close to either one of the extremes 0 or 1, the constant blows up.

4.3. Lognormally distributed service times

In this subsection we use the notation and . We have the following result.

Theorem 4.5.

Suppose the B_i are lognormally distributed, with parameters satisfying appropriate ordering conditions. When we use, for the svf sequence, a schedule in which each interarrival time is a suitable multiple of the corresponding mean service time (with the multiple chosen as in the proof), then the approximation ratio for the combined problem admits an explicit upper bound.

The proof of this theorem can be found in Appendix B. The ideas behind the proof are similar to those of Theorem 4.2. The main difference is in the upper bound, where it needs to be proved that the i.i.d. random walk used for comparison indeed has a bigger expected maximum. For lognormal distributions, we use a convex ordering among the stepsize distributions to prove this, noting that the maximum of a random walk is a convex function in the stepsizes.

As an example, we apply Theorem 4.5 to the data found in Çayırlı et al. [7]. Recall that the patients were divided into "new" and "return" patients, with service times fitted by lognormal distributions with parameters given in Table 1. It can be checked that any problem instance containing a mix of these patient groups satisfies the assumptions of Theorem 4.5. For any such problem instance, the largest of the relevant parameters corresponds to a new patient, and the smallest to a return patient. Setting α = 1/2 and calculating the upper bound in Theorem 4.5, we find for the doctor studied in Çayırlı et al. that the approximation ratio is at most 2.90.

5. Asymptotic optimality of svf

In this section we assess the performance of the svf sequence as the number of patients grows large. Throughout this section we assume that the schedule is mean-based: the time planned for each appointment is equal to the corresponding mean service time. The goal in this section is to prove that the svf sequence is asymptotically optimal as the number of patients tends to infinity, under certain conditions.

5.1. Main result

We consider the setting in which we are given, for each value of n, a vector of n service-time distributions. For i ≤ n, let μ_i and σ_i² denote the mean and variance of the i-th service time B_i in this n-th instance, and assume σ_1² ≤ σ_2² ≤ ⋯ ≤ σ_n² for all n. Similarly, the waiting times, idle times and the cost function are all with respect to the service-time distributions of the n-th instance; we suppress the dependence on n in the notation.

We require the following assumption, similar to the Lyapunov condition in the Lyapunov version of the central limit theorem (CLT). The difference between our assumption and the conventional Lyapunov condition is the supremum over all slots and all sequences.

Assumption 5.1.

We assume that there exists a δ > 0 such that, as n → ∞, the corresponding Lyapunov ratio, formed from the (2 + δ)-th absolute central moments and the summed variances, tends to zero uniformly over all slots and all sequences.

In Section 6 it is shown that it is necessary to make this assumption. The main result of this section is the following.

Theorem 5.2.

Under Assumption 5.1, the approximation ratio under mean-based schedules tends to 1 as n → ∞.

To prove Theorem 5.2, we derive an upper and a lower bound on the expected waiting time, which we then combine. We again view the waiting time as the maximum of the random walk S. For both bounds we use the process reflected at a given level, as defined in (6), to obtain bounds on the distribution of the waiting time. For the upper bound we can ignore the difference between the reflected process and the level right after crossing the level for the first time; the original walk, the reflected process and this approximation are illustrated in Figure 1. For the lower bound we truncate all steps at some value; the difference introduced by the reflection is then bounded by this truncation level. We then choose the truncation level small enough that this difference becomes negligible in the limit, but also big enough that the difference between the original random walk and the random walk with truncated steps becomes negligible as well. Using a Berry-Esseen bound for martingales established in [18], we can then estimate the relevant distributions, and thus the distribution of the waiting time, for large n.

Figure 1. The random walk (black), the reflected process (blue), and its approximation used for the upper bound (green); the red line indicates the reflection level.
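The reflection step underlying this argument (and Lemma 3.6) is easy to check numerically; a minimal sketch of ours, with standard normal steps purely for illustration:

import numpy as np

rng = np.random.default_rng(0)
n_steps, n_paths, u = 20, 200_000, 3.0
steps = rng.standard_normal((n_paths, n_steps))   # symmetric, mean-zero steps
S = np.cumsum(steps, axis=1)
p_max = np.mean(S.max(axis=1) > u)                # P(max_k S_k > u)
p_end = np.mean(S[:, -1] > u)                     # P(S_n > u)
print(p_max, 2 * p_end)                           # reflection bound: p_max <= 2 * p_end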

Define . To prove Theorem 5.2, we need the following two propositions, that are proved in Section 5.2 and Section 5.3 respectively. These propositions are then combined in Proposition 5.5, after which we can establish Theorem 5.2.

Proposition 5.3.

For any and we have

where is a standard normal random variable, and is a constant that only depends on .

Proposition 5.4.

Under Assumption 5.1, for each there exists a depending on only, such that for all and all ,

Proposition 5.5.

Under Assumption 5.1, for any there exists a depending on only, such that for all and for all ,

Proof.

By Proposition 5.4, for any , we can choose sufficiently large such that

for any sequence , in particular for the optimal sequence. Here we also used that are the smallest variances. For the svf sequence, Proposition 5.3 gives us that for sufficiently large we have

Combining these two bounds completes the proof. ∎

Proof of Theorem 5.2.

By the Lindley recursion we have

so, for any and , is increasing in . With as in Proposition 5.5, we consequently have