Services are by definition intangible products that are experienced by the customers. Services may be defined in different contexts such as telecommunication, transportation, healthcare, banking, etc. Because of the intangible nature of services, quality of service mainly depends on the customers’ experience of the received service. One of the important measures of the quality of service is the delay experienced by the customers. Any service system has a limited service capacity and therefore is not capable of providing service to all customers at the same time, during high-demand periods. As a result, customers usually have to wait in queues in order to receive service. In other words, servers are analogous to shared resources among customers, which cannot be utilized by all the customers at the same time. In addition to the service capacity limitations of the system, time-variability and randomness of the demand are other important factors which can lead to queue formation.
Waiting time prediction for the new customers can be beneficial from both the customers’ and the service providers’ points of view. Service providers can use delay predictions for better management of the system by adaptive matching of the service capacity to the demand. On the other hand, waiting time predictions can be used for delay announcements to the customers, which will result in lower uncertainty about the waiting times, and therefore, higher customer satisfaction .
Waiting time prediction in service systems can be classified into two categories: queueing-theoretic methods and data-based methods. Let us first begin with the queueing-theoretic methods.
One of the earliest work on predicting customer’s waiting time in a multi-server queue is . This paper investigates the possibility of improving delay predictions by exploiting information about the system state, and the elapsed service time of the customers in service, under non-exponential service time assumptions. Following up on , Ibrahim and Whitt have studied the performance of alternative queue-length-based and delay-history-based predictors. Three types of delay history information which have been used widely in these papers are the delay of the last customer to enter service (LES), the elapsed waiting time of the customer at the head of the line (HOL) and the delay of the last customer to complete service (LCS). The real-time performance of delay-history-based predictors such as LES and HOL predictors are studied for the standard queueing model in . In , Ibrahim and Whitt extend their analysis to queueing models with abandonments. Particularly, they study the overloaded model, where denotes i.i.d abandonment times with general distributions. It is shown that the queue-length predictor performs poorly in this regime, while HOL remains an effective estimator. The performance of the delay-history based predictors in multi-server queueing systems with non-stationary arrivals is studied in . It is shown that the delay-history based predictors can have significant estimation bias, particularly if the system goes through alternating overload and underload periods. Moreover, a refined delay estimator based on HOL is introduced in  to cope with time-varying arrivals in the model, where represents nonhomogeneos Poisson arrivals. Finally, time-varying capacity is taken into account in , where four new predictors have been introduced for the queueing model.
Limitations of the queueing-theoretic analysis have resulted in recent interest in data-based methods such as machine-learning algorithms and data-mining techniques. These methods have been used for waiting time prediction in different contexts and more realistic settings such as healthcare and transportation systems. For instance, machine learning techniques have been used in to predict flight delays by exploiting available information from sensors in the airport. Combining process mining and queueing-theoretic results,  introduces queue-mining techniques for predicting waiting times in service systems. Ang et al.  propose a new predictor, called Q-Lasso, which combines the Lasso method from statistical learning and fluid models from queueing theory. The authors consider waiting time prediction in emergency departments and use datasets from four hospitals.
Both of these methods have their own advantages and shortcomings. One of the main disadvantages of the queueing-theoretic methods is that the analysis can easily become intractable by considering more realistic assumptions or including more information sources in the prediction process. Particularly, consider delay-history-based prediction, which is the focus of this paper. The existing queueing-theoretic methods based on delay history information are often limited to exponential service time assumptions and stationary arrival times. Moreover, the refined delay-history-based predictors that have been introduced for time-varying arrival settings, such as the ones proposed in 
, require knowledge of the model parameters including arrival rate, service rate and the number of servers, which might not be available and need to be estimated as well. Another shortcoming of the existing queueing-theoretic methods is their limitation in exploiting all the available information, such as delay history of multiple recent customers, instead of only focusing on a single customer’s delay history (e.g., LES or LCS). On the other hand, the prediction method and the feature selection process in data-based predictions are usually specialized for a particular application and do not provide much insights about the importance of the features. Furthermore, uncertainty of the estimations and the distribution of the waiting times are other important pieces which are often missing in the literature. All these reasons motivated us to use statistical learning methods to study queueing models under more realistic assumptions, such as time-varying arrivals and non-exponential service times.
The remainder of this paper is organized as follows. In Section II, we describe the queueing system model and formulate the problems that we are going to study in this paper. We discuss some theoretical results on the conditional distribution of the delay given HOL information in Section III. In Section IV, we briefly review mixture density networks and discuss how it can be used to predict the conditional distribution of the delay. The evaluation of the proposed predictors are presented in Section V. Finally, Section VI presents the conclusions and the future work.
Ii System Model and Problem Setting
Consider a multi-server queueing system with infinite queue size and homogeneous servers with FCFS service discipline. We do not assume a specific distribution for service times or inter-arrival times and therefore, the arrival and service processes can have non-stationary distributions. Let denote the observed waiting time (before entering service) of the ’th last customer who entered service. Based on this definition, represents the LES delay, i.e., . Furthermore, the random waiting time of a new arrival conditional on observed delay history of the last customers who entered service, i.e. , is represented by .
Our goal is to predict the distribution of a new arrival’s waiting time, given that the customer has to wait and an observed delay history of , i.e., we are aiming to predict the conditional distribution . Furthermore, we are interested in predicting a single-value prediction for , which will be denoted by in the rest of the paper. In the case where the delay predictor directly uses delay of the last customer to enter service as its prediction, i.e., , we get the LES estimator. However, we are primarily concerned with the MMSE predictions, which can be obtained as . In other words, minimizes the mean squared error (MSE) of the predictor, which is defined as
It should be mentioned that it is difficult to determine MMSE predictor theoretically, even for the simple case of , and therefore, we use statistical learning methods to estimate the conditional mean of .
Our approach for estimating the conditional distribution of the waiting time is to use mixture density networks (MDNs) . In particular, the MDN uses a mixture of Gaussians to estimate the conditional distribution of the waiting time as follows
where , and denote the the mixing coefficient, mean and variance of the kernel, respectively, given delay history information . We will discuss this method in more detail in Section IV.
As we mentioned earlier, an important reason for estimating distribution of the delay is to obtain probabilistic bounds instead of just making predictions. More specifically, we can define a probabilistic lower-bound () and upper-bound () as follows:
are the violation probabilities for the upper-bound and the lower-bound, respectively. Since probabilistic upper bound provides a pessimistic estimation of the waiting time, it can be a good candidate for delay announcements to the customers. Finally, confidence interval is one of the other statistics that will be used in this paper to measure the amount of uncertainty for each prediction. Since the confidence intervals will be used along with the MMSE predictions, we define the confidence interval for random waiting timewith confidence level , as an interval with endpoints such that:
Iii Normal Approximation
In this section, we discuss some theoretical results on the conditional distribution of a new arrival’s waiting time given the HOL delay. Here, we focus on the HOL delay information since it results in the same performance as using the LES delay, while it makes the analysis easier . In particular, we consider the distribution of a customer’s waiting time conditional on the HOL delay in the queueing system, where represents the nonhomogeneous Poisson arrival process.
First, let us represent the delay of a new customer arriving at time , given a HOL delay of , by , which can be written as
where denotes the number of arrivals in the interval , and represents the time interval between successive service completions. Since the service times are i.i.d, the mean and the variance of can be obtained as follows
For the proof please see 
. Considering that the arrival process is modeled as a NHPP, the mean and the variance of the random variablewill be equal to
. Furthermore, since the conditional distribution of the delay given the past delay converges to the normal distribution when, the conditional mean and the variance often suffice to characterize the distribution of . As an example, consider a system with nonhomogeneous Poisson arrivals and sinusoidal arrival rate as described in section V-B. Fig. 1 shows a sample path of the ground-truth delay along with the MMSE predictions from Eq. (7). The filled region around the MMSE predictions show the region where the ground-truth delays are expected to appear with a confidence level of , i.e., . As can be observed, the uncertainty of the delay prediction increases as the delay becomes larger.
Ibrahim and Whitt  have proposed a refined HOL-based delay estimator which uses the conditional mean calculated from Eq. (7) as its prediction. Although the refined HOL estimator performs better than the classical HOL estimator, especially in systems with time-varying arrivals, it requires additional information about the system parameters, as well as the arrival and service processes, which might not be available in practice and need to be estimated too. As a result, we are interested in estimators that only use the delay history information. However, instead of relying on the delay information of a single customer (HOL or LES delay), we use the delay-history of a larger number of past customers. Moreover, we use this information to estimate the conditional distribution of the delay using mixture density networks.
Iv Mixture Density Networks (MDNs)
The mixture density network provides a general framework for approximating arbitrary conditional distributions. Using a mixture model as an approximation of the true conditional distribution, an MDN estimates the parameters of the mixture model using a fully-connected neural network. More specifically, the MDN approximates the conditional distribution ofgiven by
where are the mixing coefficients and, and denote the mean and variance of the ’th kernel, , given . Since the conditional distribution of the delay converges to the normal distribution under heavy-load assumptions, Gaussian kernels are reasonable candidates for the mixture model components.
The output layer of the neural network consists of three types of nodes which predict the parameters of the mixture model in Eq. (9
). The first type uses the soft-max activation function to predict the mixing coefficients such thatand . The second group, which predict the variances of the kernels, use exponential activations to ensure non-negative values. The last group of the nodes use linear activations and compute the means of the kernels. Using a data set of customers, where each sample contains features (delays of the last customers to enter service) and a label (delay of that particular customer), , the mixture density network learns the weights of the neural network by minimizing the error function, which is defined as the negative logarithm of the likelihood, i.e.,
One of the main challenges in implementing a mixture density network is its instability issue. In order to address this problem, various techniques and modifications have been proposed. An important problem associated with mixtures of Gaussians is the presence of singularities in the Likelihood function. In other words, in the case of having more than one Gaussian component, , the maximization of the Likelihood function is not a well-posed problem since the Likelihood can easily go to infinity whenever one of the Gaussian components lies on a specific data point and its variance goes to zero. For a more detailed discussion of the implementation issues related to mixture density networks refer to .
V Evaluation and Results
V-a Performance Measures
In order to evaluate the performance of the delay predictors, we consider two measures: absolute bias and mean squared error (MSE) for accuracy and precision, respectively. The absolute bias is defined as and will be approximated by
where and are the ground-truth and predicted waiting times for the data point.
The other measure is defined as and will be approximated by the average squared error as follows:
V-B Simulation Experiments
In the rest of this section, we present our results on delay prediction and distribution estimation in multi-server queueing systems. Let us begin with a short description of the arrival and service processes that are used in the following experiments. First, we consider a deterministic ON-OFF arrival process which can be used for simple approximation of a system with batch arrivals, such as transport terminals, in which arrival and departures occur in batches based on schedules. The ON-OFF arrivals have a cycle length of hours and a duty cycle equal to . It should be noted that time is normalized to the mean service time in this paper. The second type of time-varying arrivals that are used in the experiments is the nonhomogeneous Poisson process, which is a good fit for arrivals in hospitals and call centers . We adopt the same model as in  with sinusoidal arrival rate to capture the daily cycles, i.e., we consider an arrival rate of , where , and represent the average arrival rate, relative amplitude and the cycle length of the arrival rate. We have evaluated systems with exponential, lognormal and H2 service times (hyperexponential with coefficient of variation equal to 2), however, the results are only presented for lognormal service times, due to space considerations. Moreover, we use an MDN implementation which is based on samples and the test results are evaluated on sample customers. As mentioned earlier, since we are studying delay-history-based predictors, we only consider delay of the last customers to enter service as the feature set. The simulation parameters are summarized in Table I.
|Mean service time||1 (10 mins)|
|Cycle of NHPP arrivals||144 (1 day)|
|Relative amplitude of||0.5|
|Number of servers||20|
In the first experiment, we consider a multi-server queueing system with servers, deterministic ON-OFF arrival and lognormal service times. We use a simple feed-forward neural network to approximate the conditional mean of the delay given previous history, i.e., , and use it as our predictor. Fig. 2 shows the scatter plots of the ground-truth and predicted delays for the LES predictor, and three other NN-based predictors with delay history lengths of and . As can be seen in Fig. 2.a, there exists a time lag between the LES delays and the ground-truth delays, particularly when the system is busy and the ground-truth values are large. On the other hand, we can observe that even a simple feed-forward neural network which only relies on the LES delay can achieve a much better performance and reduce the MSE by around compared to the LES predictor (see Fig. 2.b). Furthermore, the absolute bias has been reduced tremendously, which suggests that the systematic time lag does not exist anymore. It seems that increasing the delay history information does not have a noticeable impact on MSE in this experiment and might even result in overfitting and therefore a larger MSE (Fig. 2.d). From Fig. 2
we can observe that the predictors can have large errors when the ground-truth delay is very small. One explanation might be that large past delays can either imply large delays for the next customers if the ON period continues, or suggest very small delays if the arrival process goes to the OFF state. The vertical group of the outliers around zero ground-truth delay represent this phenomenon.
Now, consider the same queueing system but with NHPP arrivals. Fig. 3 shows the scatter plots for the same set of predictors. Similar to Fig. 2, there is a time lag between the LES delays and the ground-truth values. Moreover, Fig. 3.a suggests that the conditional distribution of the delay given LES delay information should be multimodal. In other words, a particular LES delay can suggest both larger or smaller future delays, depending on whether the arrival rate and therefore system backlog, is in the increasing or decreasing phase. We observe that by increasing the length of the delay history to , the NN-based predictor can better learn whether the system backlog is in the increasing or decreasing phase and hence, it is able to reduce the MSE by around .
As we mentioned earlier, single value predictions of the waiting times might not be very informative and we are interested in more descriptive statistics of the system. For instance, it is more desirable to obtain stochastic upper bounds or lower bounds on a customer’s delay similar to Eqs. (3) and (4), rather than providing a single value prediction. In order to achieve this goal, we use MDNs as described in Section IV to estimate the conditional distribution of the waiting time given delay history of the last customers to enter service. Fig. 4 shows the predicted conditional distribution of the waiting time in the previous experiment with NHPP arrivals, given LES delays equal to , and , i.e., , and , respectively. The two modes in the estimated distribution function, correspond to the two groups of points in the scatter plot shown in Fig. 3.a, with the same LES delay value. Moreover, Fig. 4 suggests that given a particular LES delay, it is more likely to have a larger delay for the new arrival. This can be explained by the fact that most of the customers with the same LES delay should have entered the system in the increasing phase of the arrival rate, while the queue length is growing, and therefore they are more likely to experience larger delays than the LES customer.
Now, we use the estimated conditional distributions in a more effective way to obtain probabilistic bounds on the waiting time of a new arrival. Fig. 5 shows a sample path of the waiting times in the previous experiment with NHPP arrivals for a period of two days. The stochastic upper bounds and lower bounds calculated from Eqs. (3) and (4) are also shown in Fig. 5. The stochastic bounds are calculated to hold with a probability more than , i.e, . It should be mentioned that decreasing the violation probabilities can result in looser bounds. As mentioned earlier, the calculated upper bounds are good candidates for delay announcements. Closely related to these bounds, we can find the confidence intervals for each prediction using Eq. (5) and the predicted distributions. Fig. 6 shows the same sample path of the waiting times as in Fig. 5, along with the MMSE predictions and confidence intervals, obtained from the MDN. It can be observed that the confidence intervals can be pretty large for longer waiting times, which shows the importance of considering other statistics instead of focusing on single value predictions.
In this paper, we attempted to show the potential of the statistical learning methods in providing insights on service systems under realistic assumptions, such as time-varying arrivals and non-exponential service times. In particular, we studied the problem of delay prediction and distribution estimation in multi-server queueing systems. We showed that even a very simple NN-based predictor that only uses the delay history information of the previous customers can outperform the traditional LES predictor (decreasing MSE by ), without requiring any other information about the system parameters. More importantly, MDNs enable us to estimate the conditional distribution of the waiting time, which can be used to obtain much more informative statistics, such as probabilistic bounds, compared to the common single-value predictions.
Although the proposed NN-based methods are able to make good estimations for service systems under fixed model assumptions, we are interested in studying more realistic and complex cases, where model parameters such as the number of servers and service time distributions can change over time. As discussed in the previous sections, these systems can appear in many different contexts, such as call centers and emergency departments, where theoretical analysis of the system under realistic assumptions gets intractable.
-  Ibrahim, R.: Sharing delay information in service systems: a literature survey. Queueing Syst. (2018)
-  Whitt, W.: Predicting queueing delays. Manag. Sci. 45(6), 870–888 (1999b)
-  Ibrahim, R.,Whitt,W.: Real-time delay estimation based on delay history. Manuf. Serv. Oper.Manag. 11(3), 397–415 (2009a)
-  Ibrahim, R., Whitt, W.: Real-time delay estimation in overloaded multiserver queues with abandonments. Manag. Sci. 55(10), 1729–1742 (2009b)
-  Ibrahim, R., Whitt, W.: Real-time delay estimation based on delay history in many-server service systems with time-varying arrivals. Prod. Oper. Manag. 20(5), 654–667 (2011a)
-  Ibrahim, R., Whitt, W.: Wait-time predictors for customer service systems with time-varying demand and capacity. Oper. Res. 59(5), 1106–1118 (2011b)
-  Demir, E., Demir, V.B.: Predicting flight delays with artificial neural networks: case study of an airport. In: 2017 25th IEEE Signal Processing and Communications Applications Conference (SIU), pp. 1–4 (2017)
-  Senderovich, A., Weidlich, M., Gal, A., Mandelbaum, A.: Queue mining–predicting delays in service processes. In: International Conference on Advanced Information Systems Engineering, pp. 42–57. Springer, Berlin (2014)
-  Ang, E., Kwasnick, S., Bayati, M., Plambeck, E., Aratow, M.: Accurate emergency department wait time prediction. Manuf. Serv. Oper. Manag. 18(1), 141–156 (2015)
-  D. V. Lindley. The theory of queues with a single server. Mathematical Proceedings of the Cambridge Philosophical Society, 48(2):277–289, 1952.
-  C. Bishop. Mixture density networks. Technical report, 1994.
-  J. Kiefer and J. Wolfowitz. On the theory of queues with many servers. Transactions of the American Mathematical Society, 78(1):1–18, 1955.
-  A. Brando Guillaumes, “Mixture density networks for distribution and uncertainty estimation,” Master’s thesis, Universitat Politecnica ‘ de Catalunya, 2017.
-  Kim, S. H. and Whitt, W. Are Call Center and Hospital Arrivals Well Modeled by Nonhomogeneous Poisson Processes? M&SOM 16 464–480, 2014.
-  https://github.com/cpmpercussion/keras-mdn-layer