Identifying and Responding to Outlier Demand in Revenue Management

Revenue management strongly relies on accurate forecasts. Thus, when extraordinary events cause outlier demand, revenue management systems need to recognise this and adapt both forecast and controls. State-of-the-art systems rely on analyst expertise to identify outlier demand both online (within the booking horizon) and offline (in hindsight). In light of the partial nature of revenue management data and censoring effects from inventory controls, so far, there exists little research on automating the detection of outlier demand. To remedy this, we propose a novel approach, which detects outliers using functional data analysis in combination with extrapolation via time-series forecasting. We evaluate the approach in a simulation framework, which generates outliers by manipulating the demand model. By evaluating the full feedback-driven system of forecast and optimisation, we generate insight on the asymmetric effects of positive and negative demand outliers in light of revenue management heuristics that do or do not account for customer choice. Furthermore, we quantify the value of both online and offline outlier detection. We show that identifying instances of outlier demand using our methodology, and adjusting the forecast in a timely fashion, substantially increases revenue compared to what is earned when ignoring outliers.




1 Introduction

Transport providers treat knowledge of expected demand as the basis on which revenue management (RM) optimises offers for perishable products, given a fixed capacity and low marginal cost. Specifically, quantity-based RM approaches optimise the number of units to sell at different times of a fixed booking horizon, whereas price-based approaches dynamically set the optimal price to offer across the fixed booking horizon. The vast majority of RM implementations of either approach take a given demand forecast as input for the optimisation model. For quantity-based revenue management, Weatherford and Belobaba (2002) highlight that inaccurate demand forecasts can significantly reduce revenue. Beyond RM, as pointed out by Banerjee et al. (2019), detailed demand forecasts also support further planning steps, such as network resource and fuel planning.

Cleophas et al. (2017) list several causes for forecast inaccuracies. Primarily, the unavoidable variance of day-to-day demand prohibits perfectly accurate forecasts. Additionally, any flaw in the forecast model, including both the predictive time series component and the customer choice model, naturally causes model-based forecast errors. Demand forecasting for RM is further complicated by censored data: most common demand forecast approaches rely on observed sales, which are censored by inventory controls, to estimate the demand in a market (Weatherford and Pölt, 2002). Finally, sudden shifts in the market may cause short-term, temporal outliers. For example, when the system does not account for special events such as a sports championship or a trade fair, these will cause observed demand to systematically deviate from predictions.

In this paper, we focus on such demand outliers. Demand outliers affect revenue management systems in two ways: (i) in foresight, the flawed forecast results in non-optimal capacity allocations; and (ii) in hindsight, the outlier can contaminate the data underlying future forecasts. Hence, the system needs to identify booking patterns that were created by outlier demand online, within the booking horizon, to improve foresight, and offline, after observing sales accumulate across a booking horizon, to improve hindsight. In this context, we consider booking patterns to describe the accumulated number of bookings across the booking horizon, reported for a set of fixed, discrete time intervals. These may be aggregated across fare classes and are reported either for single resources, such as flight legs, or for complementary combinations of resources, such as network itineraries. We follow Hawkins (1980) and define an outlier as ‘an observation which deviates so much from the other observations as to arouse suspicions that it was generated by a different mechanism.’

The research presented here explores the use of outlier detection techniques in identifying critical booking patterns to support the work of revenue management analysts. Thus, our research contribution constitutes an ancillary method to demand forecasting according to the definition of Banerjee et al. (2019). By investigating practical implementations of revenue management in the airline and railway industry, we find that the current process relies on analysts to manually examine booking patterns.
When analysts perceive demand outliers, they attempt to compensate by adjusting the reported data, the forecast, or the resulting inventory controls. The decision of whether an adjustment is necessary, and what that adjustment should be, depends on the analysts’ intuition of how bookings typically accumulate on the markets they monitor. As noted by Cleophas et al. (2017) and Banerjee et al. (2019), little existing work systematically measures the effect of such interventions, and there is even less consideration of providing systematic analytics support for the related decisions. However, research on human decision making in general, and judgemental forecasting in particular, clearly demonstrates fallibility and bias (O’Connor et al., 1993; Lawrence et al., 2000). This motivates the need for automated alerts that highlight outliers to support analysts.

To our knowledge, we are the first to propose an automated methodology for outlier detection in the RM domain. Specifically, this paper makes the following contributions to the area: (i) it proposes a novel outlier detection approach, combining functional data analysis and time series forecasting, which yields superior detection performance; (ii) it provides a simulation-based framework for generating booking patterns based on controlled demand outliers and evaluating their effect throughout the RM process; (iii) it demonstrates the asymmetric effects of outliers under different optimisation heuristics; and (iv) it quantifies the benefits from successful online or offline outlier detection for RM.

As this paper considers approaches to identifying and responding to outlier demand and implications for RM, we review related work from two perspectives in the following sections. Firstly, we establish the domain context by considering the role of demand forecasting and forecast evaluation in revenue management in Section 2. Secondly, we establish a methodological background by surveying relevant outlier detection methods in Section 3.
Section 4 introduces a new approach to outlier detection, combining functional data analysis with extrapolation. Section 5 introduces the simulation-based framework. In Section 6, we compare and benchmark the described outlier detection approaches. Additionally, we provide an analysis of potential revenue gains from improved foresight when outliers are detected online and controls are updated. Conclusions and potential directions for future research are discussed in Section 7.

2 RM Forecasts and Forecast Evaluation

The importance of accurate forecasts as input to revenue optimisation is well-documented in the literature. Authors are largely concerned with demand forecasting (Pereira (2016), Weatherford and Pölt (2002), Talluri and Van Ryzin (2004)), although forecasting cancellations and no-shows has also been explored (Morales and Wang (2010)). Weatherford and Belobaba (2002) confirm previous findings that inaccurate demand estimates can have a significant impact on revenue. Under the use of optimisation heuristics such as EMSR (Belobaba (1989)), under- or over-forecasting can even be beneficial. As described by Mukhopadhyay et al. (2007), most RM systems require forecasts of the actual demand, rather than the observed demand. The actual demand is the combination of observed demand and customer requests that were denied due to restrictive inventory controls. This is difficult to observe in practice, and so must be estimated. Weatherford and Pölt (2002) survey various techniques for unconstraining demand. When allowing for inaccurate demand forecasts, much RM research focuses on rendering the optimisation component more robust or forecast-independent, as detailed in the contributions reviewed in Gönsch (2017). In another review, Cleophas et al. (2017) point out that there is little research into the effects of manually adjusted forecasts in RM. Mukhopadhyay et al. (2007) propose a method for measuring the performance of adjusted and unadjusted forecasts, which takes into account demand constraining. They find that if analysts can reliably improve demand forecasts on critical flights, significantly more revenue can be generated. Zeni (2003) describes a study at US Airways, which aimed to isolate and estimate the value of analyst interactions. According to that study, around 3% of the additional revenue generated within the duration of the study could be attributed to analyst input.
Given that carrying out experiments in a live RM system carries significant risks, the use of simulation for evaluation is common. Additionally, simulation studies enable a priori knowledge about the true demand generation process, which can never be known in a real-world setting. Frank et al. (2008) discuss the use of simulation for RM and provide guidelines; in a related effort, Kimms and Müller-Bungart (2007) consider demand modelling for RM simulations. The paper at hand follows these contributions in establishing a simulation-based framework to generate outlier observations. Doreswamy et al. (2015) employ simulation as a tool to analyse the effects of different revenue management techniques for different airlines, when switching from leg-based controls to network controls. Cleophas et al. (2009) focus on an approach to evaluating the quality of RM forecasts in the airline setting, both in terms of revenue and common forecast error measurements. Another example of using simulation to evaluate the performance of forecast components is given in Bartke et al. (2018). Temath et al. (2010) used a simulation-based approach to evaluate the robustness of a network-based revenue opportunity model when input data is flawed. In the broader context of demand forecasting, Petropoulos et al. (2014) evaluate fitting time series forecasts for particular patterns of demand evaluation by manipulating these patterns in a simulation framework.

3 Existing Work on Outlier Detection

It is important at this stage to highlight the distinction between identifying outlying observations within a time series (Figure 1(a)), and identifying an entire time series (in our case, booking curve) as an outlier (Figure 1(b)). In this paper, we aim to identify the latter by considering time series of the number of bookings in each booking interval.

Figure 1: Different types of outliers in time series data. (a) Outlier within a given time series; (b) outlying time series within a collection of series.

Literature on handling outliers in the RM process is scarce, though there is some discussion in Weatherford and Kimes (2003): the authors consider removing outliers caused by atypical events, such as holidays and special conventions, to improve future forecasting. However, they propose only a simple method of outlier detection (removing observations outside of the mean ± 3 standard deviations), and do not seek to identify outliers online through the booking horizon. Beyond RM, there exists a wealth of literature on the study of outliers (also referred to as anomalies) in time series, as reviewed by Chandola et al. (2009) and Pimentel et al. (2014). Hubert et al. (2015) survey various functional outlier detection techniques for time series data, and apply their methods to multiple real data sets. Barrow and Kourentzes (2018) consider the effect of functional outliers for call centre workload management and recommend an artificial neural network to model them as part of the forecast rather than identifying them.

Talagala et al. (2019) propose a sliding window approach for detecting outlying time series within a set of (nonstationary) time series, based on the use of extreme value theory for outlier detection. The authors make a similar distinction between identifying outliers within a time series, and identifying an outlying series from a set of time series. The remainder of this paper distinguishes three classes of approaches: (i) univariate, (ii) multivariate, and (iii) functional.

Univariate Approaches

Univariate outlier detection techniques identify anomalous observations of a single variable, and so can be applied independently, e.g., to the number of bookings in each booking interval. We consider two subcategories: (i) limit-based approaches, which calculate an upper and lower threshold for what constitutes regular observations; and (ii) score-based approaches, which calculate a score per observation, and determine an outlier based on whether the score is above or below some threshold.


  • Nonparametric Percentiles: This class of approaches uses lower and upper percentiles of the observed empirical distribution at each time point as limits for what constitutes a regular observation as opposed to an outlier. This type of percentile-based approach is discussed by Pincus et al. (1995). It can be used as a basic way to estimate statistics in a more robust manner, by trimming or winsorising the data (see Dixon and Yuen (1974)). The downside of this approach is that a fixed percentage of the data will always be classified as outliers, even when there are fewer or more actual outliers in the data.

  • Tolerance Intervals: Statistical tolerance intervals contain at least a specified proportion of observations with a specific confidence level (Hahn and Chandra, 1981). They require two parameters: the coverage proportion and the confidence level. For booking patterns, at each interval of the booking horizon, these approaches define a tolerance interval as applicable up to that point in time. If the number of observed bookings lies outside of this tolerance interval, the pattern is deemed an outlier. Nonparametric tolerance intervals do not assume an underlying distribution, and instead are based on the order statistics of the data (Wilks, 1941). Parametric tolerance intervals assume an underlying distribution (Hahn and Chandra, 1981). The choice of distribution matters, as a poorly chosen distribution leads to poor detection performance. Liang and Cao (2018) choose to fit a Normal distribution to hotel booking data to detect anomalous observations.

  • Robust Z-Score: The z-score measures where an observation lies in relation to the mean and standard deviation of the overall data (Iglewicz and Hoaglin, 1993). The robust z-score uses the median and the median absolute deviation to provide a similar measurement. As such, an observation with a robust z-score above some threshold is classified as an outlier. This score-based method assumes that the observations at each time are approximately normally distributed, based on two justifications: (i) a large proportion of univariate outlier detection methods rely on distributional assumptions (often normality); and (ii) although the discrete, non-negative integer nature of booking data suggests the use of a Poisson distribution, in the presence of trend or seasonal adjustments, the data may no longer have these properties.
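As a minimal sketch of the limit-based and score-based ideas above, the following Python snippet computes percentile limits and a robust z-score for the bookings observed in one booking interval across several booking patterns. The function names, the 5%/95% limits, and the example data are illustrative assumptions rather than settings from this paper; the 0.6745 scaling constant and the 3.5 cut-off are those commonly associated with the modified z-score of Iglewicz and Hoaglin (1993).

```python
import numpy as np


def percentile_limits(x, lower=5.0, upper=95.0):
    """Limit-based rule: empirical lower and upper percentiles of the sample."""
    x = np.asarray(x, dtype=float)
    return np.percentile(x, lower), np.percentile(x, upper)


def robust_z_scores(x):
    """Score-based rule: z-score built from the median and the MAD."""
    x = np.asarray(x, dtype=float)
    median = np.median(x)
    mad = np.median(np.abs(x - median))  # median absolute deviation
    if mad == 0:
        raise ValueError("MAD is zero; the robust z-score is undefined")
    return 0.6745 * (x - median) / mad


# Hypothetical bookings in one interval across seven booking patterns.
bookings = np.array([10, 11, 9, 10, 12, 10, 30])
low, high = percentile_limits(bookings)
scores = robust_z_scores(bookings)
flagged = np.flatnonzero(np.abs(scores) > 3.5)  # indices of suspected outliers
```

Here only the last pattern is flagged by the score-based rule. The percentile rule, by construction, always places a fixed share of observations outside its limits, which illustrates the downside noted above.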

Multivariate Approaches

Univariate outlier detection approaches ignore the dependence both within and between booking patterns. To somewhat capture the dependence within, we turn to distance- and clustering-based multivariate methods.


  • Distance-based Approaches: Each time series can be characterised by its distance to each other time series. Aggregating these distances transforms the problem into a univariate outlier detection problem, based on the mean distances. In this approach, the choice of distance metric is crucial, as some perform better than others for high dimensional data, due to issues with sparsity, as discussed by Aggarwal et al. (2001). In particular, the authors state that the Manhattan distance metric (or L1 norm) tends to outperform the Euclidean distance for such data. We consider both Euclidean and Manhattan distance metrics in our comparative evaluation.

  • K-Means Clustering: K-means clustering splits the observations into K groups by iteratively minimising the distance between each observation and the centre of its assigned closest cluster (see e.g. MacQueen (1967)). As in the distance-based approaches, the choice of distance metric to minimise is highly relevant for clustering. Once more, the further evaluation in this paper compares Euclidean and Manhattan distance metrics. The approach identifies as outliers those points that lie furthest from the centre of their cluster, beyond some distance threshold (Deb and Dey, 2017).
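The distance-based idea can be sketched as follows: each booking curve is reduced to its mean distance to all other curves, which can then be passed to any univariate detector. This is an illustrative sketch only; the function name and the toy curves are assumptions, not taken from the paper.

```python
import numpy as np


def mean_pairwise_distance(curves, metric="manhattan"):
    """Mean distance from each curve to every other curve in the sample."""
    curves = np.asarray(curves, dtype=float)
    # diff[i, j, t] = curves[i, t] - curves[j, t]
    diff = curves[:, None, :] - curves[None, :, :]
    if metric == "manhattan":
        dist = np.abs(diff).sum(axis=2)
    elif metric == "euclidean":
        dist = np.sqrt((diff ** 2).sum(axis=2))
    else:
        raise ValueError("metric must be 'manhattan' or 'euclidean'")
    # The zero self-distance contributes nothing, so divide by n - 1.
    return dist.sum(axis=1) / (len(curves) - 1)


# Hypothetical cumulative bookings over three intervals for four patterns.
curves = [[1, 2, 3], [1, 2, 4], [2, 2, 3], [10, 12, 15]]
mean_dist = mean_pairwise_distance(curves, metric="manhattan")
most_distant = int(np.argmax(mean_dist))
```

In this toy sample the fourth curve has by far the largest mean distance, so a univariate rule applied to `mean_dist` would single it out.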

Functional Approaches

Multivariate outlier detection approaches consider multiple data points from the same time series, but account neither for the time dependency of observations nor for the dependence between time series. In sets of booking patterns, outlier events would cause clusters of outliers, as increased demand affects multiple booking curves connected to close departure days. In addition, multivariate approaches are not ideal when the data is of high dimensionality, e.g. when the number of time intervals describing a booking pattern exceeds the number of patterns in the data set. To remedy these shortcomings, we turn to functional analysis. Functional approaches treat each booking pattern as observations of a real-valued function. In the functional analysis setting, as discussed by Febrero et al. (2008), a rigorous definition of an outlier does not exist. Hence, we choose to use the same definition as Febrero et al. (2008): ‘a curve is an outlier if it has been generated by a stochastic process with a different distribution than the rest of curves, which are assumed to be identically distributed’. This can be seen as a more specific version of the definition by Hawkins (1980), adapted to the functional data setting. Functional depth is the notion of ordering points in space, as discussed by Febrero et al. (2008). That is, ‘depths provide a way to order points in the Euclidean space from centre to outward, such that points near the centre should have higher depth and points far from the centre should have lower depth.’ As such, the degree of abnormality of a curve can also be characterised by its functional depth: a curve with particularly low depth is a candidate outlier (Hubert et al., 2015). Depth-based approaches for detecting outlying curves are discussed in detail by Hubert et al. (2012). In this paper, we focus on the multivariate halfspace depth described by Claeskens et al. (2014).
Further technical details of all outlier detection methods described here are available in Appendix A.

4 Proposed Methodology: Using Extrapolation to Improve Functional Outlier Detection

To improve foresight, demand outliers need to be identified online and as early as possible in the booking horizon. This enables the RM system to update controls for the remainder of the horizon. We term this problem online outlier detection. When tasked with online detection at a given time in the booking horizon, all approaches discussed in the previous section exclusively consider the observation intervals up to that time: in the online setting, only a partial booking curve is available for analysis. Therefore, we propose to supplement the outlier detection approach by extrapolating the expected bookings from the current time up to the end of the booking horizon. We treat this extrapolation as a missing data problem and solve it by forecasting based on the bookings observed so far. In the computational study, we compare simple exponential smoothing (SES) (Chatfield, 1975), autoregressive integrated moving average (ARIMA) models (Box and Jenkins, 1970), and integrated generalised autoregressive conditional heteroskedasticity (IGARCH) models (Tsay, 2002). Algorithm 1 outlines the procedure on a set of booking patterns observed until the current time, which lies before the end of the booking horizon; each pattern is a time series describing the bookings up to that time.

At the current time, forecast the accumulation of bookings at each remaining time in the booking horizon, for each booking curve;
Calculate the functional depth of the observed and extrapolated booking curve, for each booking curve;
Calculate a threshold for the functional depth;
if the functional depth of a booking curve is below the threshold then
       Define the booking curve as an outlier. Delete the booking curve from the sample of curves.
end if
while any remaining booking curve has a functional depth below the threshold do
       Recalculate functional depths on the new sample, and remove further outliers.
end while
Algorithm 1: Using extrapolation to improve functional outlier detection
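The steps of Algorithm 1 can be sketched in Python as follows. This is a simplified illustration under several assumptions that differ from the study itself: extrapolation uses a hand-written simple exponential smoothing of the booking increments (not ARIMA), the depth is a pointwise univariate halfspace depth averaged over time (a simplification in the spirit of the functional halfspace depth), and the threshold is a fixed, hypothetical value instead of being computed by the bootstrapping routine of Appendix A. All function names are illustrative.

```python
import numpy as np


def ses_extrapolate(partial, horizon, alpha=0.5):
    """Extend a cumulative booking curve to `horizon` intervals.

    The smoothed level of the per-interval increments is used as a
    flat forecast for every remaining interval.
    """
    partial = np.asarray(partial, dtype=float)
    increments = np.diff(partial, prepend=0.0)
    level = increments[0]
    for x in increments[1:]:
        level = alpha * x + (1.0 - alpha) * level
    steps = np.arange(1, horizon - len(partial) + 1)
    return np.concatenate([partial, partial[-1] + level * steps])


def functional_depth(curves):
    """Pointwise halfspace depth of each curve, averaged over time."""
    curves = np.asarray(curves, dtype=float)
    n = curves.shape[0]
    # Fraction of curves at or below / at or above curve i at each time.
    below = (curves[None, :, :] <= curves[:, None, :]).sum(axis=1) / n
    above = (curves[None, :, :] >= curves[:, None, :]).sum(axis=1) / n
    return np.minimum(below, above).mean(axis=1)


def detect_outliers(partial_curves, horizon, threshold):
    """Flag curves with low depth, then repeat on the reduced sample."""
    curves = np.array([ses_extrapolate(c, horizon) for c in partial_curves])
    active = np.arange(len(curves))
    outliers = []
    while len(active) > 2:
        depth = functional_depth(curves[active])
        flagged = depth < threshold
        if not flagged.any():
            break
        outliers.extend(active[flagged].tolist())
        active = active[~flagged]
    return sorted(outliers)


# Hypothetical cumulative bookings observed for 4 of 6 intervals.
observed = [[2, 3, 5, 6], [1, 3, 4, 6], [2, 4, 5, 6], [1, 2, 4, 6],
            [4, 8, 12, 16]]
found = detect_outliers(observed, horizon=6, threshold=0.3)
```

In this toy example the fifth curve lies above all others at every observed and extrapolated interval, so its depth stays at the minimum and it is the only curve removed. Because this depth is rank-based, a fixed threshold is fragile in larger samples, which is one reason a data-driven threshold is preferable in practice.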

Figure 2 demonstrates the algorithmic approach; in the extensive simulation analysis, we apply it to a variety of booking patterns and outliers. Figure 2(a) shows 25 booking curves that have been observed from the beginning of the booking horizon until 25 booking intervals before its end. The extrapolation step is shown in Figure 2(b), with the purple lines depicting the ARIMA forecasts of accumulated bookings until the end of the horizon. The empirical distribution of the functional depths of the extrapolated sample is shown in Figure 2(c), with the threshold shown in red (computed via the bootstrapping routine described in Appendix A). The true positives and false positives returned by the algorithm are shown in blue and red, respectively, in Figure 2(d). Note that there were no false negatives, i.e., all outliers were detected.

Figure 2: Example: functional halfspace depth with ARIMA extrapolation outlier detection. (a) Observed booking curves up to the time of detection; (b) ARIMA extrapolation; (c) histogram of functional depths with threshold (red); (d) observed booking curves with true positives (blue) and false positives (red).

The proposed approach is open to two design decisions. Firstly, it can feature any of the multivariate or functional approaches reviewed in Section 3. (It is not applicable to univariate outlier detection methods because, in that setting, the number of bookings at each point in time is considered independently of past or future bookings.) However, a functional approach provides more scope for extensions, such as considering seasonality and increasing the frequency of outlier detection. Secondly, the approach can utilise a variety of forecasting methods for extrapolating. Note that the forecasting methodology employed for this extrapolation is independent of the forecasting methodology used to predict demand for RM.

5 Simulation-based Framework

To quantify effects from demand outliers and evaluate outlier detection approaches, we simulate a basic RM system. Since the simulation renders the demand generation process explicit, computational experiments can yield truthful detection rates. This is impossible in empirical data analysis, where the true demand and the distinction of regular versus outlier demand is never fully certain. Thereby, simulation modelling provides a loophole to the problem of creating reproducible forecasting research highlighted, for instance, by Boylan et al. (2015). The simulation implements the following process:

  1. Generate multiple instances of regular and outlier demand in terms of customer requests arriving across the booking horizon.

  2. Use the demand model underlying regular demand to forecast the number of expected requests per fare class and time in the booking horizon.

  3. Use the demand forecast to compute quantity-based inventory controls that maximise expected revenue from bookings.

  4. Use the inventory controls from (3) to transform arriving requests from (1) into constrained booking patterns over the course of multiple consecutive simulated booking horizons.

  5. Analyse booking patterns to identify booking horizons with outlier demand.

Symbol Definition
The set of customer types
The set of fare classes
Total customer arrivals (Gamma distributed)
Parameters of the Gamma distribution for the number of total customer arrivals
Proportion of total customer arrivals stemming from each customer type
Standardised Beta distribution
Time-dependent rate of the Poisson process of arrivals for each customer type
Realisation of the Poisson process of customers of a given type purchasing a given fare class at a given time
Parameters of the Beta distribution
Probability of a customer of a given type being willing to pay at most a given fare class
Average fare for each fare class
Mean of demand for each fare class
Variance of demand for each fare class
D Total demand
C Capacity
Demand factor
Table 1: Table of notation used for simulation

Table 1 sets out the notation used in the remainder of this section to detail the demand model, demand forecasting, revenue maximisation heuristics, and inventory controls. In this, we detail both the models and algorithms, and the parameter settings implemented in the computational study.

5.1 Generating Demand in Terms of Customer Requests

Heterogeneous demand is a frequently stated RM precondition, assuming that customer segments differ in value and can be identified through their idiosyncratic booking behaviour. To model this, the simulation features two customer types and could be trivially extended to feature more. Here, we use one index for any parameter that characterises high-value customers and another for any parameter that characterises low-value customers. We assume that requests from high-value customers typically arrive later in the booking horizon than those from low-value customers. High-value customers are more likely to book expensive fare classes when cheap fare classes are not offered. We follow Weatherford et al. (1993) in modelling requests from either customer type as arriving according to a non-homogeneous Poisson-Gamma process. Requests from each customer type arrive according to a Poisson distribution with a type-specific, time-dependent rate. The total number of customer arrivals is split between the two segments, such that



with probability density function:


The constraint ensures that all requests belong to exactly one customer type. Additionally, we set the parameters of the Beta distributions such that they follow the assumption that valuable customers are more likely to request at later stages of the booking horizon:


Figure 3(a) illustrates the arrival rates of both customer types across the booking horizon, with Figure 3(b) showing one realisation of request arrivals in a specific horizon. Quantity-based RM differentiates a set of distinct fare classes, which represent different discount levels. In the following, we explain the random choice model used in the simulation to model how customers choose from the set of currently offered classes. The model assumes all customers “buy-down” fully, that is, book the cheapest available fare class. At the same time, not all customers can afford to book any fare class. For every fare class, the probability that a customer of a given type is willing to pay at most that fare class is shown in Figure 3(d). Each customer has a single fare class threshold, which is the most they are willing to pay, such that:


where is the probability of a type customer arriving and choosing not to book based on the classes on offer. Hence, the probability of booking fare class is:


where is the weighted average of probabilities of each customer type being willing to pay up to fare class :


and is the proportion of total customer arrivals stemming from type .

Figure 3: Customer arrivals generated by a nonhomogeneous Poisson-Gamma process. (a) Arrival rates per customer type; (b) realisation of the Poisson process per customer type; (c) cumulative number of customer arrivals per customer type; (d) probability of a customer of each type being willing to pay up to each fare class.

While demand arrival rates vary across the booking horizon, the simulation models arrival rates and choice probabilities as stationary between booking horizons. In real-world markets, demand shifts in seasonal patterns and trends. We nevertheless rely on random draws from distributions with stationary parameters because, when introducing and detecting outliers, the simplest case lets all regular demand behaviour derive from the same distribution. If an approach cannot correctly detect abnormal demand when all regular demand comes from this same distribution, it is highly unlikely to detect abnormal demand when the regular demand is changing. The case where parameter values change over time can be seen as an extension to this initial simulation. Regular booking curves are generated according to Equation (1), with probabilities as shown in Figure 3(d). This results in regular total demand with a mean of 240 and a standard deviation of 15.492.
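The demand generation described in this subsection can be sketched as follows. The sketch assumes a shape/scale parameterisation of the Gamma distribution with shape 240 and unit scale, which reproduces the reported mean of 240 and standard deviation of 15.492 (the square root of 240). The equal split between customer types, the Beta parameters, and the number of booking intervals are hypothetical placeholders and not the values used in the study. Conditional on the total, drawing a Poisson number of arrivals per type and placing them at independent Beta-distributed times yields a nonhomogeneous Poisson process whose rate is proportional to the Beta density.

```python
import numpy as np


def simulate_arrivals(rng, shape=240.0, scale=1.0, proportions=(0.5, 0.5),
                      beta_params=((5.0, 2.0), (2.0, 5.0)), n_intervals=30):
    """Simulate request counts per customer type and booking interval.

    Row 0 is the (hypothetical) high-value type, which arrives late;
    row 1 is the low-value type, which arrives early.
    """
    total = rng.gamma(shape, scale)  # Gamma-distributed total arrivals
    counts = np.zeros((len(proportions), n_intervals), dtype=int)
    for i, (phi, (a, b)) in enumerate(zip(proportions, beta_params)):
        n_i = rng.poisson(total * phi)        # arrivals of this type
        times = rng.beta(a, b, size=n_i)      # arrival times in [0, 1]
        counts[i], _ = np.histogram(times, bins=n_intervals,
                                    range=(0.0, 1.0))
    return counts


rng = np.random.default_rng(0)
counts = simulate_arrivals(rng)
cumulative = counts.cumsum(axis=1)  # unconstrained cumulative arrivals
```

Applying choice probabilities and inventory controls to these arrivals would then turn them into constrained booking patterns; that step is omitted here.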

5.2 Outlier Generation

We generate outlier demand by parameterising demand generation in a way that deviates from the regular setting described in the previous section. Combining outlier demand with inventory controls over the course of a booking horizon creates an outlier booking pattern. There exist three approaches to generating outliers by adjusting the parameters in Equations (1) and (2), and the choice probabilities:

  1. Increasing or decreasing the volume of demand across the whole (or partial) booking horizon, by adjusting the parameters of the Gamma distribution for the total demand.

  2. Shifting the proportions of demand across fare classes, by adjusting either the choice probabilities per customer type or the ratio of customer types.

  3. Shifting the arrival pattern of customer requests over time, by adjusting the parameters of the standardised Beta distribution, which control the time at which customer requests arrive.

In the computational study featured here, we focus on analysing the first type of outliers, specifically, by investigating four magnitudes of outliers. While the proposed approach could be adapted to handle the other types of outliers listed above, the base case of examining outlier patterns resulting from even shifts in demand throughout the booking horizon serves as a proof-of-concept. Our choice of parameter changes for outlier generation follows Weatherford and Pölt (2002), who investigate the effects of inaccurate demand forecasts on revenue. In particular, they consider cases where forecasts are 12.5% and 25% higher or lower than the actual demand. We perform a similar analysis on the benefits of detecting outliers where the overall number of customers deviates from regular demand by 12.5% and 25%. The four types of outliers we consider are generated with the same parameters as above, other than the two parameters of the Gamma distribution for total demand: (i) 25% increase; (ii) 12.5% increase; (iii) 25% decrease; and (iv) 12.5% decrease. This results in a change in mean of the desired magnitude and direction, but no change in variance.
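To illustrate how the mean of a Gamma distribution can be shifted while its variance is held fixed, the sketch below solves for the two parameters by moment matching. It assumes a shape/scale parameterisation (mean = shape × scale, variance = shape × scale²), which the text does not specify, and takes the regular variance to be 240, as implied by the reported standard deviation of 15.492. The resulting values are derived for illustration and are not quoted from the study.

```python
import math


def gamma_shape_scale(mean, variance):
    """Moment matching: Gamma shape and scale for a given mean and variance."""
    shape = mean ** 2 / variance
    scale = variance / mean
    return shape, scale


regular_mean, regular_variance = 240.0, 240.0
baseline = gamma_shape_scale(regular_mean, regular_variance)

# Shift the mean by each outlier magnitude while keeping the variance fixed.
outlier_params = {
    change: gamma_shape_scale(regular_mean * (1.0 + change), regular_variance)
    for change in (0.25, 0.125, -0.25, -0.125)
}
```

Under these assumptions, a 25% increase corresponds to a shape of 375 and a scale of 0.8, giving a mean of 300 with the variance unchanged at 240.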

5.3 Forecasting Demand

Most quantity-based revenue optimisation techniques rely on knowing the number of expected customer requests per offered product, potentially per set of offered products. The heuristics described in the following section require the mean and the variance of expected requests per fare class. To create an accurate forecast as the baseline in the simulation, we simulate $N$ runs of customer arrivals from Equations (1) and (2). Let $x_{ijt}^{(n)}$ denote the realisation, in run $n$, of the number of type $i$ customers who booked in fare class $j$ at time $t$, as drawn from the Poisson arrival process with rate $\lambda_i(t)$ and choice probability $p_{ij}$. Then the mean, over the $N$ simulations, of the total demand across all customer types upon departure sets the forecast for the mean demand for fare class $j$, $\hat{\mu}_j$:

$$\hat{\mu}_j = \frac{1}{N} \sum_{n=1}^{N} \sum_{i} \sum_{t} x_{ijt}^{(n)}.$$

Similarly, the variance of the demand for fare class $j$ can be forecasted as the sample variance of the same totals:

$$\hat{\sigma}_j^2 = \frac{1}{N-1} \sum_{n=1}^{N} \Big( \sum_{i} \sum_{t} x_{ijt}^{(n)} - \hat{\mu}_j \Big)^2.$$

The resulting sum of customer requests per fare class across customer types gives the total expected demand per fare class. Drawing $N = 100$ instances of customer arrivals and purchase choices yields 100 different levels of the demand for each fare class. The mean and variance of these 100 observations are taken to be the forecasted parameters of a Normal distribution for each fare class demand.
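The forecasting step can be sketched as follows. The data layout (a list of runs, each mapping a fare class to bookings at departure per customer type), the class labels, and the use of the unbiased sample variance are assumptions made for illustration.

```python
from statistics import mean, variance


def forecast_per_fare_class(runs):
    """Forecast mean and variance of demand per fare class.

    `runs` is a list of simulated booking horizons. Each run maps a fare
    class to a dict holding the bookings at departure per customer type.
    """
    forecast = {}
    for fare_class in runs[0]:
        # Total demand per run: sum over customer types.
        totals = [sum(run[fare_class].values()) for run in runs]
        # Sample mean and (n - 1) sample variance across runs.
        forecast[fare_class] = (mean(totals), variance(totals))
    return forecast


# Three hypothetical runs with two fare classes and two customer types.
runs = [
    {"A": {"business": 10, "leisure": 2}, "O": {"business": 1, "leisure": 30}},
    {"A": {"business": 12, "leisure": 2}, "O": {"business": 0, "leisure": 29}},
    {"A": {"business": 13, "leisure": 3}, "O": {"business": 2, "leisure": 31}},
]
print(forecast_per_fare_class(runs))
```

In the study itself the same calculation would run over 100 simulated horizons rather than three.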

5.4 Heuristic Revenue Optimisation and Inventory Controls

We compare two well-known heuristic methods for obtaining booking controls for a single resource: EMSRb and EMSRb-MR. We pick these heuristics for their wide acceptance and pervasive use in practice. Furthermore, as opposed to, e.g., exact dynamic programming formulations, these heuristics compute the type of inventory controls widely implemented in current practice. We expect the nature of these inventory controls and their updates to be a relevant factor for the recognition and compensation of demand outliers.

  • EMSRb, Expected Marginal Seat Revenue-b, was introduced by Belobaba (1992). EMSRb calculates joint protection levels for all more expensive classes relative to the next cheaper fare class, based on the mean expected demand and its variance.

  • EMSRb-MR: To make the EMSRb heuristic applicable when demand depends on the set of offered fare classes, e.g. when customers choose the cheapest available class, Fiig et al. (2010) introduce this variant. It applies a marginal revenue transformation to demand and fares before calculating the EMSRb protection levels based on transformed fares and predicted demand.
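As a rough illustration of the EMSRb calculation described above, the sketch below computes joint protection levels from fares, mean demands, and demand variances under a Normal demand assumption. The function name and the example inputs are hypothetical, and the marginal revenue transformation of EMSRb-MR is not included.

```python
from math import sqrt
from statistics import NormalDist


def emsrb_protection_levels(fares, means, variances):
    """Joint protection levels for classes 1..j against class j+1.

    Classes are ordered from most to least expensive, with strictly
    decreasing positive fares and independent Normal demands.
    """
    levels = []
    for j in range(1, len(fares)):
        agg_mean = sum(means[:j])
        agg_sd = sqrt(sum(variances[:j]))
        # Demand-weighted average fare of the protected classes.
        avg_fare = sum(f * m for f, m in zip(fares[:j], means[:j])) / agg_mean
        # Protect y seats such that P(aggregate demand > y) = fare ratio.
        critical = 1 - fares[j] / avg_fare
        levels.append(agg_mean + agg_sd * NormalDist().inv_cdf(critical))
    return levels


# Hypothetical three-class example.
print(emsrb_protection_levels([200, 150, 100], [50, 60, 80], [100, 225, 400]))
```

A protection level can fall below zero when the fare ratio is close to one; a practical implementation would truncate it at zero.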

Inventory controls can be implemented in either a partitioned or a nested way (Brumelle and McGill (1993), and Talluri and Van Ryzin (2004), Chapter 2). Partitioned controls assign capacity such that each unit can only be sold in one specific fare class. Conversely, nested controls let assignments overlap in a hierarchical manner, i.e. units of capacity assigned to one fare class can also be sold in any more expensive fare class. Thus, nested inventory controls ensure that for any offered class, all more expensive classes are also offered. As this seems an intuitive goal in practice, nested controls are much more commonly used in application, and we implement them for this reason.

Inventory controls can also be implemented as either static or dynamic controls. Static controls are computed once, at the start of the booking horizon, and are never updated over the horizon. Dynamic controls can be updated at the end of each booking interval based on the actual observed demand and thereby the remaining capacity. In this paper, we divide the booking horizon into 30 booking intervals and re-optimise the inventory controls at the end of each interval.
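The nesting logic can be sketched as follows, assuming joint protection levels that are non-decreasing from the most expensive class downwards. The function names and numbers are hypothetical, and the re-optimisation of the protection levels at the end of each booking interval is not shown.

```python
def nested_booking_limits(capacity, protection_levels):
    """Booking limits under nesting, classes ordered from most expensive.

    The most expensive class may use all capacity; class j+1 may use the
    capacity that is not protected for classes 1..j.
    """
    limits = [capacity]
    for protected in protection_levels:
        limits.append(max(0, capacity - protected))
    return limits


def open_classes(remaining_capacity, protection_levels):
    """Indices (0 = most expensive) of the classes currently on offer."""
    offered = [0] if remaining_capacity > 0 else []
    for j, protected in enumerate(protection_levels, start=1):
        # Class j stays open while capacity exceeds what is protected above it.
        if remaining_capacity > protected:
            offered.append(j)
    return offered


levels = [20, 55, 80]  # hypothetical joint protection levels
print(nested_booking_limits(100, levels))
print(open_classes(60, levels))
```

With non-decreasing protection levels, any open class implies that all more expensive classes are open as well, which is the defining property of nesting.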

5.5 Evaluation of Outlier Detection

We regard outlier detection as a binary classification problem, where the two classes are regular booking patterns and outlier booking patterns. By definition, for any pattern generated in the simulation, we know the true class. Several indicators evaluate the performance of binary classification outcomes, as surveyed by Tharwat (2018). Each outcome falls into one of four categories: (i) if a genuine outlier is correctly classified, it is a true positive (TP); (ii) if a regular observation is correctly classified, it is a true negative (TN); (iii) if a regular observation is wrongly classified as an outlier, it is a false positive (FP); and (iv) if a genuine outlier is wrongly classified as regular, it is a false negative (FN). To analyse results in this paper, we use the Balanced Classification Rate (BCR) as suggested by Tharwat (2018). This indicator is the average of the true positive rate and the true negative rate:

$$\text{BCR} = \frac{1}{2} \left( \frac{TP}{TP + FN} + \frac{TN}{TN + FP} \right).$$

The notions of high detection rates (fraction of genuine outliers which are correctly detected) and low false positive rates (fraction of regular observations which are incorrectly labelled as outliers) create conflicting objectives. For example, a high true positive rate does not necessarily indicate a high performing algorithm, if it is accompanied by a high false positive rate. Therefore, combining both into a single figure is useful. Typically, the number of outliers is outweighed by the number of normal observations. This leads to one class being significantly larger than the other. BCR is robust to this imbalance.
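A short sketch makes the robustness to class imbalance concrete. The counts are hypothetical: a detector that labels every booking pattern as regular achieves 95% accuracy when 5% of patterns are outliers, yet its BCR is only 0.5.

```python
def balanced_classification_rate(tp, tn, fp, fn):
    """Average of the true positive rate and the true negative rate."""
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present")
    true_positive_rate = tp / (tp + fn)
    true_negative_rate = tn / (tn + fp)
    return 0.5 * (true_positive_rate + true_negative_rate)


# Detector that never raises an alert: 5 outliers missed, 95 regular correct.
print(balanced_classification_rate(tp=0, tn=95, fp=0, fn=5))
# Detector that finds 4 of 5 outliers at the cost of 5 false alerts.
print(balanced_classification_rate(tp=4, tn=90, fp=5, fn=1))
```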

6 Results

To investigate different outlier simulation and detection techniques, we follow a four-step process. First, in Section 6.1, we compare output from the EMSRb and EMSRb-MR heuristics. We assess the revenue generated under each method, and the potential gains in revenue from identifying outliers. Secondly, we contrast the foresight detection performance of different outlier detection methods in Section 6.2. This analysis focuses on detection performance across the entire booking horizon, and evaluates the detection approaches' ability to detect outliers early in the booking horizon. We also quantify the gain in outlier detection performance resulting from the inclusion of the extrapolation step proposed in Section 4. Thirdly, Section 6.3 investigates the effect of different magnitudes of outliers on the performance of the outlier detection method. Finally, Section 6.4 presents a set of experiments intended to measure the potential increase in revenue generated by analysts correctly taking actions based on alerts from the proposed method of outlier detection.

Our study does not consider outliers caused by shifts in arrival patterns, or in the distribution of demand over fare classes: outlying demand is simply created by (multiplicatively) shifting demand equally across time and all classes. We therefore classify the presence of outliers from the observed total aggregated bookings, as opposed to using bookings per fare class. The results we present use the four types of outliers discussed in Section 5.2 (±12.5% and ±25% of the regular demand level).

6.1 EMSRb vs EMSRb-MR

Table 2 shows the resulting revenue, using the simulation setup described in Section 5, under EMSRb and EMSRb-MR booking controls with different demand factors, as compared to accepting bookings on a first-come-first-served basis (FCFS). Both heuristics offer an improvement over FCFS. Given the presence of buy-down in the demand model, EMSRb-MR outperforms EMSRb, particularly in situations that feature a high demand-to-capacity ratio.

Demand Factor    FCFS Revenue (€)    EMSRb as Factor of FCFS    EMSRb-MR as Factor of FCFS
0.90             28948.50            1.03                       1.06
1.20             34835.50            1.04                       1.08
1.50             35000.00            1.05                       1.09
Table 2: Revenue generated under EMSRb vs EMSRb-MR booking controls

To evaluate the effect of demand deviating from the forecasts used by EMSRb and EMSRb-MR, we now introduce a best-case scenario where the RM system anticipates outliers and generates accurate demand forecasts (as opposed to implementing booking controls based on the initial erroneous forecasts). The percentage change in revenue, when switching from erroneous to correct forecasts, under four demand changes is shown in Table 3. Results show the impact of detecting and correcting outliers in demand depends on the demand factor, the choice of booking control heuristic, and the magnitude of the demand deviation. Under EMSRb, the effect on revenue is asymmetric across positive and negative outliers. When the outlier is caused by a decrease in demand, correcting the forecast and updating controls leads to significant increases in revenue, particularly at higher demand factors. Conversely, when the outlier is caused by an increase in demand, correcting the forecast and updating controls has a negative impact on revenue. Although counter-intuitive at first glance, this agrees with previous findings. EMSRb is known to be too conservative (Weatherford and Belobaba, 2002) and reserve too many units of capacity for high fare classes, thereby rejecting an excessive number of requests from customers with a lower willingness to pay. In consequence, there is left-over capacity at the end of the booking horizon. Hence, under-forecasting can be beneficial under EMSRb. Under EMSRb-MR booking controls, the results are more symmetric across positive and negative outliers, in that correctly adjusting forecasts increases revenue regardless of whether the initial forecast was too high or too low. Under both types of heuristic, the magnitude of the change in revenue (either positive or negative) is generally larger when the change in demand from the forecast is larger.

Optimisation    Forecasted       % Change in Demand from Forecast
Heuristic       Demand Factor    -25%      -12.5%    +12.5%    +25%
EMSRb           0.90             +0.1%     +0.1%     -0.9%     -3.6%
EMSRb           1.20             +10.2%    +6.4%     -2.3%     -2.3%
EMSRb           1.50             +12.2%    +4.4%     -4.5%     -6.8%
EMSRb           Avg.             +7.5%     +3.6%     -2.5%     -4.2%
EMSRb-MR        0.90             +2.3%     +1.3%     +0.4%     +2.9%
EMSRb-MR        1.20             +2.0%     +4.1%     +4.4%     +9.9%
EMSRb-MR        1.50             +16.2%    +7.7%     +5.0%     +9.5%
EMSRb-MR        Avg.             +6.9%     +4.4%     +3.3%     +7.4%
Table 3: % Change in revenue resulting from correcting inaccurate demand forecasts

Furthermore, we compared the performance of different outlier detection methods under the two different heuristics. The results (omitted for space considerations) under EMSRb and EMSRb-MR were found to be very similar regardless of the outlier detection method used. Given the similarity in performance between the two heuristics, and that EMSRb-MR accounts for the more realistic demand model of customers choosing the cheapest class offered, the remainder of the results in Section 6 relate to those from EMSRb-MR.

Figure 4: Comparison of foresight outlier detection averaged over different magnitudes of demand outliers with 5% outlier frequency. (a) Comparison of best performing outlier detection methods. (b) Improvement from incorporating extrapolation.

6.2 Comparison of Methods for Foresight Detection

In a wide-ranging computational study (see Appendix B, Table 6), we compared the performance of all outlier detection methods described in Section 3. For conciseness, the results discussed here focus on the best univariate method, parametric (Poisson) tolerance intervals; the best multivariate method, $k$-means clustering with Euclidean distance; the best functional method, halfspace depth; and the best extrapolation method, ARIMA extrapolation combined with halfspace depth. To evaluate foresight detection performance, we display the average BCR per booking interval in Figure 4(a). Initially, all four methods perform similarly due to the low number of observed bookings. At around 21 booking intervals before departure, however, the average BCR of the functional methods rapidly approaches 1, whereas the univariate and multivariate approaches at best show only mild improvements in classification performance. The addition of ARIMA extrapolation appears to markedly accelerate this improvement, especially between 20 and 10 booking intervals before departure. In Figure 4(b), we also compare functional depth with IGARCH and SES extrapolation, and observe improvements similar to those obtained with ARIMA extrapolation.

Figure 5: Balanced Classification Rate under different magnitudes of outliers with 5% outlier frequency. (a) Parametric (Poisson) tolerance intervals outlier detection. (b) Functional halfspace depth with ARIMA extrapolation outlier detection.

6.3 Effects from Different Magnitudes of Outliers

We now investigate how the average BCR varies across the four different magnitudes of outliers considered (±12.5% and ±25%). In Figure 5(a) we display the average BCR over time for parametric (Poisson) tolerance intervals. We observe that larger magnitudes of outliers are easier to classify, and that decreases in demand are easier to classify than increases. The latter observation is intrinsic to RM systems: an unexpected decrease in demand causes a decrease in bookings, but an increase in demand does not necessarily result in an increase in bookings if the booking limit for a fare class has been reached, i.e., if the fare class is no longer offered. This effect leads to the phenomenon of observing a constrained version of demand, which is a known issue in revenue management. Similar observations were made when testing all other univariate and multivariate outlier detection approaches. In contrast, consider Figure 5(b), where we display the average BCR over time with functional halfspace depth and ARIMA extrapolation. Here the average BCR is very similar for all four magnitudes of outliers considered. This classification approach therefore appears to be very robust to the magnitude and direction of the outliers considered. We hypothesise that much smaller outlier magnitudes, outside of our simulation setup, would need to be considered before the average BCR decreases. In all simulations in this section we have set the outlier frequency to 5%. We note, however, that when we tested the sensitivity of approaches to different frequencies of outliers (ranging from 1% to 10%, results omitted here for space considerations), we found little impact on outlier detection performance across methods, such that the conclusions drawn from this section are generally robust.
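The asymmetry caused by constrained demand can be illustrated with a deliberately simplified, single-class sketch. The expected demand and booking limit below are hypothetical and do not come from the simulation setup.

```python
def observed_bookings(demand, booking_limit):
    """Bookings recorded when requests beyond the booking limit are rejected."""
    return min(demand, booking_limit)


expected_demand = 200  # hypothetical forecast for one fare class
booking_limit = 210    # hypothetical booking limit derived from that forecast

for change in (-0.25, -0.125, 0.125, 0.25):
    demand = round(expected_demand * (1 + change))
    bookings = observed_bookings(demand, booking_limit)
    deviation = bookings - expected_demand
    print(f"{change:+.1%} demand: {deviation:+d} bookings relative to forecast")
```

Decreases in demand appear in full in the observed bookings, whereas increases are capped at the booking limit, so positive outliers leave a much weaker trace in the booking data.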

6.4 Revenue Improvement Under Outlier Detection-based Analyst Alerts

Figure 6: Gain in revenue under different magnitudes of outliers using functional depth with ARIMA extrapolation

Figure 6 shows the average percentage gain in revenue, at each point in the booking horizon, from analysts correcting forecasts for those booking curves identified as outliers. The percentage gain is in comparison to the analyst making no changes and using the incorrect forecast for the entirety of the booking horizon. The outlier detection method of choice in Figure 6 is functional depth with ARIMA extrapolation. We consider an idealised scenario, in that when a booking curve is flagged as an outlier, if it is a true positive (genuine outlier) then analysts adjust the forecast according to the correct distribution. Similarly, if the flagged outlier is a false positive, analysts do not make any changes to the forecast. Although idealised, the results here highlight the potential gains in revenue from analyst intervention, as well as the utility of using functional outlier detection in detecting true positives and avoiding false negatives (missed outliers). Results show the use of our method creates a peak early in the booking horizon, when the potential revenue gain is highest. This peak is caused by a combination of being far enough into the booking horizon such that some bookings have occurred and the outlier detection method is able to identify outliers, but being early enough in the horizon such that any actions taken still have time to make an impact.

7 Conclusion and Outlook

In conclusion, the work presented in this paper gives rise to three insights. Firstly, outliers in demand diminish revenue when they go undetected. The exact effect depends on the combination of outlier and optimisation method. Nevertheless, we argue that using a heuristic with an intrinsic bias that is then compensated by undetected outliers (as observed for EMSRb and undetected positive demand outliers) cannot be desirable for an automated system.

Secondly, we benchmarked a set of outlier detection techniques and find that the functional outlier detection approach is most promising in terms of performance, and offers the most scope for further extensions. In particular, our results show that implementing the proposed extrapolation step significantly improves the performance of functional outlier detection techniques. In terms of the performance of different outlier detection techniques, we have shown that univariate approaches are too simple and under-perform as they fail to capture time-dependence between and within booking curves. Multivariate approaches outperform univariate methods by additionally taking into account previous bookings.

As a third finding, we have demonstrated that identifying outlier booking curves and adjusting the demand forecast accurately early in the booking horizon supports revenue optimisation. Currently, revenue management analysts decide which booking patterns are outliers based on their previous experience of observing demand and their knowledge about special events. Automated outlier detection routines provide another means of alerting analysts to unusual patterns. If the detection algorithm identifies a booking pattern as an outlier, the RM system alerts the responsible analyst. When the system and the analyst agree that a booking pattern is critical and that it requires intervention, the analyst must decide which action(s) to take. Specifically, they need to decide whether to increase or decrease the forecast or inventory controls, and by how much. Further work could investigate methods to adjust the initial forecast to account for outliers.

Last but not least, as stated in the introduction, revenue management techniques are typically either quantity-based or price-based. This paper considered the effects of outliers and outlier detection techniques in the quantity-based revenue management setting. A natural extension is to consider a similar analysis in the price-based revenue management setting.


  • Aggarwal et al. (2001) Aggarwal, C. C., Hinneburg, A., and Keim, D. A. (2001). On the Surprising Behavior of Distance Metrics in High Dimensional Space. In: Van den Bussche J., Vianu V. (eds) Database Theory. ICDT 2001. Lecture Notes in Computer Science, vol 1973. Springer, Berlin, Heidelberg, pages 420–434.
  • Banerjee et al. (2019) Banerjee, N., Morton, A., and Akartunalı, K. (2019). Passenger demand forecasting in scheduled transportation. European Journal of Operational Research. in press.
  • Barrow and Kourentzes (2018) Barrow, D. and Kourentzes, N. (2018). The impact of special days in call arrivals forecasting: A neural network approach to modelling special days. European Journal of Operational Research, 264(3):967–977.
  • Bartke et al. (2018) Bartke, P., Kliewer, N., and Cleophas, C. (2018). Benchmarking filter-based demand estimates for airline revenue management. EURO Journal on Transportation and Logistics, 7(1):57–88.
  • Belobaba (1989) Belobaba, P. P. (1989). OR Practice: Application of a Probabilistic Decision Model to Airline Seat Inventory Control. Operations Research, 37(2).
  • Belobaba (1992) Belobaba, P. P. (1992). Optimal vs. heuristic methods for nested seat allocation. In Proceedings of AGIFORS Reservations and Yield Management Study Group (1992), pages 28–53.
  • Box and Jenkins (1970) Box, G. and Jenkins, G. (1970). Time Series Analysis: Forecasting and Control. San Francisco: Holden-Day.
  • Boylan et al. (2015) Boylan, J., Goodwin, P., Mohammadipour, M., and Syntetos, A. (2015). Reproducibility in forecasting research. International Journal of Forecasting, 31(1):79–90.
  • Brumelle and McGill (1993) Brumelle, S. L. and McGill, J. I. (1993). Airline seat allocation with multiple nested fare classes. Operations Research, 41(1):127–137.
  • Chandola et al. (2009) Chandola, V., Banerjee, A., and Kumar, V. (2009). Survey of Anomaly Detection. ACM Computing Surveys, 41(3):1–72.
  • Chatfield (1975) Chatfield, C. (1975). The Analysis of Time Series: An Introduction. Chapman and Hall.
  • Claeskens et al. (2014) Claeskens, G., Hubert, M., Slaets, L., and Vakili, K. (2014). Multivariate functional halfspace depth. Journal of the American Statistical Association, 109(505):411–423.
  • Cleophas et al. (2009) Cleophas, C., Frank, M., and Kliewer, N. (2009). Simulation-based key performance indicators for evaluating the quality of airline demand forecasting. Journal of Revenue and Pricing Management, 4(8):330–342.
  • Cleophas et al. (2017) Cleophas, C., Kadatz, D., and Vock, S. (2017). A Literature Survey of Recent Theoretical Advances. Journal of Revenue and Pricing Management, 16(5):483–498.
  • Deb and Dey (2017) Deb, A. B. and Dey, L. (2017). Outlier Detection and Removal Algorithm in K-Means and Hierarchical Clustering. World Journal of Computer Application and Technology, 5(2):24–29.
  • Dixon and Yuen (1974) Dixon, W. and Yuen, K. K. (1974). Trimming and winsorization: A review. Statistische Hefte, 15(2-3):157–170.
  • Doreswamy et al. (2015) Doreswamy, G. R., Kothari, A. S., and Tirumalachetty, S. (2015). Simulating the flavors of revenue management for airlines. Journal of Revenue and Pricing Management, 6(14):421–432.
  • Febrero et al. (2008) Febrero, M., Galeano, P., and González-Manteiga, W. (2008). Outlier detection in functional data by depth measures, with application to identify abnormal NOx levels. Environmetrics, 19(4):331–345.
  • Fiig et al. (2010) Fiig, T., Isler, K., Hopperstad, C., and Belobaba, P. (2010). Optimization of mixed fare structures: Theory and applications. Journal of Revenue & Pricing Management, 9(12):152–170.
  • Frank et al. (2008) Frank, M., Friedemann, M., and Schröder, A. (2008). Principles for simulations in revenue management. Journal of Revenue and Pricing Management, 1(7):215–236.
  • Gönsch (2017) Gönsch, J. (2017). A survey on risk-averse and robust revenue management. European Journal of Operational Research, 263(2):337–348.
  • Hahn and Chandra (1981) Hahn, G. J. and Chandra, R. (1981). Tolerance Intervals for Poisson and Binomial Variables. Journal of Quality Technology, 13(2):100–110.
  • Hawkins (1980) Hawkins, D. (1980). Identification of Outliers. Chapman and Hall.
  • Hubert et al. (2012) Hubert, M., Claeskens, G., De Ketelaere, B., and Vakili, K. (2012). A new depth-based approach for detecting outlying curves. In Colubi, A., Fokianos, K., Gonzalez-Rodriguez, G., and Kontoghiorghes, E., editors, Proceedings of COMPSTAT 2012, pages 329–340.
  • Hubert et al. (2015) Hubert, M., Rousseeuw, P. J., and Segaert, P. (2015). Multivariate functional outlier detection. Statistical Methods and Applications, 24(2):177–202.
  • Iglewicz and Hoaglin (1993) Iglewicz, B. and Hoaglin, D. (1993). The ASQC Basic References in Quality Control: Statistical Techniques. In: Mykytka, E.F., (eds), How to Detect and Handle Outliers, 16.
  • Kimms and Müller-Bungart (2007) Kimms, A. and Müller-Bungart, M. (2007). Simulation of stochastic demand data streams for network revenue management problems. OR Spectrum, 1(29):5–20.
  • Lawrence et al. (2000) Lawrence, M., O’Connor, M., and Edmundson, B. (2000). A field study of sales forecasting accuracy and processes. European Journal of Operational Research, 122(1):151–160.
  • Liang and Cao (2018) Liang, T. X. and Cao, C. X. (2018). Outliers detect methods for time series data. Journal of Discrete Mathematical Sciences and Cryptography, 21(4):927–936.
  • MacQueen (1967) MacQueen, J. (1967). Some methods for classification and analysis of multivariate observations. In Proc. Fifth Berkeley Symp. on Math. Statist. and Prob., volume 1, pages 281–297. University of California Press.
  • Morales and Wang (2010) Morales, D. R. and Wang, J. (2010). Forecasting cancellation rates for services booking revenue management using data mining. European Journal of Operational Research, 202(2):554–562.
  • Mukhopadhyay et al. (2007) Mukhopadhyay, S., Samaddar, S., and Colville, G. (2007). Improving revenue management decision making for airlines by evaluating analyst-adjusted passenger demand forecasts. Decision Sciences, 38(2):309–327.
  • O’Connor et al. (1993) O’Connor, M., Remus, W., and Griggs, K. (1993). Judgemental forecasting in times of change. International Journal of Forecasting, 9(2):163–172.
  • Pereira (2016) Pereira, L. N. (2016). An introduction to helpful forecasting methods for hotel revenue management. International Journal of Hospitality Management, 58:13–23.
  • Petropoulos et al. (2014) Petropoulos, F., Makridakis, S., Assimakopoulos, V., and Nikolopoulos, K. (2014). 'Horses for courses' in demand forecasting. European Journal of Operational Research, 237(1):152–163.
  • Pimentel et al. (2014) Pimentel, M. A., Clifton, D. A., Clifton, L., and Tarassenko, L. (2014). A review of novelty detection. Signal Processing, 99:215–249.
  • Pincus et al. (1995) Pincus, R., Barnett, V., and Lewis, T. (1995). Outliers in Statistical Data. 3rd Edition. Biometrical Journal, 37(2):256.
  • Talagala et al. (2019) Talagala, P. D., Hyndman, R. J., Smith-Miles, K., Kandanaarachchi, S., and Muñoz, M. A. (2019). Anomaly Detection in Streaming Nonstationary Temporal Data. Journal of Computational and Graphical Statistics.
  • Talluri and Van Ryzin (2004) Talluri, K. T. and Van Ryzin, G. J. (2004). The Theory and Practice of Revenue Management. Kluwer Academic Publishers.
  • Temath et al. (2010) Temath, C., Pölt, S., and Suhl, L. (2010). On the robustness of the network-based revenue opportunity model. Journal of Revenue and Pricing Management, 4(9):341–355.
  • Tharwat (2018) Tharwat, A. (2018). Classification assessment methods. Applied Computing and Informatics, pages 1–13. in press.
  • Tsay (2002) Tsay, R. (2002). Analysis of Financial Time Series. John Wiley and Sons.
  • Weatherford and Belobaba (2002) Weatherford, L. R. and Belobaba, P. P. (2002). Revenue impacts of fare input and demand forecast accuracy in airline yield management. The Journal of the Operational Research Society, 53(8):811–821.
  • Weatherford et al. (1993) Weatherford, L. R., Bodily, S. E., and Pfeifer, P. E. (1993). Modeling the Customer Arrival Process and Comparing Decision Rules in Perishable Asset Revenue Management Situations. Transportation Science, 27(3):239–251.
  • Weatherford and Kimes (2003) Weatherford, L. R. and Kimes, S. E. (2003). A comparison of forecasting methods for hotel revenue management. International Journal of Forecasting, 19(3):401–415.
  • Weatherford and Pölt (2002) Weatherford, L. R. and Pölt, S. (2002). Better unconstraining of airline demand data in revenue management systems for improved forecast accuracy and greater revenues. Journal of Revenue and Pricing Management, 1(3):234–254.
  • Wilks (1941) Wilks, S. S. (1941). Determination of Sample Sizes for Setting Tolerance Limits. The Annals of Mathematical Statistics, 12(1):91–96.
  • Zeni (2003) Zeni, R. H. (2003). The value of analyst interaction with revenue management systems. Journal of Revenue and Pricing Management, 2(1):37–46.

Appendix A Technical Description of Methodologies

A.1 Outlier Detection Approaches

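Let $N$ be the number of booking curves. We observe the cumulative number of bookings for each booking curve $n = 1, \dots, N$ at time points $t_1, \dots, t_T$ over a booking horizon of length $T$: $y_n(t_1), \dots, y_n(t_T)$. Note that $t_1, \dots, t_T$ do not necessarily need to be equally spaced. Then $\mathbf{y}_n(t_\tau)$ is a time series of bookings for curve $n$, up to time $t_\tau$: $\mathbf{y}_n(t_\tau) = (y_n(t_1), \dots, y_n(t_\tau))$.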

Nonparametric Percentiles

Let $y_1(t), \dots, y_N(t)$ be the cumulative number of bookings for curves $1, \dots, N$ at time $t$. Find the lower and upper (2.5% and 97.5%) percentiles of the ordered sample, $q_{0.025}(t)$ and $q_{0.975}(t)$. For any booking curve $n$, if the number of bookings at time $t$, $y_n(t)$, is less than $q_{0.025}(t)$ or greater than $q_{0.975}(t)$, it is defined as an outlier at time $t$. Note that an alternative (parametric) approach would be to fit a distribution to the data and use the lower and upper percentiles of the fitted distribution.
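A minimal sketch of the percentile rule at a single time point follows. The linear-interpolation percentile used here is one of several common definitions and is an assumption, as are the function names and data.

```python
def percentile(sorted_values, q):
    """Percentile by linear interpolation between order statistics."""
    position = q * (len(sorted_values) - 1)
    lower_index = int(position)
    upper_index = min(lower_index + 1, len(sorted_values) - 1)
    weight = position - lower_index
    return (sorted_values[lower_index] * (1 - weight)
            + sorted_values[upper_index] * weight)


def percentile_outliers(bookings_at_t, lower=0.025, upper=0.975):
    """Flag curves whose bookings fall outside the percentile band."""
    ordered = sorted(bookings_at_t)
    low_cut = percentile(ordered, lower)
    high_cut = percentile(ordered, upper)
    return [b < low_cut or b > high_cut for b in bookings_at_t]


# Hypothetical cumulative bookings of 100 curves at one time point.
bookings = list(range(1, 101))
flags = percentile_outliers(bookings)
print(sum(flags), "of", len(flags), "curves flagged")
```

By construction, this rule flags roughly 5% of the curves at every time point, whether or not genuine outliers are present.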

Tolerance Intervals

For $X_1, \dots, X_n$, a random sample from a population with distribution $F$, if:

$$P\big( F(U) - F(L) \ge \beta \big) = \gamma,$$

where the limits $L < U$ are computed from the sample, then the interval $(L, U)$ is called a two-sided $\beta$-content, $\gamma$-confidence tolerance interval (Hahn and Chandra, 1981). At each booking interval, a tolerance interval for the number of bookings until that point in time can be defined. If the number of bookings lies outside of this tolerance interval, the booking curve can be deemed an outlier.

  • Nonparametric Tolerance Intervals: Let be the ordered observations of the sample . Wilks (1941) details that a tolerance interval can be calculated as follows:

    1. Let , then let be the smallest integer such that:

    2. Letting , where , then is a tolerance interval, for any such and . It is common to choose:


      then i.e. .

  • Parametric Tolerance Intervals: Given the discrete, count nature of the data, an obvious first choice for the number of bookings at time $t$ is a Poisson distribution. Supposing $x$ is the observed value of a random variable $X$ which has a Poisson distribution with rate $\lambda$, a tolerance interval based on $x$ is constructed in two steps, as described by Hahn and Chandra (1981):

    1. Calculate a two-sided confidence interval for $\lambda$, such as:

    2. Find the minimum number , and the maximum number such that:


    Given the desire for a general method, the presence of differing mean-variance relationships between fare classes and over time suggests that assuming a Poisson distribution may not be appropriate, since this distribution fixes the variance to be equal to the mean. Alternative distributions which could be tested include the Negative Binomial, which has two parameters for mean and variance (although it only allows the variance to be larger than the mean), or the Generalised Poisson distribution, which has an additional parameter allowing the variance to change.

Robust Z-Score

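Let $y_n(t)$ be the cumulative number of bookings for booking curve $n$ at time $t$. The robust Z-score can be calculated as (Iglewicz and Hoaglin, 1993):

$$Z_n(t) = \frac{0.6745 \, \big( y_n(t) - \tilde{y}(t) \big)}{\text{MAD}(t)},$$

where $\tilde{y}(t)$ is the median number of bookings at time $t$ across all booking curves, and the Median Absolute Deviation at time $t$, $\text{MAD}(t)$, is given by:

$$\text{MAD}(t) = \operatorname{median}_{n} \big| y_n(t) - \tilde{y}(t) \big|.$$

A booking curve, $n$, can be classified as an outlier at time $t$ if the number of bookings at time $t$, $y_n(t)$, has a modified Z-score with magnitude above 3.5, as described by Iglewicz and Hoaglin (1993).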
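The robust Z-score rule can be sketched in a few lines. The constant 0.6745 and the cut-off of 3.5 follow Iglewicz and Hoaglin (1993); the function names and the booking counts are hypothetical.

```python
from statistics import median


def robust_z_scores(bookings_at_t):
    """Modified Z-scores based on the median and the MAD."""
    centre = median(bookings_at_t)
    mad = median([abs(b - centre) for b in bookings_at_t])
    if mad == 0:
        raise ValueError("MAD is zero; the score is undefined")
    return [0.6745 * (b - centre) / mad for b in bookings_at_t]


def robust_z_outliers(bookings_at_t, cutoff=3.5):
    """Flag curves whose modified Z-score exceeds the cut-off in magnitude."""
    return [abs(z) > cutoff for z in robust_z_scores(bookings_at_t)]


# Hypothetical cumulative bookings of eight curves at one time point.
bookings = [50, 52, 49, 51, 50, 48, 53, 90]
print(robust_z_outliers(bookings))
```

Because both the centre and the scale are medians, a single extreme curve barely affects the scores of the remaining curves.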

Distance-based Approaches

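Given that a time series of length $\tau$ can be thought of as a point in a $\tau$-dimensional space, the distance between two time series can be calculated and used as a measure of the difference between them. In particular, for the time series of curve $n$ up to time $t_\tau$, $\mathbf{y}_n(t_\tau) = (y_n(t_1), \dots, y_n(t_\tau))$, we define:

$$\bar{d}_n(t_\tau) = \frac{1}{N-1} \sum_{m \neq n} d\big( \mathbf{y}_n(t_\tau), \mathbf{y}_m(t_\tau) \big),$$

where $d(\mathbf{y}_n(t_\tau), \mathbf{y}_m(t_\tau))$ is the distance between two booking curves, $n$ and $m$, up to time $t_\tau$, and $N$ is the total number of booking curves being considered. Here the distance-based outlier score is given as the average distance of a point to its $k$-nearest neighbours, and we set $k = N - 1$, i.e. all other points. Hence, for some given threshold, all booking curves whose mean distance is larger than the threshold can be marked as outliers. Booking curve $n$ is defined as an outlier at time $t_\tau$ if its mean distance exceeds a threshold determined by the mean of the mean distances across all curves and their standard deviation. We consider both Euclidean and Manhattan distance metrics:

  • Euclidean: $d\big( \mathbf{y}_n(t_\tau), \mathbf{y}_m(t_\tau) \big) = \sqrt{\sum_{i=1}^{\tau} \big( y_n(t_i) - y_m(t_i) \big)^2}$

  • Manhattan: $d\big( \mathbf{y}_n(t_\tau), \mathbf{y}_m(t_\tau) \big) = \sum_{i=1}^{\tau} \big| y_n(t_i) - y_m(t_i) \big|$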
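The distance-based score can be sketched as below. The multiplier `n_sd` applied to the standard deviation is a hypothetical parameter introduced for illustration, and the booking curves are made up.

```python
from math import sqrt
from statistics import mean, stdev


def euclidean(a, b):
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def distance_outliers(curves, distance=euclidean, n_sd=2.0):
    """Flag curves whose mean distance to all other curves is unusually large.

    `n_sd` (assumed) sets the threshold at mean + n_sd * standard deviation
    of the mean distances.
    """
    n = len(curves)
    mean_distances = [
        sum(distance(curve, other)
            for j, other in enumerate(curves) if j != i) / (n - 1)
        for i, curve in enumerate(curves)
    ]
    threshold = mean(mean_distances) + n_sd * stdev(mean_distances)
    return [d > threshold for d in mean_distances]


# Nine identical hypothetical curves and one with doubled bookings.
curves = [[10, 20, 30]] * 9 + [[20, 40, 60]]
print(distance_outliers(curves))
print(distance_outliers(curves, distance=manhattan))
```

With few curves, a large multiplier can never be exceeded, since a single extreme value inflates the standard deviation it is compared against; the multiplier therefore needs to be chosen with the sample size in mind.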

Figure 7: Within cluster sum of squares for choosing $k$.

$k$-Means Clustering

It should be noted that clustering algorithms, such as $k$-means clustering, are optimised to determine clusters instead of outliers, meaning that the success of the outlier detection relies on an algorithm's ability to accurately determine the structure of the clusters. The distance threshold at which a point is classified as an outlier also needs to be specified. Deb and Dey (2017) describe a global threshold distance, beyond which observations that are further away from their cluster centre are classed as outliers, as being half the sum of the maximum and minimum distances. $k$-means clustering also relies on specifying the number of clusters, $k$, in advance. The optimal number of clusters should seek to minimise the within cluster sum of squares without overfitting. Choosing $k$ is a difficult problem as it requires fitting $k$-means with multiple values of $k$ and choosing the best one. Figure 7 shows the within cluster sum of squares for multiple values of $k$, where the optimal number of clusters is chosen at the elbow of the plot.
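The global threshold described above can be sketched as follows, assuming that cluster centres have already been obtained from some $k$-means implementation (the fitting step is not shown). The function name, points, and centres are hypothetical.

```python
from math import dist


def kmeans_distance_outliers(points, centres):
    """Flag points far from their nearest cluster centre.

    The threshold is half the sum of the maximum and minimum distances
    between the points and their cluster centres.
    """
    distances = [min(dist(point, centre) for centre in centres)
                 for point in points]
    threshold = (max(distances) + min(distances)) / 2
    return [d > threshold for d in distances], threshold


# Two hypothetical clusters and one point lying between them.
centres = [(0, 0), (10, 10)]
points = [(0, 1), (1, 0), (0, -1), (10, 11), (11, 10), (5, 4)]
flags, threshold = kmeans_distance_outliers(points, centres)
print(flags, round(threshold, 2))
```

Since the threshold is tied to the largest observed distance, at least one point is flagged whenever the distances are not all equal, so the rule is best read as a ranking device rather than a test.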

Multivariate Functional Halfspace Depth

The general procedure for detecting outliers at time $t$ using functional depth, as described by Febrero et al. (2008) and Hubert et al. (2015), is as follows:

  1. Define $D_i(t)$ to be the functional depth of the $i$-th booking curve at time $t$.

  2. Define a threshold, $C$, for the functional depth.

  3. Classify those booking curves with functional depths, $D_i(t)$, below the threshold as outliers, and delete them from the sample.

  4. Recalculate functional depths on the new sample, and remove further outliers. Repeat until no more outliers are found.
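The iterative procedure above can be sketched generically, with the depth function passed in as an argument. The stand-in depth used in the example (a one-dimensional, median and MAD based depth of each curve's final value) is an assumption chosen only to keep the sketch short; it is not the functional halfspace depth used in this paper, and the threshold value is likewise illustrative.

```python
from statistics import median


def iterative_depth_trimming(curves, depth_fn, threshold):
    """Repeatedly remove curves whose depth falls below `threshold`.

    `depth_fn(sample)` must return one depth per curve in `sample`.
    Returns the indices (into `curves`) of all curves classified as outliers.
    """
    remaining = list(range(len(curves)))
    outliers = []
    while remaining:
        depths = depth_fn([curves[i] for i in remaining])
        flagged = [i for i, d in zip(remaining, depths) if d < threshold]
        if not flagged:
            break  # no more outliers found, so stop
        outliers.extend(flagged)
        remaining = [i for i in remaining if i not in flagged]
    return sorted(outliers)


def final_value_depth(sample):
    """Stand-in depth (assumption): 1 / (1 + |x - median| / MAD) of the final values."""
    finals = [curve[-1] for curve in sample]
    med = median(finals)
    mad = median([abs(v - med) for v in finals]) or 1.0  # guard against MAD = 0
    return [1 / (1 + abs(v - med) / mad) for v in finals]


final_bookings = [10, 11, 12, 13, 14, 15, 16, 30, 45]
curves = [[0, v // 2, v] for v in final_bookings]  # toy three-point booking curves
flagged = iterative_depth_trimming(curves, final_value_depth, threshold=0.2)
print(flagged)
```

Passing the depth as a function keeps the trimming loop independent of the particular depth notion, so the same loop could be reused with a functional depth computed elsewhere.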

As described by Febrero et al. (2008), the threshold, $C$, is ideally chosen such that:

$$\mathbb{P}\left(D_i(t) \le C\right) = 0.01, \quad i = 1, \dots, N,$$

when there are no genuine outliers present in the sample. However, this would require knowing the distribution of functional depths when there are no outliers. Febrero et al. (2008) discuss two bootstrapping-based procedures for estimating $C$. The general idea of the bootstrapping method used in this paper, as described by Febrero et al. (2008), is to (i) resample the booking curves with probability proportional to their functional depths (such that any outlying curves are less likely to be resampled), (ii) smooth the bootstrap samples, then (iii) set $C$ as the median value of the 1% percentiles of the empirical distributions of the depths of the bootstrapped samples. For full details, see Febrero et al. (2008). In this paper, we restrict our attention to halfspace depth. In the case of one-dimensional random variables, the halfspace depth of a point $x$ with respect to a sample $x_1, \dots, x_N$ drawn from distribution $F$ is:

$$HD(x) = \min\left\{F_N(x),\ 1 - F_N(x)\right\}$$
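where $F_N$ is the empirical cumulative distribution of the sample (Febrero et al., 2008). This definition has been extended to the functional data setting, see Hubert et al. (2012) and Claeskens et al. (2014). Let $x_i = (x_i(t_1), \dots, x_i(t_T))$ be booking curve $i$ up to time $t_T$, where $i = 1, \dots, N$, and each $x_i(t_j)$ is an $m$-variate vector. In the functional setting, the multivariate functional halfspace depth of a curve $x$ is given by:

$$MFHD_N(x; \alpha) = \sum_{j=1}^{T} w_{\alpha,N}(t_j)\, HD_{N,j}\left(x(t_j)\right)$$

where, using $t_{T+1} = t_T + 0.5\,(t_T - t_{T-1})$, the weights, $w_{\alpha,N}(t_j)$, are, according to Hubert et al. (2012):

$$w_{\alpha,N}(t_j) = \frac{(t_{j+1} - t_j)\ \mathrm{vol}\left[\{z \in \mathbb{R}^m : HD_{N,j}(z) \ge \alpha\}\right]}{\sum_{l=1}^{T} (t_{l+1} - t_l)\ \mathrm{vol}\left[\{z \in \mathbb{R}^m : HD_{N,l}(z) \ge \alpha\}\right]}$$

and the sample halfspace depth of an $m$-variate vector $z$ at time $t_j$ is given by (Hubert et al., 2012):

$$HD_{N,j}(z) = \frac{1}{N} \min_{u \in \mathbb{R}^m,\ \|u\| = 1} \#\left\{x_i(t_j),\ i = 1, \dots, N : u^{\top} x_i(t_j) \ge u^{\top} z\right\}$$

In this paper, we consider a univariate, $m = 1$, functional halfspace depth, since we choose to monitor booking curves only. However, the definition of a multivariate functional halfspace depth opens up the possibility of jointly monitoring booking curves and revenue curves, for example. As described by Hubert et al. (2012), the multivariate functional halfspace depth can be computed with fast algorithms, and in this paper we use the R package mrfDepth to do so.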
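For the univariate case, the pointwise halfspace depth reduces to counting how many sample values lie on either side of a point. The following Python sketch illustrates this and aggregates the pointwise depths over time with equal weights. The equal weighting is a simplifying assumption made here for brevity; it is not the weighting scheme of Hubert et al. (2012), and the sketch is not a substitute for mrfDepth. All names and the toy curves are hypothetical.

```python
def halfspace_depth_1d(x, sample):
    """Univariate sample halfspace depth: the smaller share of points on either side of x."""
    n = len(sample)
    at_or_below = sum(v <= x for v in sample)
    at_or_above = sum(v >= x for v in sample)
    return min(at_or_below, at_or_above) / n


def functional_depth_equal_weights(curve, sample_curves):
    """Average of pointwise depths over time (equal weights are an assumption)."""
    pointwise = [
        halfspace_depth_1d(value, [c[j] for c in sample_curves])
        for j, value in enumerate(curve)
    ]
    return sum(pointwise) / len(pointwise)


# Toy cumulative booking curves; the last one lies above all others at every time point.
curves = [
    [1, 3, 6, 10],
    [1, 4, 7, 11],
    [2, 4, 8, 12],
    [2, 5, 9, 13],
    [4, 9, 15, 22],
]
depths = [functional_depth_equal_weights(c, curves) for c in curves]
print([round(d, 2) for d in depths])
```

The central curve receives the largest depth and the curve that is extreme at every time point receives the smallest, which is the ordering the thresholding step relies on.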

a.2 Univariate Forecasting Techniques for Extrapolation

Although forecasting is an important element of a revenue management system, there are multiple reasons why we create new forecasts for extrapolation rather than using the existing ones generated by the RM system. Three particular reasons are: (i) Depending on the optimisation routine used to set booking limits, forecasts of how demand builds up over time may not have been calculated. Some methods only require forecasts of final demand, and so the type of forecast we wish to use for extrapolation may not exist. (ii) Even where forecasts of how demand builds up over time do exist, historical forecasts may not be stored. When identifying critical booking curves in historical data, the forecasts needed for extrapolation are then unavailable. (iii) Forecasts of how demand accumulates over time are typically based on data from similar historical booking curves. Using data from other booking curves to extrapolate has the potential to mask outliers by normalising behaviour. Hence, at each time point, we create a forecast based solely on the data for an individual booking curve; the goal is not to predict demand accurately, but rather to amplify the differences between booking patterns.

a.3 Simple Exponential Smoothing (SES)

SES works on the principle of averaging whilst down-weighting older observations. Further details can be found in Chatfield (1975). Given a time series $x_1, \dots, x_t$, a forecast for time $t+1$ is given by:

$$\hat{x}_{t+1} = \alpha x_t + (1 - \alpha)\,\hat{x}_t$$

for some smoothing constant, $0 < \alpha < 1$. Note that this results in a constant forecast for the bookings from time $t+1$ onwards.
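A minimal Python sketch of SES-based extrapolation follows. Initialising the smoothed level with the first observation, the function name, and the toy series are assumptions made for the illustration.

```python
def ses_forecast(series, alpha, horizon):
    """Simple exponential smoothing; returns a flat forecast of length `horizon`."""
    level = series[0]  # initial level set to the first observation (assumption)
    for x in series[1:]:
        level = alpha * x + (1 - alpha) * level
    return [level] * horizon


observed = [10, 12, 14]
extrapolated = observed + ses_forecast(observed, alpha=0.5, horizon=3)
print(extrapolated)
```

The appended values are all equal, which makes the constant nature of the SES forecast explicit.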

a.4 Autoregressive Integrated Moving Average (ARIMA)

ARIMA models incorporate a trend component, and assume that future observations are an additive, weighted combination of previous observations and previous errors. Let $y_t$ be the differenced time series relating to $x_t$. See Box and Jenkins (1970) for an overview of differencing procedures, and Chatfield (1975) for a description of ARIMA processes. The one-step ahead forecast is given by:

$$\hat{y}_{t+1} = \mu + \sum_{i=1}^{p} \phi_i\, y_{t+1-i} + \sum_{j=1}^{q} \theta_j\, \varepsilon_{t+1-j}$$

for some constant mean $\mu$, parameters $\phi_1, \dots, \phi_p$ and $\theta_1, \dots, \theta_q$, and white noise process $\varepsilon_t$. We use AIC and Dickey-Fuller tests, in combination with visual inspection, to select the orders $p$, $d$, and $q$. See Box and Jenkins (1970), and the R package forecast.
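To illustrate how differencing and an autoregressive step combine when extrapolating a single booking curve, the sketch below iterates one-step-ahead forecasts for the simplest non-trivial case, ARIMA(1,1,0), with parameters supplied by hand. In practice the parameters and orders would be estimated (the paper uses the R package forecast); the function name, parameter values, and toy series here are assumptions for the example.

```python
def arima_110_forecast(series, mu, phi, horizon):
    """Iterated one-step forecasts from an ARIMA(1,1,0) model with known parameters."""
    diffs = [b - a for a, b in zip(series, series[1:])]  # first differences (d = 1)
    last_diff, last_level = diffs[-1], series[-1]
    forecasts = []
    for _ in range(horizon):
        next_diff = mu + phi * last_diff     # AR(1) step on the differenced series
        last_level = last_level + next_diff  # undo the differencing
        forecasts.append(last_level)
        last_diff = next_diff
    return forecasts


observed = [0, 2, 6, 12]
forecast = arima_110_forecast(observed, mu=1.0, phi=0.5, horizon=3)
print(forecast)
```

Unlike the flat SES forecast, the extrapolated curve keeps growing, which is the sense in which the model incorporates a trend component.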

a.5 Integrated Generalised Autoregressive Conditional Heteroskedasticity (IGARCH)

IGARCH models incorporate a trend component and assume that the variance structure follows an autoregressive moving average model. Again, let $y_t$ be the differenced time series relating to $x_t$. See Tsay (2002) for further details on IGARCH processes. IGARCH(1,d,1) models assume the following structure:


We assume that the order of the IGARCH model is $(1, d, 1)$ to reduce computational time.
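For orientation, a textbook IGARCH(1,1) specification for the innovations of a series can be written as below. This follows the standard form given in Tsay (2002) and is not necessarily the exact parameterisation used in this paper; the symbols $a_t$, $\sigma_t$, $\epsilon_t$, $\alpha_0$ and $\beta_1$ are introduced here as assumptions for the sketch.

```latex
a_t = \sigma_t \epsilon_t, \qquad
\sigma_t^2 = \alpha_0 + \beta_1 \sigma_{t-1}^2 + (1 - \beta_1)\, a_{t-1}^2, \qquad
0 < \beta_1 < 1
```

The defining feature is that the coefficients on $\sigma_{t-1}^2$ and $a_{t-1}^2$ sum to one, so shocks to the conditional variance persist rather than decay.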

Appendix B Details of Simulation-based Framework

b.1 Forecasts for Inventory Controls

Fare Class   Fare (€)   Demand factor = 0.9   Demand factor = 1.2   Demand factor = 1.5
                        Mean    Variance      Mean    Variance      Mean    Variance
1  A         400        31.9    23.0          46.2    25.3          52.7    32.2
2  O         300        17.5    14.2          24.2    18.8          28.3    30.5
3  J         280        20.0    14.2          28.6    25.5          33.6    31.8
4  P         240        16.8    16.1          22.9    26.6          26.1    23.8
5  R         200        13.4    11.5          18.5    16.5          21.6    18.8
6  S         185        12.3    14.3          16.9    11.2          21.0    21.1
7  M         175        52.6    19.2          69.8    28.2          81.8    33.8
Table 4: Forecasts of mean and variance of demand for each fare class

In terms of choosing the number of replications of the simulation, $n$, to use in the calculations of the forecasts, we consider the standard errors of the estimates. The standard error of the mean is given by:

$$SE(\hat{\mu}) = \frac{\hat{\sigma}}{\sqrt{n}}$$

such that it is typically in the range of 0.3 - 0.6 for the number of replications used. The standard error of the variance is given by:

$$SE(\hat{\sigma}^2) = \hat{\sigma}^2 \sqrt{\frac{2}{n-1}}$$

and is typically in the range of 2 - 5 for the same number of replications. Therefore, the number of simulations provides reasonable estimates of the demand mean and variance forecasts for each fare class.
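As a quick numerical check of the orders of magnitude involved, the sketch below evaluates the usual standard errors of a sample mean and of a sample variance (the latter under a normality assumption) for the variance forecasts in Table 4 at demand factor 0.9. The replication count `n = 100` is purely an assumed value for illustration; the function names are hypothetical.

```python
import math


def se_mean(variance, n):
    """Standard error of the sample mean."""
    return math.sqrt(variance / n)


def se_variance(variance, n):
    """Standard error of the sample variance, assuming normally distributed demand."""
    return variance * math.sqrt(2 / (n - 1))


variances = [23.0, 14.2, 14.2, 16.1, 11.5, 14.3, 19.2]  # Table 4, demand factor 0.9
n = 100  # assumed number of replications, for illustration only
for v in variances:
    print(f"variance={v:5.1f}  SE(mean)={se_mean(v, n):.2f}  SE(variance)={se_variance(v, n):.2f}")
```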

b.2 Optimisation Heuristics to Compute Inventory Controls

Expected Marginal Seat Revenue-b (EMSRb)

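It is assumed that demand for each fare class, $j$, is independent and normally distributed:

$$D_j \sim N(\mu_j, \sigma_j^2)$$

where $\mu_j$ and $\sigma_j^2$ are forecasted as described above. The protection level, $y_j$, for fare class $j$ and higher is given by (Belobaba, 1992):

$$y_j = F_j^{-1}\left(1 - \frac{f_{j+1}}{\bar{f}_j}\right)$$

where $F_j$ is the (Gaussian) distribution of the aggregated demand for fare classes $1, \dots, j$, and $f_j$ is the fare in fare class $j$. $\bar{f}_j$ is the weighted-average revenue from classes $1, \dots, j$:

$$\bar{f}_j = \frac{\sum_{k=1}^{j} f_k\, \mu_k}{\sum_{k=1}^{j} \mu_k}$$

Note that the protection level for all fare classes combined is simply equal to the capacity, $C$. As stated by Talluri and Van Ryzin (2004), Equation (37) becomes:

$$y_j = \mu + z_\alpha\, \sigma, \qquad z_\alpha = \Phi^{-1}\left(1 - \frac{f_{j+1}}{\bar{f}_j}\right)$$

where $\mu = \sum_{k=1}^{j} \mu_k$ is the mean, and $\sigma^2 = \sum_{k=1}^{j} \sigma_k^2$ is the variance, of the aggregated demand. Hence, the booking limit for class $j+1$ is given by the capacity minus the protection level for classes $j$ and higher:

$$b_{j+1} = C - y_j$$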
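The EMSRb calculation can be sketched in Python using only the standard library. The inputs are the fares and the demand factor 0.9 forecasts from Table 4; the capacity of 200 is an assumption inferred from the top-class booking limit in Table 5, and the function name is hypothetical. With these inputs, the first three limits below the top class come out within one seat of the EMSRb column of Table 5; agreement for lower classes is not guaranteed, since the tabulated forecasts are rounded.

```python
from math import sqrt
from statistics import NormalDist


def emsrb_booking_limits(fares, means, variances, capacity):
    """Booking limits per fare class (highest fare first) under EMSRb."""
    limits = [float(capacity)]  # the highest class may use the whole capacity
    for j in range(1, len(fares)):
        agg_mean = sum(means[:j])          # mean of aggregated demand, classes 1..j
        agg_sd = sqrt(sum(variances[:j]))  # sd of aggregated demand (independence)
        avg_fare = sum(f * m for f, m in zip(fares[:j], means[:j])) / agg_mean
        z = NormalDist().inv_cdf(1 - fares[j] / avg_fare)
        protection = agg_mean + z * agg_sd
        limits.append(capacity - protection)
    return limits


fares = [400, 300, 280, 240, 200, 185, 175]
means = [31.9, 17.5, 20.0, 16.8, 13.4, 12.3, 52.6]      # Table 4, demand factor 0.9
variances = [23.0, 14.2, 14.2, 16.1, 11.5, 14.3, 19.2]  # Table 4, demand factor 0.9
limits = emsrb_booking_limits(fares, means, variances, capacity=200)  # capacity assumed
print([round(b, 1) for b in limits])
```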


Expected Marginal Seat Revenue-b with Marginal Revenue Transformation (EMSRb-MR)

The following marginal revenue transformation, described by Fiig et al. (2010), assumes that customers only buy the lowest available fare, even if they would be willing to pay more. In this setting, when a given fare product is the lowest available, the demand for all other fare products becomes zero:


Therefore, the adjusted demand for each fare class becomes:


The adjusted fares are given by:


An alternative method of calculating adjusted fares without explicitly forecasting demand for each fare class is to assume that:


that is, the demand for a particular fare class is the baseline demand for the lowest fare class multiplied by a sell-up probability. In practice, these sell-up probabilities can be forecasted instead of forecasting the fare class demand under an independent demand model. In our case, since we compare EMSRb with EMSRb-MR, we already have the fare class forecasts. The two methods are equivalent. The booking controls under EMSRb and EMSRb-MR are shown in Table 5, where the demand factor is defined as the ratio of demand to capacity.

Fare Class   Demand factor = 0.9   Demand factor = 1.2   Demand factor = 1.5
             EMSRb   EMSRb-MR      EMSRb   EMSRb-MR      EMSRb   EMSRb-MR
A            200     200           200     200           200     200
O            171     165           157     151           151     144
J            155     155           134     134           125     125
P            134     125           105     95            90      79
R            117     109           81      72            62      52
S            104     109           62      72            39      52
M            91      96            45      51            18      24
Table 5: Booking limits under EMSRb and EMSRb-MR



Magnitude of Outliers | Frequency of Outliers | Nonparametric Percentiles | Tolerance Intervals | Poisson Tolerance Intervals | K-Means Clustering (Euclidean) | K-Means Clustering | Functional Depth | Functional Depth | Functional Depth

EMSRb -25% 1%    ✓    ✓    ✓    ✓
5%    ✓    ✓    ✓    ✓    ✓    ✓    ✓    ✓    ✓    ✓    ✓    ✓
10%    ✓    ✓    ✓    ✓
-12.5% 1%    ✓    ✓    ✓    ✓
5%    ✓    ✓    ✓    ✓    ✓    ✓    ✓    ✓    ✓    ✓    ✓    ✓
10%    ✓    ✓    ✓    ✓
+12.5% 1%    ✓    ✓    ✓    ✓
5%    ✓    ✓    ✓    ✓    ✓    ✓    ✓    ✓    ✓    ✓    ✓    ✓
10%    ✓    ✓    ✓    ✓
+25% 1%    ✓    ✓    ✓    ✓
5%    ✓    ✓    ✓    ✓    ✓    ✓    ✓    ✓    ✓    ✓    ✓    ✓
10%    ✓    ✓    ✓    ✓
EMSRb-MR -25% 1%    ✓    ✓    ✓    ✓
5%    ✓    ✓    ✓    ✓    ✓    ✓    ✓    ✓    ✓    ✓    ✓    ✓
10%    ✓    ✓    ✓    ✓
-12.5% 1%    ✓    ✓    ✓    ✓
5%    ✓    ✓    ✓    ✓    ✓    ✓    ✓    ✓    ✓    ✓    ✓    ✓
10%    ✓    ✓    ✓    ✓
+12.5% 1%    ✓    ✓    ✓    ✓
5%    ✓    ✓    ✓    ✓    ✓    ✓    ✓    ✓    ✓    ✓    ✓    ✓
10%    ✓    ✓    ✓    ✓
+25% 1%    ✓    ✓    ✓    ✓
5%    ✓    ✓    ✓    ✓    ✓    ✓    ✓    ✓    ✓    ✓    ✓    ✓
10%    ✓    ✓    ✓    ✓
Table 6: Experimental Simulation Study

Appendix C Additional Results

c.1 The Effect of EMSRb vs EMSRb-MR on Outlier Detection Performance

Figure 8: EMSRb vs. EMSRb-MR optimisation under functional depth with ARIMA extrapolation
Optimisation Heuristic Booking Intervals before Departure
25 20 15 10 5 0
EMSRb 0.57 0.67 0.94 0.91 0.93 0.93
EMSRb-MR 0.55 0.64 0.94 0.92 0.94 0.93
Table 7: Balanced Classification Rate under different optimisation heuristics averaged over different magnitudes of demand outliers

c.2 Foresight Comparison

Method Booking Intervals before Departure
25 20 15 10 5 0
Poisson Tolerance Intervals 0.55 0.54 0.53 0.57 0.58 0.57
K-Means (Euclidean) 0.53 0.56 0.59 0.61 0.62 0.63
Functional Depth 0.55 0.60 0.73 0.88 0.93 0.93
Functional Depth + ARIMA 0.55 0.68 0.94 0.91 0.93 0.93
Functional Depth + GARCH 0.57 0.63 0.87 0.93 0.94 0.93
Functional Depth + SES 0.57 0.73 0.83 0.91 0.93 0.93
Table 8: Comparison of foresight outlier detection averaged over different magnitudes of demand outliers with 5% outlier frequency

c.3 Effects of Different Magnitudes and Frequencies of Outliers

Method                        Magnitude of Outliers   Booking Intervals before Departure
                                                      25     20     15     10     5      0
Poisson Tolerance Intervals   -25%                    0.58   0.63   0.59   0.66   0.73   0.73
Poisson Tolerance Intervals   -12.5%                  0.51   0.52   0.51   0.53   0.55   0.55
Poisson Tolerance Intervals   +12.5%                  0.53   0.50   0.52   0.51   0.51   0.50
Poisson Tolerance Intervals   +25%                    0.60   0.52   0.55   0.58   0.57   0.50
Functional Depth + ARIMA      -25%                    0.60   0.68   0.94   0.92   0.94   0.93
Functional Depth + ARIMA      -12.5%                  0.55   0.69   0.93   0.93   0.94   0.93
Functional Depth + ARIMA      +12.5%                  0.50   0.68   0.94   0.94   0.93   0.93
Functional Depth + ARIMA      +25%                    0.54   0.68   0.94   0.92   0.93   0.93
Table 9: Balanced Classification Rate under different magnitudes of outliers with 5% outlier frequency
Figure 9: Balanced Classification Rate under different frequencies of outliers averaged over different magnitudes of demand outliers.
Method                     Frequency of Outliers   Booking Intervals before Departure
                                                   25     20     15     10     5      0
Functional Depth + ARIMA   1%                      0.56   0.67   0.92   0.89   0.93   0.93
Functional Depth + ARIMA   5%                      0.55   0.68   0.94   0.91   0.93   0.93
Functional Depth + ARIMA   10%                     0.57   0.68   0.92   0.89   0.91   0.92
Table 10: Balanced Classification Rate under different frequencies of outliers averaged over different magnitudes of demand outliers.

c.4 Revenue Improvement Under Outlier Detection-based Analyst Alerts

Method                     Magnitude of Outliers   Booking Intervals before Departure
                                                   25      20      15      10      5       0
Functional Depth + ARIMA   -25%                    7.36%   7.60%   5.49%   3.81%   1.80%   -
Functional Depth + ARIMA   -12.5%                  6.08%   6.42%   2.55%   1.16%   0.12%   -
Functional Depth + ARIMA   +12.5%                  1.86%   0.88%   1.03%   0.81%   1.04%   -
Functional Depth + ARIMA   +25%                    1.15%   4.99%   4.64%   4.07%   4.45%   -
Table 11: Gain in revenue under different magnitudes of outliers using functional depth with ARIMA extrapolation.

c.5 Comparison of Methods for Hindsight Detection

Figure 10: Comparison of hindsight outlier detection under different magnitudes of demand outliers with 5% outlier frequency

For hindsight detection performance, we rely on the BCR averaged across all booking intervals. As shown in Figure 10, hindsight detection performance typically increases with the complexity of the outlier detection method, across all categories of outliers tested. These results are consistent with those for foresight detection. Figure 10 also shows that including the extrapolation step induces only a small improvement in hindsight detection performance. However, with extrapolation, outliers are detected earlier in the booking horizon, meaning any actions taken as a result of their identification will have a significant positive impact in terms of revenue overall, both within and beyond the booking horizon. Within the revenue management process, identifying outliers and adjusting controls as early as possible provides the most benefit. Nevertheless, even detecting outliers in hindsight promises some advantages over not identifying them at all.
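The averaging described above can be sketched as follows, assuming that the balanced classification rate denotes the mean of the true positive rate and the true negative rate, as is standard. The function names and the toy labels are hypothetical and do not correspond to the simulation results reported in the tables.

```python
from statistics import mean


def balanced_classification_rate(is_outlier, flagged):
    """Mean of sensitivity and specificity; needs at least one outlier and one non-outlier."""
    pairs = list(zip(is_outlier, flagged))
    tp = sum(1 for truth, flag in pairs if truth and flag)
    fn = sum(1 for truth, flag in pairs if truth and not flag)
    tn = sum(1 for truth, flag in pairs if not truth and not flag)
    fp = sum(1 for truth, flag in pairs if not truth and flag)
    return 0.5 * (tp / (tp + fn) + tn / (tn + fp))


def hindsight_bcr(is_outlier, flags_by_interval):
    """BCR averaged across booking intervals."""
    return mean(balanced_classification_rate(is_outlier, f) for f in flags_by_interval)


truth = [True, True, False, False, False, False]
flags_by_interval = [
    [True, False, False, False, False, True],   # one hit, one miss, one false alarm
    [True, True, False, False, False, False],   # perfect classification
]
bcr_first = balanced_classification_rate(truth, flags_by_interval[0])
bcr_avg = hindsight_bcr(truth, flags_by_interval)
print(bcr_first, bcr_avg)
```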

Method Magnitude of Outliers
-25% -12.5% +12.5% +25% Mean
Poisson Tolerance Intervals 0.74 0.55 0.52 0.62 0.61
K-Means Clustering (Euclidean) 0.67 0.57 0.58 0.69 0.63
Functional Depth 0.93 0.93 0.93 0.93 0.93
Functional Depth + ARIMA 0.94 0.94 0.94 0.94 0.94
Table 12: Comparison of hindsight outlier detection under different magnitudes of demand outliers with 5% outlier frequency.