In recent years, data-driven sequential decision-making has received considerable attention and has found a wide range of applications in operations management, such as dynamic inventory control (see, e.g., Huh et al. (2011), Chen and Plambeck (2008), Chen et al. (2019b, a), Lei et al. (2019)), dynamic pricing (see, e.g., Besbes and Zeevi (2009, 2015), Wang et al. (2014), Chen et al. (2019c), Broder and Rusmevichientong (2012)), and dynamic assortment optimization (see, e.g., Rusmevichientong and Topaloglu (2012), Saure and Zeevi (2013), Agrawal et al. (2019), Wang et al. (2018), Chen et al. (2018)). Take personalized/contextual dynamic pricing as an example: it is usually assumed that the underlying demand, which is a function of the price and the customer’s contextual information, follows a certain probabilistic model with unknown parameters. Over a finite selling horizon of length $T$, one customer arrives at each time period. The seller observes the characteristics of the customer and makes the price decision. Then the arriving customer makes the purchase decision based on the posted price. The seller observes the purchase decision, updates her knowledge about the demand model, and might change the price policy accordingly for future customers. The key challenge in dynamic pricing is to accurately estimate the underlying model parameter in the demand function, which is then used to determine later prices. Existing literature on dynamic pricing constructs only a point estimator of the underlying model parameter, i.e., it estimates the parameter by a single number or vector, without quantifying the uncertainty in the estimator. Uncertainty quantification is very useful for practitioners. It is highly desirable for the seller to obtain confidence intervals for the underlying demand function that are guaranteed to cover the true demand function with probability $1-\alpha$ (also known as the confidence level, e.g., $95\%$).
Although the construction of confidence intervals is a classical topic in statistics (Stigler 2002), existing results in the statistical literature mainly deal with independent and non-adaptive data. The behavior of sequentially collected data is quite different from that of independent data. In particular, in the (contextual) dynamic pricing problem both the decision (e.g., the price) and the collected customers’ contextual information at each time period are adaptive, in that they are heavily correlated with information obtained in previous periods. Due to this sequential dependence, estimators computed from adaptively collected data might have severe distributional bias even when the sample size goes to infinity (Deshpande et al. 2018, 2019). Such a bias makes the classical approach to constructing confidence intervals (e.g., Wald’s test, see Chapter 17 of Keener (2010)) no longer valid.
The main goal of our paper is to construct a debiased estimator that is asymptotically normal, centered at the true model parameter, with a simple covariance matrix structure. Based on the proposed debiased estimator, we construct both point-wise confidence intervals (i.e., confidence intervals valid for any given decision variable (price) and contextual information) and uniform confidence intervals (i.e., confidence intervals uniformly valid for all decision variables and contextual information). To highlight our main idea, we consider the problem of constructing confidence intervals for the demand function in dynamic pricing, which is one of the most important data-driven sequential decision problems in revenue management.
In particular, we study a stylized personalized dynamic pricing model in which there are $T$ selling periods. At each selling period $t$, a potential customer comes with an observable personal context vector $x_t$. Instead of assuming that the contexts $x_t$ are independent across time periods as in existing literature (e.g., Chen et al. (2015), Miao et al. (2019)), we allow $x_t$ to depend on information from previous selling periods. This is a more practical scenario since a customer’s contextual information might be heavily correlated with previous prices and realized demands. For example, several consecutive time periods of lower posted prices or higher demands may attract new customers from a different population, whose contextual information will differ from that of previous customers. After observing the contextual information of the arriving customer, the seller decides the price $p_t$ and the customer decides on a realized demand $d_t$. We assume the demand of the arriving customer follows a general probabilistic model,
where $f$ is a parametric function, parameterized by $\theta$, with a known form (e.g., linear or logistic), $\theta_0$ is an unknown parameter vector that models the demand behaviors, and $\xi_t$ are zero-mean, conditionally independent (conditioning on $p_t$ and $x_t$) noise variables. A typical objective of the retailer is to maximize his/her expected revenue, or more specifically
without knowing the model a priori. In this paper, our goal is to construct confidence intervals for both the true model parameter $\theta_0$ and the underlying demand function (see the definition in Sec. 1.1).
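For concreteness, a demand model and a revenue objective consistent with the description above can be sketched as follows. The notation is an assumption made for this sketch ($d_t$ the realized demand, $p_t$ the posted price, $x_t$ the context, $f(\cdot,\cdot;\theta)$ the parametric demand function, $\theta_0$ the true parameter, $\xi_t$ the noise), and the exact form of the paper's own displays may differ.

```latex
% Assumed notation, inferred from the surrounding prose; not necessarily the paper's exact displays.
\begin{align*}
  d_t &= f(p_t, x_t; \theta_0) + \xi_t,
        \qquad \mathbb{E}\left[\xi_t \mid p_t, x_t\right] = 0, \\
  \max_{p_1, \dots, p_T} \;& \mathbb{E}\left[\sum_{t=1}^{T} p_t \, f(p_t, x_t; \theta_0)\right].
\end{align*}
```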
The demand model in Eq. (1) is very general and covers two widely used demand models: the linear model and the logistic model. In the linear model, the demand $d_t$ is modeled as
where $\phi$ is a known feature map for the price and contextual information, and $\xi_t$ are noise variables. In the logistic regression model, $d_t \in \{0, 1\}$ is a binary demand realized according to the logistic model
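As an illustration of the two demand models, the following minimal sketch simulates one demand realization under each. The feature map `feature_map`, the Gaussian noise, the coefficient values, and all function names are hypothetical choices made for this example, not taken from the paper.

```python
import math
import random

def feature_map(price, context):
    """Hypothetical feature map phi(p, x): the context, the price, and an intercept."""
    return list(context) + [price, 1.0]

def inner(u, v):
    return sum(a * b for a, b in zip(u, v))

def linear_demand(price, context, theta, rng, noise_sd=0.1):
    """Linear model: d = <phi(p, x), theta> + xi, with Gaussian noise xi (an assumption)."""
    return inner(feature_map(price, context), theta) + rng.gauss(0.0, noise_sd)

def purchase_probability(price, context, theta):
    """Logistic model: P(d = 1 | p, x) = 1 / (1 + exp(-<phi(p, x), theta>))."""
    return 1.0 / (1.0 + math.exp(-inner(feature_map(price, context), theta)))

def logistic_demand(price, context, theta, rng):
    """Binary demand drawn as a Bernoulli variable with the logistic purchase probability."""
    return 1 if rng.random() < purchase_probability(price, context, theta) else 0

rng = random.Random(0)
theta = [0.5, -1.0, 1.0]  # illustrative coefficients for (context, price, intercept)
print(linear_demand(1.0, [0.2], theta, rng))
print(purchase_probability(1.0, [0.2], theta), logistic_demand(1.0, [0.2], theta, rng))
```

With a negative price coefficient, as in the illustrative `theta`, the purchase probability decreases as the price increases.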
In contextual dynamic pricing models, two dependency or feedback structures are essential to model the pricing dynamics in practice. The first feedback structure is that the retailer, after observing a sequence of customers’ purchasing activities, can leverage his/her knowledge or estimates of the unknown model to offer more profitable pricing decisions. In other words, the prices sequentially decided by the retailer are statistically correlated with the purchasing activities of prior customers. The second feedback structure involves the types of arriving customers (reflected in the context vectors $x_t$), which could well depend on the historical prices (e.g., consistently high prices might attract more affluent customers) and the realized demands in previous selling periods. Hence, the context vectors are statistically correlated with the prices and demands in previous time periods.
Now we formulate the above-mentioned feedback structures more precisely. A contextual dynamic pricing model is specified by the time horizon $T$, the unknown regression model, the feature map $\phi$, the price range, and a context generation procedure under which each context $x_t$ is a function of the history of prior selling periods and a certain random variable. A contextual dynamic pricing algorithm/strategy over $T$ time periods consists of functions, one per period, mapping the history of prior selling periods (together with another random variable) to the price $p_t$ offered to the incoming customer at time $t$. The context generation functions and the pricing functions capture the two feedback structures mentioned in the previous paragraph, since both $x_t$ and $p_t$ are statistically correlated with the prices, contexts, and demands in prior selling periods.
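The two feedback structures can be made concrete with a small simulation loop. This is only a sketch: the context generator, the pricing rule, and the demand rule below are hypothetical placeholders chosen so that contexts depend on past prices and prices depend on past demands.

```python
import random

def next_context(history, rng):
    """Hypothetical context generator: the context shifts upward after a low previous price."""
    base = rng.uniform(-1.0, 1.0)
    if history and history[-1][0] < 1.0:  # history entries are (price, context, demand)
        base += 0.5
    return base

def choose_price(history, rng):
    """Hypothetical pricing rule: raise the price after a sale, lower it otherwise."""
    if not history:
        return 1.0
    last_price, _, last_demand = history[-1]
    step = 0.1 if last_demand == 1 else -0.1
    return min(2.0, max(0.5, last_price + step + rng.uniform(-0.05, 0.05)))

def realize_demand(price, context, rng):
    """Placeholder demand: purchase probability falls with price and rises with context."""
    prob = min(1.0, max(0.0, 0.5 + 0.3 * context - 0.2 * price))
    return 1 if rng.random() < prob else 0

def run_selling_horizon(T, seed=0):
    rng = random.Random(seed)
    history = []
    for _ in range(T):
        x = next_context(history, rng)   # context depends on past prices
        p = choose_price(history, rng)   # price depends on past prices and demands
        d = realize_demand(p, x, rng)
        history.append((p, x, d))
    return history

history = run_selling_horizon(5)
print(history)
```

Because every tuple in `history` is produced from the tuples before it, the resulting sample is neither independent nor identically distributed, which is the setting studied in the rest of the paper.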
1.1 Our contribution: uncertainty quantification in sequentially collected data
The main objective of this paper is to quantify the uncertainty of the demand function learned from purchase data collected under dynamically and adaptively chosen prices and contexts. Namely, we construct two types of confidence intervals for the underlying demand function, point-wise confidence intervals and uniform confidence intervals, which are introduced as follows.
For a pre-specified confidence level $1-\alpha$ at the end of $T$ time periods, where $\alpha$ is usually a small constant such as 0.1 or 0.05, our goal is to construct upper and lower confidence interval edges such that, for any given price $p$, context $x$, and true parameter $\theta_0$,
The confidence interval in (4) is known as the point-wise confidence interval since it holds for a fixed price $p$ and context vector $x$.
In many applications, we are also interested in confidence intervals with uniform coverage. More specifically, for a pre-determined confidence level $1-\alpha$, the upper and lower edges are required to satisfy
where the domain of all context vectors is a certain compact set.
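The two coverage requirements can be sketched side by side. The symbols here are hypothetical: $\ell_\alpha$ and $u_\alpha$ stand for the lower and upper interval edges, $\mathcal{P}$ for the price range, and $\mathcal{X}$ for the compact context domain. The asymptotic ($T \to \infty$) form is an assumption based on the later statement that the intervals are asymptotically level-$(1-\alpha)$.

```latex
% Hypothetical notation; the paper's displays (4) and (5) may be stated differently.
\begin{align*}
  \text{(point-wise)} \quad
    & \lim_{T \to \infty} \Pr\left[ \ell_\alpha(p, x) \le f(p, x; \theta_0) \le u_\alpha(p, x) \right] = 1 - \alpha
      \quad \text{for each fixed } (p, x), \\
  \text{(uniform)} \quad
    & \lim_{T \to \infty} \Pr\left[ \forall\, p \in \mathcal{P},\ x \in \mathcal{X}:\
        \ell_\alpha(p, x) \le f(p, x; \theta_0) \le u_\alpha(p, x) \right] = 1 - \alpha.
\end{align*}
```

The difference lies in where the quantifier sits: the point-wise statement fixes $(p, x)$ before taking the probability, while the uniform statement requires a single event to cover all $(p, x)$ simultaneously, so uniform intervals are necessarily at least as wide.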
To construct these confidence intervals, we also provide a confidence interval for the true model parameter $\theta_0$, which may be of independent interest in practice.
As mentioned above, the main difficulty in constructing these confidence intervals lies in the two dependency structures of the prices and contexts. In contrast to the non-adaptive case, where the maximum likelihood estimator (MLE) is asymptotically unbiased, the MLE based on adaptive data can have a significant distributional bias. In the next subsection, we briefly discuss two popular contextual dynamic pricing algorithms in the literature to better illustrate the adaptive data collection process. We also explain in Sec. 3 why the classical construction of confidence intervals fails in our problem.
1.2 Online policies for contextual dynamic pricing
We mention two popular online policies for the contextual dynamic pricing problem.
The $\varepsilon$-greedy policy.
An $\varepsilon$-greedy policy (Watkins 1989) has a parameter $\varepsilon$ to balance the tradeoff between exploration and exploitation. At each selling period $t$, with probability $\varepsilon$, a price is selected uniformly at random for exploration. With probability $1-\varepsilon$, the exploitation price is set based on the current estimate $\hat\theta_t$:
which is the regularized empirical-risk minimization (ERM) using sales data from prior selling periods. Here the risk function depends on the particular class of the underlying demand model. For example, for the linear demand model, the least-squares function is commonly used:
For the logistic demand model, the negative log-likelihood function is often adopted:
More generally, a common choice of risk function is the negative log-likelihood function. In principle, the risk function should be selected such that the underlying true model $\theta_0$ minimizes the risk in expectation. Detailed assumptions on the risk function will be given in Sec. 2.
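The price-selection step of the $\varepsilon$-greedy policy can be sketched as follows. Several details are assumptions made for illustration: prices are restricted to a finite grid, the exploitation price maximizes the plug-in expected revenue $p \cdot f(p, x; \hat\theta)$, and `expected_demand` is a hypothetical logistic demand with a scalar context. The ERM step that produces `theta_hat` is not shown.

```python
import math
import random

def expected_demand(price, context, theta):
    """Hypothetical logistic demand with scalar context: sigmoid(a*x + b*p + c)."""
    a, b, c = theta
    return 1.0 / (1.0 + math.exp(-(a * context + b * price + c)))

def epsilon_greedy_price(theta_hat, context, price_grid, epsilon, rng):
    """Explore uniformly with probability epsilon, otherwise exploit the current estimate."""
    if rng.random() < epsilon:
        return rng.choice(price_grid)
    # Exploitation (assumed form): revenue-maximizing price under the plug-in estimate.
    return max(price_grid, key=lambda p: p * expected_demand(p, context, theta_hat))

rng = random.Random(0)
grid = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
print(epsilon_greedy_price((0.0, -1.0, 2.0), 0.0, grid, 0.1, rng))
```

Since the exploitation price is a function of `theta_hat`, which is itself computed from earlier sales, the prices collected under this policy depend on past demands.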
The Upper-Confidence Bound (UCB) policy.
In the UCB policy (or more specifically the LinUCB policy for linear or generalized linear contextual bandits (Rusmevichientong and Tsitsiklis 2010, Filippi et al. 2010, Abbasi-Yadkori et al. 2012)), a regularized MLE is calculated for every selling period as in (6). Afterwards, the offered price is selected to maximize an upper bound on the demand function, or more specifically
where the confidence bound is of a form that, with high probability, controls the error of the estimated demand for all prices and contexts, relative to the underlying true model parameter $\theta_0$. We refer the readers to the works of Abbasi-Yadkori et al. (2012), Rusmevichientong and Tsitsiklis (2010), Filippi et al. (2010) for the different forms of confidence bounds in linear and generalized linear contextual bandits.
While the UCB policy naturally constructs “upper confidence bounds”, such confidence bounds are inadequate for predicting reasonable demand ranges because they yield intervals that are too wide to be useful. In fact, confidence bounds in UCB are constructed using concentration inequalities, in which the constants are far from tight. Given the pre-specified confidence level $1-\alpha$, our goal is to construct demand confidence intervals that have statistically accurate coverage as defined in (4) and (5), allowing potential users to understand the range of expected demands at a given confidence level.
1.3 Related works
Data-driven sequential decision-making has been extensively studied for revenue and inventory management problems with unknown or changing environments. In most existing literature, effective online policies are developed to maximize revenues. However, how to provide accurate confidence intervals for the key underlying probabilistic model parameters (e.g., demand function or utility parameters) has not been well explored in the literature. Recently, the work of Ban (2020) considered the construction of confidence intervals (for the demand functions) in an inventory control model. Compared to the approaches proposed in this paper, the work of Ban (2020) derives asymptotic normality of certain SAA strategies, while our approach de-biases general empirical-risk minimizers so that the constructed confidence intervals are applicable to a wide range of online policies, such as $\varepsilon$-greedy, upper confidence bounds, or Thompson sampling. Technically, the limiting distributions in Ban (2020) were established using Stein’s method, while our proposed approach is inspired by the one-step estimators in asymptotic statistics (Van der Vaart 2000).
Recently, de-biased estimators have been extensively investigated for high-dimensional penalized estimation (Van de Geer et al. 2014, Zhang and Zhang 2014, Javanmard and Montanari 2014, Wang et al. 2019), since the regularization (e.g., the $\ell_1$-penalty in Lasso (Tibshirani 1996)) introduces bias into the estimator. However, these works only deal with non-adaptively collected data and thus cannot be applied to our setting. The recent works of Deshpande et al. (2018, 2019) applied the de-biasing approach to confidence intervals for adaptively collected data, including multi-armed and linear contextual bandit problems. While the works of Deshpande et al. (2018, 2019) mainly focus on linear models, this paper provides confidence intervals for general parametric models. The extension to general parametric model classes poses some unique technical challenges, such as the sequential estimation of the Fisher information matrix. Further details are given in Sec. 4.
1.4 Notations and paper organization
Throughout this paper we adopt the following asymptotic notations. For sequences and , we write or if ; we write or if .
The rest of the paper is organized as follows: in Sec. 2 we list the assumptions made in this paper, including discussion on why the imposed assumptions are useful and relevant; in Sec. 3 we review the classical approach of Wald’s intervals for constructing confidence intervals, and explain why such a classical approach fails in contextual dynamic pricing problems; in Sec. 4 we propose the de-biased approach and demonstrate, through both theoretical and empirical analysis, that our proposed confidence intervals are accurate in dynamic pricing. Finally, in Sec. 5 we conclude the paper by mentioning several future directions for research. Proofs of some technical lemmas are deferred to the supplementary material.
2 Models and Assumptions
In this section we state assumptions that will be imposed throughout this paper. Most of the assumptions are standard in the literature on dynamic pricing or contextual bandits. There are, however, a few additional assumptions made for the specific purpose of building accurate confidence intervals, which are common in the statistical literature.
2.1 Assumptions on the demand model
We first list assumptions on the underlying demand function (i.e., the mean of the demand), as well as assumptions on the underlying true parameter .
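(A1) The context vectors lie in some compact set, and the true parameter $\theta_0$ lies in some known compact parameter class;
(A2) The demand function $f$ is continuously differentiable with respect to $\theta$, with bounds that hold uniformly over all prices, contexts, and parameters;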
Assumptions (A1) and (A2) assert that both the context vectors and the unknown model parameter are bounded, and furthermore that the known demand function satisfies basic smoothness properties. These assumptions imply that the expected demands are bounded and cannot be arbitrarily large. The two examples introduced earlier (the linear regression model and the logistic regression model) satisfy both conditions, provided that the feature map $\phi$ is bounded.
2.2 Assumptions on the noise variables
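Recall that the noise variable $\xi_t$ is defined as
which is the difference between the realized demand and its (conditional) expectation. We list assumptions on the noise variables across the $T$ selling periods.
(B1) The noise variables $\xi_1, \dots, \xi_T$ are independent, centered, and bounded sub-Gaussian random variables;
(B2) There exists a known variance function giving the conditional variance of $\xi_t$, which is bounded above and bounded away from zero for all prices, contexts, and parameters, and Lipschitz continuous with respect to $\theta$.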
In the above assumptions, (B1) is a standard assumption that the noise variables are all centered and sub-Gaussian with light tails, conditioned on the offered price $p_t$ and the context vector $x_t$. (B2) imposes further assumptions on the variance of the noise variables. In particular, it assumes that the conditional variance of $\xi_t$ (conditioned on $p_t$ and $x_t$) is bounded, never zero, and smooth. Such an assumption is useful in demand models which are inherently heteroscedastic. For example, in the logistic demand model where $d_t$ is a Bernoulli variable with mean $f(p_t, x_t; \theta_0)$, it is easy to verify that the conditional variance of $\xi_t$ equals $f(p_t, x_t; \theta_0)\,(1 - f(p_t, x_t; \theta_0))$, and all conditions in Assumption (B2) hold true.
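The heteroscedasticity in the logistic example can be checked numerically. The sketch below computes the variance of the noise $d - f$ for a Bernoulli demand $d$ with mean $f$ directly from the two possible outcomes and compares it with $f(1-f)$. The function names and the chosen values of the linear index `z` are illustrative assumptions.

```python
import math

def purchase_probability(z):
    """Logistic mean f = 1 / (1 + exp(-z)) for a linear index z."""
    return 1.0 / (1.0 + math.exp(-z))

def noise_variance(z):
    """Var(xi) for xi = d - f with d ~ Bernoulli(f), computed from the two outcomes."""
    f = purchase_probability(z)
    return f * (1.0 - f) ** 2 + (1.0 - f) * (0.0 - f) ** 2

for z in (-3.0, 0.0, 3.0):
    f = purchase_probability(z)
    print(z, noise_variance(z), f * (1.0 - f))
```

The variance changes with `z`, is at most 1/4, and stays bounded away from zero as long as the index `z` is bounded, which is what the boundedness of the features and parameters guarantees.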
2.3 Assumptions on the risk function
The empirical risk minimization problem in Eq. (6) is the workhorse of our model estimates. As discussed, popular risk functions include the least-squares loss function and the negative log-likelihood function. Below we give a list of assumptions imposed on the risk function so that the ERM estimates satisfy desired properties.
(C1) The risk function is three times continuously differentiable with respect to $\theta$, and furthermore its derivatives are bounded uniformly over all of its arguments;
(C2) For all prices and contexts, the expected gradient of the risk function with respect to $\theta$ vanishes at the true parameter $\theta_0$;
Here in Assumption (C1), the third-order derivative is a symmetric tensor, and its norm is the operator norm. For the linear demand model with the least-squares loss, Assumption (C1) is implied by the boundedness conditions above; for other parametric models (e.g., the logistic regression model) with the negative log-likelihood loss, Assumption (C1) is a standard condition used in the analysis of maximum likelihood estimators. Finally, Assumption (C2) means that the true model parameter $\theta_0$ is a stationary point of the expected loss function, which is satisfied by both the least-squares loss function and the negative log-likelihood loss function. In the statistical literature, the gradient of the negative log-likelihood is known as (the negative of) the score function, whose expectation is zero under $\theta_0$.
2.4 Assumptions on the contextual pricing model
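Finally, we state an assumption on the behavior of the contexts under the contextual pricing model.
(D1) There exists a positive constant such that, for any selling period $t$ and any filtration generated by the prior selling periods, the smallest eigenvalues of the conditional expectations of both the outer product of demand gradients and the Hessian of the risk function are at least that constant, for all prices and contexts chosen in prior selling periods.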
Assumption (D1) concerns two quantities: the (expected) outer product of demand gradients, which by definition is always positive semi-definite, and the (expected) Hessian of the loss function, which can theoretically be any symmetric matrix but is in general positive semi-definite for common loss functions like the least-squares loss or the negative log-likelihood. Assumption (D1) then assumes, essentially, that both quantities are positive definite in a “strict” sense, by lower bounding the least eigenvalues of both by a positive constant. Since both expectations are conditioned upon the adaptively chosen prices and context vectors, in Assumption (D1) we assume that the lower bound on the smallest eigenvalues holds for any such chosen prices/contexts in prior selling periods. Finally, we remark that the exact value of this constant does not need to be known, as it is only used in the theoretical analysis of the validity of confidence intervals constructed by our proposed algorithm.
3 Limitation of Classical Wald’s Intervals
In classical parametric statistics with i.i.d. data points, Wald’s interval is a standard approach to building asymptotic confidence intervals from maximum likelihood estimates. In this section, we review Wald’s interval in the setting of contextual dynamic pricing, and discuss why such a classical method cannot be directly applied because of the feedback structures present in our problem.
Suppose that after $T$ selling periods the offered prices, purchase activities, and customers’ context vectors are $\{(p_t, d_t, x_t)\}_{t=1}^{T}$. Let $\hat\theta$ be the maximum likelihood estimate
where the limiting covariance is governed by the sample Fisher information matrix. With Eq. (11), using the delta method (which asserts that if $\sqrt{n}(X_n - \mu) \to N(0, \Sigma)$ in distribution, then $\sqrt{n}(g(X_n) - g(\mu)) \to N(0, \nabla g(\mu)^\top \Sigma \nabla g(\mu))$ in distribution for any $g$ differentiable at $\mu$; see, for example, Van der Vaart (2000)), we have for fixed $p$ and $x$ that
A confidence interval on $f(p, x; \theta_0)$ can then be constructed as
where the critical value is the $(1-\alpha/2)$-quantile of a standard normal random variable $Z$, and $\Phi$ denotes the cumulative distribution function of $Z$, i.e., $\Phi(z) = \Pr[Z \le z]$.
While Wald’s interval is a general-purpose and classical approach to constructing confidence intervals, one of the key assumptions made in its construction is the statistical independence of the collected data across the selling periods. It is known that, without such independence assumptions, Wald’s interval can be significantly biased, as in the case of multi-armed bandit predictions (Deshpande et al. 2018) and least-squares estimation in non-mixing time series (Lai and Wei 1982).
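To make the classical construction concrete, the sketch below computes a Wald-type interval for the purchase probability in a one-parameter logistic model, using the MLE, the sample Fisher information, and the delta method. The scalar-parameter simplification, the Newton solver, and all function names are assumptions made for this illustration; the interval is justified only when the data are independent, which is exactly what fails under adaptive pricing.

```python
import math
import random
from statistics import NormalDist

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_scalar_logistic_mle(features, demands, iters=50):
    """Newton's method on the negative log-likelihood of P(d = 1) = sigmoid(theta * phi)."""
    theta = 0.0
    for _ in range(iters):
        grad = sum((sigmoid(theta * phi) - d) * phi for phi, d in zip(features, demands))
        hess = sum(sigmoid(theta * phi) * (1.0 - sigmoid(theta * phi)) * phi * phi
                   for phi in features)
        theta -= grad / hess
    return theta

def wald_interval_for_demand(features, demands, phi_new, alpha=0.05):
    """Return (lower, estimate, upper) for the demand at feature value phi_new."""
    theta_hat = fit_scalar_logistic_mle(features, demands)
    # Unnormalized sample Fisher information evaluated at the MLE.
    info = sum(sigmoid(theta_hat * phi) * (1.0 - sigmoid(theta_hat * phi)) * phi * phi
               for phi in features)
    f_hat = sigmoid(theta_hat * phi_new)
    grad_f = f_hat * (1.0 - f_hat) * phi_new           # delta method: d f / d theta
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)        # (1 - alpha/2)-quantile of N(0, 1)
    half_width = z * abs(grad_f) / math.sqrt(info)
    return f_hat - half_width, f_hat, f_hat + half_width

rng = random.Random(0)
features = [rng.uniform(-2.0, 2.0) for _ in range(500)]           # i.i.d. design
demands = [1 if rng.random() < sigmoid(1.0 * phi) else 0 for phi in features]
print(wald_interval_for_demand(features, demands, phi_new=1.0))
```

In this example the features are drawn independently of past demands. If they were instead chosen by an adaptive pricing policy, the same formula could still be evaluated, but its nominal coverage would no longer be guaranteed.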
To better illustrate the failure of Wald’s test for adaptively collected data, in Figure 1, we plot the empirical distributions of the normalized estimation error and the normalized errors of predicted demands for the Wald’s interval approach. In particular, we consider the simple logistic demand model , with , and . The price range is . The context generating process is designed as , where and . The empirical distributions are obtained with 5000 independent trials, each with selling periods and prices determined by the LinUCB algorithm as described in Sec. 1.2. The top panels in Figure 1 depict the distributions of two coordinates of , and the bottom panels are normalized demand prediction errors for the cases of ; ; , respectively. One can easily see that, in contextual dynamic pricing the confidence intervals constructed for both the estimation errors and the prediction error (of demands) deviate significantly from the desired limiting distributions (see (11)) and (see (12)), calling for more sophisticated methods to construct accurate confidence intervals.
4 Main Algorithm and Analysis
The pseudo-code of our proposed algorithm for constructing confidence intervals of the demand function is given in Algorithm 1.
At a high level, the objective of Algorithm 1 is to construct accurate confidence intervals in both the “point-wise” sense (i.e., confidence intervals for the expected demand in (4) for a single price and context ) and the “uniform” sense (i.e., confidence intervals in (5) for that hold uniformly over all possible prices and contexts). The input to Algorithm 1 is the historical price, context, and demand data over selling periods, during which an adaptive dynamic pricing strategy is used. The adaptivity of the pricing strategy means that the demands and prices are highly correlated, and therefore the basic Wald’s intervals cannot be directly applied, as discussed in the previous section.
The key idea behind our proposed approach is to “de-bias” the empirical risk estimate (also termed the “pilot” estimate in Algorithm 1). More specifically, building upon the biased pilot estimate, we construct a “whitening” matrix satisfying certain correlation and norm conditions (the procedure for constructing such a whitening matrix is presented in Algorithm 2 and Sec. 4.3). A de-biased estimate is then computed by adding a bias-correction term to the ERM estimate, or more specifically
where , . For example, in the linear demand case, , while in the logistic case, . With the bias correction, it can be proved that the bias contained in can be dominated by the main error terms that are asymptotically normal, as shown in Theorem 4.1 later. With the asymptotic normality of , both point-wise and uniform confidence intervals can be constructed using either the Delta’s method in Eq. (12) or Monte-Carlo methods, as shown in Steps 1 and 15 of Algorithm 1.
In the rest of this section we provide a rigorous analysis of the proposed confidence intervals in Algorithm 1. In Sec. 4.1, we perform a bias-variance decomposition analysis of the de-biased estimate and prove in Theorem 4.1 that, under certain conditions, it is asymptotically normally distributed; in Sec. 4.2 we upper bound the estimation error of the pilot estimate, and in Sec. 4.3 we propose a procedure for constructing the whitening matrix such that the conditions in Theorem 4.1 are satisfied. Finally, in Sec. 4.4 we prove that both point-wise and uniform confidence intervals are asymptotically level-$(1-\alpha)$, theoretically establishing the accuracy of the constructed intervals.
4.1 Analysis of the de-biased estimator
In Step 1 of Algorithm 1, a de-biased estimate is constructed based on the biased ERM estimate and a certain “whitening matrix” . In this section we analyze the asymptotic distributional properties of based on certain conditions on . The question of how to obtain a whitening matrix satisfying the desired conditions will be discussed in the next section.
For notational simplicity, we denote the gradient at time by and . Also recall the definition of for in (8). The following lemma shows a bias-variance decomposition . The estimation error can be decomposed to , where the bias term satisfies almost surely and the variance , .
Proof of Lemma 4.1. Recall the definition that , where and . Define also . By definition, . Subsequently,
Next, by Taylor expansion and the smoothness of (see Assumption (A2)), we have for every that . Hence, . We then have
which completes the proof. ∎
Our next lemma shows that, when the bias term is sufficiently small, the estimation error of the de-biased estimate converges in distribution to a multivariate Gaussian distribution. Suppose the following conditions hold:
The non-anticipativity condition: the -th column of , , is measurable conditioned on ;
Let be a diagonal matrix with ;
Then it holds that
We note that the first and second items, as well as part of the third item, concern the whitening matrix and will be satisfied by our construction in Sec. 4.3 (see Lemma 4.3 and Corollary 2). The convergence rate condition on the pilot estimator in the third item will be verified in the next subsection (Sec. 4.2) using tools from self-normalized empirical processes.
The key idea behind the proof of Theorem 4.1 mainly involves two steps. The first step is to show that, under the non-anticipativity conditions imposed on , the variance term converges in distribution to a normal distribution using martingale CLT type arguments. The second step shows that the bias term is asymptotically dominated by , and therefore the entire estimation error converges in distribution to a normal distribution. The complete proof is given below.
. The following lemma shows how the characteristic functions ofconverge to the characteristic function of . Let be a fresh sample from the standard -dimensional Gaussian distribution. Define also . Then for any , , it holds that
The proof of Lemma 4.1 is based on standard Fourier-analytic approaches (Billingsley 2008, Lai and Wei 1982, Brown 1971), and is deferred to the supplementary material. Lemma 4.1 shows that the characteristic function of converges point-wise to the characteristic function of , provided that as . By Levy’s continuity theorem, this implies , or more specifically