Multi-Product Dynamic Pricing in High-Dimensions with Heterogeneous Price Sensitivity

We consider the problem of multi-product dynamic pricing in a contextual setting for a seller of differentiated products. In this environment, customers arrive over time and products are described by high-dimensional feature vectors. Each customer chooses a product according to the widely used Multinomial Logit (MNL) choice model, and her utility depends on the product features as well as the prices offered. Our model allows for heterogeneous price sensitivities for products. The seller does not know the parameters of the choice model a priori but can learn them through interactions with the customers. The seller's goal is to design a pricing policy that maximizes her cumulative revenue. This model is motivated by online marketplaces such as the Airbnb platform and by online advertising. We measure the performance of a pricing policy in terms of regret, which is the expected revenue loss with respect to a clairvoyant policy that knows the parameters of the choice model in advance and always sets the revenue-maximizing prices. We propose a pricing policy, named M3P, that achieves a T-period regret of O(√(log(dT) T)) under heterogeneous price sensitivity for products with feature dimension d. We also prove that no policy can achieve worst-case T-regret better than Ω(√(T)).





1 Introduction

Online marketplaces offer a very large number of products, each described by a large number of features. This contextual information creates differentiation among products and also affects the willingness-to-pay of buyers. To provide more context, let us consider the Airbnb platform: the products sold in this market are “stays.” In booking a stay, the user first selects the destination city, dates of visit, and type of place (entire place, 1 bedroom, shared room, etc.), and hence narrows down her choice to a so-called consideration set. The platform sets the prices for the products in the consideration set. Notably, the products here are highly differentiated. Each product can be described by a high-dimensional feature vector that encodes its properties, such as space, amenities, walking score, house rules, reviews of previous tenants, etc. We study a model where the platform aims to maximize its revenue.

In setting prices, there is a clear tradeoff. A high price may drive the user away (it decreases the likelihood of a sale) and hence hurts the revenue. A low price, on the other hand, encourages the user to purchase the product; however, it results in a smaller revenue from that sale. Therefore, in order to maximize its revenue, the seller must try to learn the purchase behavior of the users. Using the users’ interactions and purchasing decisions, the seller can learn how users weigh different features in their purchasing decisions.

In this work, we study a setting where the utility from buying a product is a linear function of the product features and its price. Let $u(x, p)$ be the utility obtained from buying a product with feature vector $x \in \mathbb{R}^d$ at price $p$, where the parameter vector $\theta_0$ represents the users’ purchase behavior. Namely, $\theta_0$ captures the contribution of each feature to the user’s valuation of the products. Similar to [2, 24, 22], we focus on a linear utility model:

$$u(x, p) = \langle x, \theta_0 \rangle - \alpha_0 p + z, \qquad (1)$$

where $\langle a, b \rangle$ indicates the inner product of two vectors $a$ and $b$. The term $z$, a.k.a. market noise, captures the idiosyncratic change in the valuation of each user, and $\alpha_0$ is the price sensitivity parameter. We encode the “no-purchase” option as a new product with zero utility. We emphasize that the parameters of the utility model, $\theta_0$ and $\alpha_0$, are a priori unknown to the seller.

In our model, given a consideration set, the customer chooses the product that results in the highest utility. We study the widely used Multinomial Logit (MNL) choice model [27], which corresponds to having the noise terms in Eq. (1) drawn independently from a standard Gumbel distribution.

We propose a dynamic pricing policy, called M3P, for Multi-Product Pricing Policy in high-dimensional environments. Our policy uses an $\ell_1$-regularized maximum likelihood method to estimate the true parameters of the utility model based on previous purchasing behavior of the users.

We measure the performance of a pricing policy in terms of the regret, which is the difference between the expected revenue obtained by the pricing policy and the revenue gained by a clairvoyant policy that has full information of the parameters of the utility model and always offers the revenue-maximizing price. Our policy achieves a $T$-period regret of $O(\sqrt{\log(dT)\,T})$, where $d$ and $T$ respectively denote the feature dimension and the length of the time horizon. Furthermore, we also prove that our policy is almost optimal in the sense that no policy can achieve worst-case $T$-period regret better than $\Omega(\sqrt{T})$.

In the next section, we briefly review the work related to ours. We would like to highlight that our work is distinguished from the previous literature in two major aspects: i) multi-product pricing, which must take into account the interaction of different products, since changing the price of one product may shift the demand for other products, making the pricing problem even more complex; ii) heterogeneity and uncertainty in the price sensitivity parameters. We point out that our methods can obtain logarithmic cumulative regret in $T$ if the price sensitivity parameters in Eq. (1) were known a priori, cf. [22].

Related Work

There is a vast literature on dynamic pricing as one of the central problems in revenue management. We refer the reader to [12, 4] for extensive surveys on this area. A popular theme in this area is dynamic pricing with learning, where there is uncertainty about the demand function, but information about it can be obtained via interaction with customers. A line of work [3, 16, 20, 8, 17, 10] took a Bayesian approach. Another related line of work assumes parametric models for the demand function with a small number of parameters, and proposes policies to learn these parameters using statistical procedures such as maximum likelihood [6, 7, 14, 13, 9] or least squares estimation [6, 18, 23].

Recently, there has been interest in dynamic pricing in contextual settings. The works [1, 11, 24, 22, 5] consider a single-product setting, where the seller receives a single product to sell at each step (corresponding to in our setting), and assume equal price sensitivities for all products. In [1], the authors consider a noiseless valuation model with a strategic buyer and propose a policy with $T$-period regret of order . This setting has been extended to include market noise and also a market of strategic buyers who are utility maximizers [19]. In [11], the authors propose a pricing policy based on binary search in high dimensions with adversarial features that achieves regret . The work [22] studies dynamic pricing in a high-dimensional contextual setting with sparsity structure and proposes a policy with regret , but in a single-product scenario. The problem has also been studied under time-varying coefficient valuation models [21] to address the time-varying purchase behavior of customers and the perishability of sales data. Very recently, [25] studied high-dimensional multi-product pricing with a low-dimensional linear model for the aggregate demand. In this model, the demand vector for all the products at each step is observed, while in our work the seller only sees the index of the product that is chosen from the buyer’s consideration set at each step. Similarly, [26] studies a model where the seller can observe the aggregate demand and proposes a myopic policy based on least-squares estimation that obtains logarithmic regret.

2 Model

We consider a firm which sells a set of products to customers that arrive over time. The products are differentiated and each is described by a wide range of features.

At each step $t$, the customer selects a consideration set $\mathcal{C}_t$ of size at most $K$ from the available products. This is the set the customer will actively consider in her purchase decision. The seller sets the price for each of the products in this set, after which the customer may choose (at most) one of the products in $\mathcal{C}_t$. If she chooses a product, a sale occurs and the seller collects revenue equal to the posted price; otherwise, no sale occurs and the seller does not get any revenue.

Each product is represented by an observable vector of features . Products offered in different rounds can be highly differentiated (their features vary), but we assume that the feature vectors are sampled independently from a fixed, but unknown, distribution .

We assume that for all in the support of , and , for an arbitrarily large but fixed constant . Throughout the paper, we use to indicate the norm.

If an item $i$ (at period $t$) is priced at $p_{it}$, then the customer obtains utility $u_{it}$ from buying it. (Footnote 1: In general, the offered price depends not only on the feature vectors but also on the period $t$, as the estimate of the model parameters may vary across time. We make this explicit in the notation by considering both $i$ and $t$ in the subscript.)

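Here, are the parameters of the demand curve and are unknown a priori to the seller. The term is the product-based utility, and the component represents market shocks, which are modeled as zero-mean random variables drawn independently and identically from a standard Gumbel distribution. This noise distribution gives us the well-known multinomial logit (MNL) choice model, which has been widely used in academic literature and practice [27, 15]. Under the MNL model, the probability of choosing an item $i$ from the set $\mathcal{C}_t$ is given by

$$\mathbb{P}(i_t = i \mid \mathcal{C}_t) = \frac{\exp(u^0_{it})}{1 + \sum_{\ell \in \mathcal{C}_t} \exp(u^0_{\ell t})}, \qquad (2)$$

where $u^0_{\ell t}$ denotes the utility of product $\ell$ at period $t$ excluding its market shock, for $\ell \in \mathcal{C}_t$.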

We refer to the term in the utility model as the price sensitivity of product . Note that our model allows for heterogeneous price sensitivities. We also encode the no-purchase option by item , with market utility , drawn from zero mean Gumbel distribution. The random utility can be interpreted as the utility obtained from choosing an option outside the offered ones. This is equivalent to . Having the utility model established as above, at all steps the user chooses the item with maximum utility from her consideration set; in case of equal utilities, we break the tie randomly.
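The link between Gumbel market shocks and the MNL choice probabilities can be checked numerically. The sketch below is illustrative only: the function names and the utility values are assumptions made for this example, and the no-purchase option is placed at index 0 with its noiseless utility normalized to zero, as described above.

```python
import numpy as np

def mnl_probabilities(u):
    """MNL choice probabilities for noiseless utilities u (one entry per product).

    Returns an array of length len(u) + 1; entry 0 is the no-purchase option,
    whose noiseless utility is normalized to zero.
    """
    v = np.concatenate(([0.0], np.asarray(u, dtype=float)))
    v = v - v.max()              # shift for numerical stability
    w = np.exp(v)
    return w / w.sum()

def simulate_choice(u, rng):
    """Draw one choice: add i.i.d. Gumbel shocks and pick the highest utility."""
    v = np.concatenate(([0.0], np.asarray(u, dtype=float)))
    # A common location shift of the shocks does not change the argmax,
    # so centering the Gumbel noise would give the same choice distribution.
    z = rng.gumbel(loc=0.0, scale=1.0, size=v.shape)
    return int(np.argmax(v + z))

rng = np.random.default_rng(0)
u = np.array([0.5, -0.2, 1.0])   # hypothetical noiseless utilities
probs = mnl_probabilities(u)
draws = np.array([simulate_choice(u, rng) for _ in range(20000)])
freq = np.bincount(draws, minlength=len(u) + 1) / len(draws)
```

With enough draws, the empirical frequencies in `freq` should be close to the closed-form probabilities in `probs`.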

To summarize, our setting is as follows. At each period :

  1. The user narrows down her options by forming a consideration set of size at most .

  2. For each product $i \in \mathcal{C}_t$, the seller offers a price $p_{it}$. (Footnote 2: Equivalently, the seller can determine all the prices in advance and reveal them after the user determines the consideration set. We note that the consideration set of the user does not depend on the prices, but the choice she makes from the consideration set does. In addition, recall that all the users share the same model parameters, and the choice of consideration set does not reveal information about these parameters.)

  3. The user chooses item where .

  4. The seller observes what product is chosen from the consideration set and uses this information to set the future prices.

We make the following assumption, which ensures positivity of the products’ price sensitivity parameters.

Assumption 2.1.

We have , for some constant .

Before proceeding with the policy description, we will discuss the benchmark policy which is used in defining the notion of regret and measuring the performance of pricing policies.

3 Benchmark policy

The seller’s goal is to minimize her regret, which is defined as the expected revenue loss against a clairvoyant policy that knows the utility model parameters in advance and always offers the revenue-maximizing prices. We next characterize the benchmark policy. Let , , where and be the feature matrix, which is obtained by stacking , as its rows (Recall that ). The proposition below gives an implicit formula to write the vector of optimal prices as a function . We refer to as the pricing function.

Proposition 3.1.

The benchmark policy that knows the utility model parameters , sets the optimal prices as follows: For product , the optimal price is given by , where is the unique value of satisfying the following equation:


Proof of Proposition 3.1 is similar to that in [28, Theorem 3.1] and is deferred to Appendix A.1.
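To make the implicit characterization concrete, here is a minimal numerical sketch. It assumes notation that is not in the text: `a[l]` stands for the feature-based part of the utility of product `l`, and `beta[l] > 0` for its price sensitivity, so that the noiseless utility at price `p` is `a[l] - beta[l] * p`. Under the MNL model (2), setting the gradient of the expected revenue to zero gives prices of the form `1 / beta[l] + B`, where `B` solves `B = sum_l exp(a[l] - 1 - beta[l] * B) / beta[l]`. The bisection solver is a choice made for this example, not something prescribed by the paper.

```python
import numpy as np

def optimal_prices(a, beta, iters=200):
    """Revenue-maximizing prices under MNL with utilities a[l] - beta[l] * p[l].

    a and beta are hypothetical names for the feature-based utilities and the
    (strictly positive) price sensitivities of the products in the set.
    Returns (prices, B), where B is the root of the fixed-point equation.
    """
    a = np.asarray(a, dtype=float)
    beta = np.asarray(beta, dtype=float)

    def rhs(B):
        # Right-hand side of the fixed-point equation; strictly decreasing in B.
        return np.sum(np.exp(a - 1.0 - beta * B) / beta)

    lo, hi = 0.0, rhs(0.0)       # B - rhs(B) is negative at lo, positive at hi
    for _ in range(iters):       # plain bisection on B - rhs(B)
        mid = 0.5 * (lo + hi)
        if mid - rhs(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    B = 0.5 * (lo + hi)
    return 1.0 / beta + B, B

def expected_revenue(p, a, beta):
    """Expected one-period revenue under MNL with a no-purchase option."""
    w = np.exp(a - beta * p)
    return float(np.sum(p * w) / (1.0 + np.sum(w)))

a = np.array([1.0, 0.5, 2.0])       # illustrative values
beta = np.array([1.0, 2.0, 0.5])
p_star, B = optimal_prices(a, beta)
```

In this parametrization the root `B` coincides with the optimal expected revenue, which gives a simple sanity check: `expected_revenue(p_star, a, beta)` should equal `B` up to numerical precision.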

We can now formally define the notion of regret. Let be a pricing policy that sets the vector of prices at time for the products in the consideration set . Then, the seller’s expected revenue at period , under such policy will be


with being the probability of buying product from the set as given by Eq (2). (Footnote 3: More precisely, is the expected revenue conditional on filtration , where is the sigma algebra generated by feature matrices and market shocks .) Similarly, we let be the seller’s expected revenue under the benchmark policy that sets price vectors , at period . The worst-case cumulative regret of policy is defined as

Algorithm 1 M3P policy for multi-product dynamic pricing
Input: (at time 0) function , regularizations , (bound on )
Input: (arrives over time) covariate matrices
Output: prices
1: , ,
2: for each episode $k$ do
3:   Set the length of the $k$-th episode:
4:   Update the model parameter estimate using the regularized ML estimator obtained from observations during the previous exploration periods:
     where corresponds to “no-purchase” with .
5:   Offer prices based on the current estimate as
     where is the unique value of satisfying the following equation:

4 Multi-Product Pricing Policy (M3P)

In this section, we provide a formal description of our multi-product dynamic pricing policy (M3P). The policy views the time horizon in an episodic structure, where the lengths of the episodes grow geometrically (episode $k$ is of length ). Throughout, we use notation to refer to the periods in episode $k$, i.e., . The policy updates its estimate of the model parameters at the beginning of each episode and adheres to that estimate throughout the episode when setting the prices. At each period during an episode, our policy sets the price vectors as , where , are respectively the estimates of and , which are obtained by solving a regularized maximum-likelihood problem using solely the observations (the products sold) in the previous episode. Note that the seller can only observe which products were sold in the previous episode. Formally, the estimate is obtained by minimizing the negative log-likelihood function given by

where denotes the product purchased at time , and


with . We adopt the convention that for the “no-purchase” case with .

The log-likelihood loss can be written in a more compact form. We let be the response vector that indicates which product is purchased at time :

We also let . Then, the log-likelihood loss can be written as


We also add $\ell_1$ regularization to the cost function to promote sparsity in the estimator


with , for a constant .
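The regularized estimation step can be sketched as follows. Several things here are assumptions made for illustration and are not taken from the text: the price sensitivity of a product is modeled as linear in its features, so the utility is the inner product of an augmented feature vector `[x, -p * x]` with a stacked parameter vector; the solver is proximal gradient descent (ISTA), which is one standard way to handle an $\ell_1$ penalty; and all function names are hypothetical.

```python
import numpy as np

def augmented_features(X, P):
    """Stack [x, -p * x] so that utility = <x_tilde, nu> with nu = (theta, alpha).

    X has shape (n, K, d) and P has shape (n, K). This assumes the price
    sensitivity of a product is <x, alpha>, which is an illustrative choice.
    """
    return np.concatenate([X, -P[..., None] * X], axis=-1)

def _choice_probabilities(nu, Xt):
    u = Xt @ nu                                            # (n, K)
    u0 = np.concatenate([np.zeros((u.shape[0], 1)), u], axis=1)
    u0 = u0 - u0.max(axis=1, keepdims=True)                # numerical stability
    q = np.exp(u0)
    return q / q.sum(axis=1, keepdims=True)                # column 0: no purchase

def neg_log_likelihood(nu, Xt, choice):
    """Average MNL negative log-likelihood; choice[t] = 0 means no purchase."""
    q = _choice_probabilities(nu, Xt)
    return float(-np.mean(np.log(q[np.arange(q.shape[0]), choice])))

def nll_gradient(nu, Xt, choice):
    q = _choice_probabilities(nu, Xt)
    y = np.zeros_like(q)
    y[np.arange(q.shape[0]), choice] = 1.0                 # one-hot responses
    resid = (q - y)[:, 1:]                                 # no-purchase has zero features
    return np.einsum("tk,tkj->j", resid, Xt) / Xt.shape[0]

def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def l1_mle(Xt, choice, lam, iters=500):
    """Minimize neg_log_likelihood(nu) + lam * ||nu||_1 by proximal gradient."""
    # Crude upper bound on the Lipschitz constant of the gradient.
    lipschitz = np.mean(np.sum(Xt ** 2, axis=(1, 2)))
    step = 1.0 / lipschitz
    nu = np.zeros(Xt.shape[-1])
    for _ in range(iters):
        nu = soft_threshold(nu - step * nll_gradient(nu, Xt, choice), step * lam)
    return nu
```

The regularization level `lam` is left as an input; the sketch does not attempt to reproduce the paper's specific choice.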

The policy terminates at time $T$, but note that the policy does not need to know $T$ in advance. Further, exploration and exploitation are mixed in our policy. At the beginning of each episode, the policy exploits the observations from the previous episode to update its estimates of the model parameters. Meanwhile, the market shocks in the utilities give us a sufficient amount of exploration, and hence we do not need to actively randomize prices to learn the parameters. Also, by design, when the policy does not have much information about the model parameters, it updates its estimates frequently (since the episodes are short); as time proceeds, the policy gathers more information about the parameters, updates its estimates less frequently, and uses them over longer episodes.
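The episodic schedule can be illustrated with a few lines of code. The sketch assumes a doubling schedule in which episode k has length 2^(k-1), so that it covers periods 2^(k-1) through 2^k - 1; this specific length is an assumption for the example, chosen to match the geometric growth described above. The function names are hypothetical.

```python
def episode_of(t):
    """Index k >= 1 of the episode containing period t >= 1.

    Assumes episode k has length 2**(k-1), i.e. covers periods
    2**(k-1), ..., 2**k - 1.
    """
    if t < 1:
        raise ValueError("periods are numbered from 1")
    return t.bit_length()

def episode_periods(k):
    """The periods that make up episode k under the same assumption."""
    return range(2 ** (k - 1), 2 ** k)

# The estimate is refreshed only when a new episode starts, so over a horizon
# of T periods the estimator is recomputed about log2(T) times, without the
# schedule ever needing to know T.
schedule = [episode_of(t) for t in range(1, 8)]
```

Under this schedule the first seven periods fall into episodes 1, 2, 2, 3, 3, 3, 3.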

5 Regret Analysis for M3P

We next state our result on the regret of the M3P policy.

Theorem 5.1.

(Regret upper bound) Consider the choice model (2). Then, the $T$-period regret of M3P is of $O(\sqrt{\log(dT)\,T})$, with $d$ and $T$ being the feature dimension and the length of the time horizon. Further, the regret of any pricing policy in this case is $\Omega(\sqrt{T})$.

Note that, as stated by the theorem, the regret of M3P scales logarithmically in $d$, making the algorithm applicable to high-dimensional settings. Below, we state the key lemmas in the proof of Theorem 5.1 and refer to the Appendices for the proofs of the technical steps.

Let be the vector of prices posted at time for products in the consideration set . Recall that M3P sets the prices as , where is the pricing function whose implicit characterization is given by Proposition 3.1.

Our next lemma shows that the pricing function is Lipschitz.

Lemma 5.2.

Suppose that and . Then, there exists a constant such that the following holds


We next upper bound the right-hand side of Eq (12) by bounding the estimation error of the proposed regularized estimator. Denote by the matrix obtained by putting all the feature matrices corresponding to belonging to episode .

Proposition 5.3.

Let be the solution of optimization problem (6), with for a constant . Then, with probability at least , we have


where is a constant depending on and .

The last part of the proof is to relate the regret of the policy at each period to the distance between the posted price vector and the price vector posted by the benchmark. Recall the definition of revenue from (4) and define the regret as .

Lemma 5.4.

Let be the optimal price vector posted by the benchmark policy that knows the model parameters and in advance. There exists a constant (depending on ) such that the following holds,

for some constant .

The reason that the revenue gap in Lemma 5.4 depends on the square of the difference of the price vectors is that the benchmark price vector is optimal, and hence the gradient of the revenue function vanishes there. Therefore, in the Taylor expansion of the revenue function around the optimal price vector, the first-order term vanishes and the second-order term is the leading one.
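The Taylor argument can be written out as follows. This is a sketch in assumed notation: $\mathrm{rev}_t$ denotes the expected revenue as a function of the price vector, $p^*$ the benchmark price vector, $p$ the posted one, and $\tilde p$ some point on the segment between them. It also assumes that the Hessian of $\mathrm{rev}_t$ is bounded over the relevant price range.

```latex
\begin{align*}
\mathrm{rev}_t(p) &= \mathrm{rev}_t(p^*)
   + \underbrace{\nabla \mathrm{rev}_t(p^*)^{\top}(p - p^*)}_{=\,0}
   + \tfrac{1}{2}\,(p - p^*)^{\top}\, \nabla^2 \mathrm{rev}_t(\tilde p)\,(p - p^*), \\
\mathrm{rev}_t(p^*) - \mathrm{rev}_t(p)
  &= -\tfrac{1}{2}\,(p - p^*)^{\top}\, \nabla^2 \mathrm{rev}_t(\tilde p)\,(p - p^*)
  \;\le\; \tfrac{1}{2}\,\sup_{q}\bigl\|\nabla^2 \mathrm{rev}_t(q)\bigr\|_{\mathrm{op}}\;\|p - p^*\|_2^2 .
\end{align*}
```

A pricing error of size $\varepsilon$ therefore costs on the order of $\varepsilon^2$ in revenue, which is why the estimation error enters the regret bound through its square.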

The proof of Theorem 5.1 follows by combining Lemma 5.2, Proposition 5.3 and Lemma 5.4. We refer to Appendix B.1 for its proof.

Our next theorem provides a lower bound on the $T$-period regret of any pricing policy. The proof of Theorem 5.5 is given in Appendix B.2 and employs the notion of ‘uninformative prices’ introduced by [7].

Theorem 5.5.

(Regret lower bound) Consider the choice model (2). Then, the $T$-period regret of any pricing policy in this case is $\Omega(\sqrt{T})$.

Theorem 5.5 implies that M3P has optimal cumulative regret up to a logarithmic factor.


A. J. was partially supported by an Outlier Research in Business (iORB) grant from the USC Marshall School of Business (2018). This work was supported in part by a Google Faculty Research award (2016).


  • [1] K. Amin, A. Rostamizadeh, and U. Syed. Learning prices for repeated auctions with strategic buyers. In Advances in Neural Information Processing Systems, pages 1169–1177, 2013.
  • [2] K. Amin, A. Rostamizadeh, and U. Syed. Repeated contextual auctions with strategic buyers. In Advances in Neural Information Processing Systems, pages 622–630, 2014.
  • [3] V. F. Araman and R. Caldentey. Dynamic pricing for nonperishable products with demand learning. Operations research, 57(5):1169–1188, 2009.
  • [4] Y. Aviv and G. Vulcano. Dynamic list pricing. In The Oxford handbook of pricing management. 2012.
  • [5] G.-Y. Ban and N. B. Keskin. Personalized dynamic pricing with machine learning.
  • [6] O. Besbes and A. Zeevi. Dynamic pricing without knowing the demand function: Risk bounds and near-optimal algorithms. Operations Research, 57(6):1407–1420, 2009.
  • [7] J. Broder and P. Rusmevichientong. Dynamic pricing under a general parametric choice model. Operations Research, 60(4):965–980, 2012.
  • [8] N. Cesa-Bianchi, C. Gentile, and Y. Mansour. Regret minimization for reserve prices in second-price auctions. IEEE Transactions on Information Theory, 61(1):549–564, 2015.
  • [9] X. Chen, Z. Owen, C. Pixton, and D. Simchi-Levi. A statistical learning approach to personalization in revenue management. 2015.
  • [10] W. C. Cheung, D. Simchi-Levi, and H. Wang. Dynamic pricing and demand learning with limited price experimentation. Operations Research, 65(6):1722–1731, 2017.
  • [11] M. Cohen, I. Lobel, and R. Paes Leme. Feature-based dynamic pricing. 2016.
  • [12] A. V. den Boer. Dynamic pricing and learning: historical origins, current research, and new directions. Surveys in operations research and management science, 20(1):1–18, 2015.
  • [13] A. V. den Boer and A. P. Zwart. Mean square convergence rates for maximum(quasi) likelihood estimation. Stochastic systems, 4:1 – 29, 2014.
  • [14] A. V. den Boer and B. Zwart. Simultaneously learning and optimizing using controlled variance pricing. Management science, 60(3):770–783, 2013.
  • [15] O. Elshiewy, D. Guhl, and Y. Boztug. Multinomial logit models in marketing-from fundamentals to state-of-the-art. Marketing ZFP, 39(3):32–49, 2017.
  • [16] V. F. Farias and B. Van Roy. Dynamic pricing with a prior on market response. Operations Research, 58(1):16–29, 2010.
  • [17] K. J. Ferreira, D. Simchi-Levi, and H. Wang. Online network revenue management using thompson sampling.
  • [18] A. Goldenshluger and A. Zeevi. A linear response bandit problem. Stochastic Systems, 3(1):230–261, 2013.
  • [19] N. Golrezaei, A. Javanmard, and V. Mirrokni. Dynamic incentive-aware learning: Robust pricing in contextual auctions., 2018.
  • [20] J. M. Harrison, N. B. Keskin, and A. Zeevi. Bayesian dynamic pricing policies: Learning and earning under a binary prior distribution. Management Science, 58(3):570–586, 2012.
  • [21] A. Javanmard. Perishability of data: dynamic pricing under varying-coefficient models. The Journal of Machine Learning Research, 18(1):1714–1744, 2017.
  • [22] A. Javanmard and H. Nazerzadeh. Dynamic pricing in high-dimensions. arXiv preprint arXiv:1609.07574 (accepted for publication in Journal of Machine Learning), 2016.
  • [23] B. Keskin. Optimal dynamic pricing with demand model uncertainty: A squared-coefficient-of-variation rule for learning and earning. Working Paper, 2014.
  • [24] I. Lobel, R. P. Leme, and A. Vladu. Multidimensional binary search for contextual decision-making. arXiv preprint arXiv:1611.00829, 2016.
  • [25] J. Mueller, V. Syrgkanis, and M. Taddy. Low-rank bandit methods for high-dimensional dynamic pricing. arXiv preprint arXiv:1801.10242, 2018.
  • [26] S. Qiang and M. Bayati. Dynamic pricing with demand covariates. arXiv preprint arXiv:1604.07463, 2016.
  • [27] K. T. Talluri and G. J. Van Ryzin. The theory and practice of revenue management, volume 68. Springer Science & Business Media, 2006.
  • [28] H. Zhang, P. Rusmevichientong, and H. Topaloglu. Multi-product pricing under the generalized extreme value models with homogeneous price sensitivity parameters. 2018.

Appendix A Proof of Technical Lemmas

A.1 Proof of Proposition 3.1

In the benchmark policy, the seller knows the model parameters . For simplicity, we use the shorthands , , and the sum as . The revenue function can be written in terms of as

where we used (2). Writing the stationarity condition for the optimal price vector , we get that for each :

which is equivalent to

Since , the above equation implies that


Define . We next show that is the solution to Equation (3). By multiplying both sides of (14) by and summing over , we have

By definition of , the left-hand side of the above equation is equal to . By rearranging the terms we obtain

where the second line follows from Equation (14).

Regarding the uniqueness of the solution of (3), note that the left-hand side of (3) is strictly increasing in and is zero at , while the right hand side is strictly decreasing in and is positive at . Therefore, Equation (3) has a unique solution.
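The stationarity argument above can be reproduced in a few lines. The shorthand below is assumed for this sketch and need not match the paper's own symbols: $a_\ell$ is the feature-based part of the utility of product $\ell$, $\beta_\ell > 0$ is its price sensitivity, $v_\ell = a_\ell - \beta_\ell p_\ell$, and $S = \sum_\ell e^{v_\ell}$. In the last line, $v_\ell$ and $S$ are evaluated at the optimal prices.

```latex
\begin{align*}
\mathrm{rev}(p) &= \frac{\sum_{\ell} p_\ell\, e^{v_\ell}}{1 + S}, \\
\frac{\partial\, \mathrm{rev}}{\partial p_\ell}
  &= \frac{e^{v_\ell}}{1 + S}\,\Bigl(1 - \beta_\ell p_\ell + \beta_\ell\, \mathrm{rev}(p)\Bigr) = 0
  \;\Longrightarrow\;
  p^*_\ell = \frac{1}{\beta_\ell} + B_0, \qquad B_0 := \mathrm{rev}(p^*), \\
B_0\,(1 + S) &= \sum_{\ell} \Bigl(\frac{1}{\beta_\ell} + B_0\Bigr) e^{v_\ell}
  \;\Longrightarrow\;
  B_0 = \sum_{\ell} \frac{1}{\beta_\ell}\, e^{v_\ell}
      = \sum_{\ell} \frac{1}{\beta_\ell}\, e^{-(1 + \beta_\ell B_0)}\, e^{a_\ell}.
\end{align*}
```

In the last equation, the left-hand side is strictly increasing in $B_0$ and vanishes at zero, while the right-hand side is strictly decreasing and positive at zero, which is the monotonicity used in the uniqueness argument.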

A.2 Proof of Lemma 5.2

Define function as


By characterization of the pricing function , given in Proposition 3.1, we have and , where and are the solution of and .

By the implicit function theorem, for a point that satisfies = 0, there exists an open set around , and a unique differentiable function such that and for all . Furthermore, the partial derivative of can be computed as

where in the last step we use the normalization , and is the lower bound on the price sensitivities. Likewise, we have

where we used the fact that the solution of satisfies . (This follows readily by noting that the right-hand side of (15) is non-increasing in .) This shows that is a Lipschitz function of , with Lipschitz constant , where . Therefore,


Hence, , for some constant . This completes the proof.

A.3 Proof of Proposition 5.3

We start by recalling the notation and define . To prove Proposition 5.3, we first rewrite the loss function in terms of the augmented parameter vector . (Recall our convention that corresponds to “no-purchase” with .)


where . The gradient and the Hessian of are given by


We proceed by bounding the gradient and the Hessian of the loss function. Before that, we establish an upper bound on the prices that are set by the pricing function .

Lemma A.1.

Suppose that and . Let be the solution to the following equation:


Then, the prices set by the pricing function , where , are bounded by .

The proof of the above lemma follows readily by noting that the right-hand side of (20) is an upper bound for the right-hand side of (3) and therefore . The result then follows by recalling that the pricing function sets prices as .

To bound the gradient of the loss function at the true model parameters, note that


We also have


for a constant . We also note that by (18), is written as a sum of terms. In each term, the index has randomness coming from the market noise distribution. By a straightforward calculation, one can verify that each of these terms has zero expectation. Using (22) and applying the Azuma-Hoeffding inequality to the right-hand side of (A.3), followed by a union bound over the coordinates of the feature vectors, we obtain


with probability at least . (Note that we can absorb in constant since it already depends on .)

We next pass to lower bounding the Hessian of the loss. For , we write