Meta Dynamic Pricing: Learning Across Experiments

02/28/2019
by Hamsa Bastani, et al.

We study the problem of learning across a sequence of price experiments for related products, focusing on implementing the Thompson sampling algorithm for dynamic pricing. We consider a practical formulation of this problem where the unknown parameters of the demand function for each product come from a prior that is shared across products, but is unknown a priori. Our main contribution is a meta dynamic pricing algorithm that learns this prior online while solving a sequence of non-overlapping pricing experiments (each with horizon T) for N different products. Our algorithm addresses two challenges: (i) balancing the need to learn the prior (meta-exploration) with the need to leverage the current estimate of the prior to achieve good performance (meta-exploitation), and (ii) accounting for uncertainty in the estimated prior by appropriately "widening" the prior as a function of its estimation error, thereby ensuring convergence of each price experiment. We prove that the price of an unknown prior for Thompson sampling is negligible in experiment-rich environments (large N). In particular, our algorithm's meta regret can be upper bounded by O(√(NT)) when the covariance of the prior is known, and O(N^{3/4}√T) otherwise. Numerical experiments on synthetic and real auto loan data demonstrate that our algorithm significantly speeds up learning compared to prior-independent algorithms or a naive approach of greedily using the updated prior across products.
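To make the structure described in the abstract concrete, here is a minimal Python sketch of the general idea: a sequence of Thompson-sampling price experiments in which the prior mean is re-estimated from earlier experiments and the prior covariance is widened while that estimate is still noisy. It is not the authors' algorithm. The linear demand model d = a + b·p + noise, the helper names `run_experiment`, `widened_cov`, and `meta_pricing`, the widening factor 1 + c/√n, the use of posterior means as per-product estimates, and all default values are assumptions chosen for illustration. The sketch covers only the known-covariance case.

```python
import numpy as np

def run_experiment(theta_true, prior_mean, prior_cov, T, rng,
                   noise_sd=0.1, p_lo=0.1, p_hi=2.0):
    """One Thompson-sampling price experiment of horizon T.

    Assumed demand model (illustrative): d = a + b * p + Gaussian noise.
    """
    precision = np.linalg.inv(prior_cov)   # posterior precision matrix
    shift = precision @ prior_mean         # precision-weighted mean
    revenue = 0.0
    for _ in range(T):
        cov = np.linalg.inv(precision)
        cov = (cov + cov.T) / 2            # guard against round-off asymmetry
        a, b = rng.multivariate_normal(cov @ shift, cov)  # posterior sample
        # Revenue p * (a + b * p) peaks at p = -a / (2b) when b < 0.
        p = float(np.clip(-a / (2 * b), p_lo, p_hi)) if b < 0 else p_hi
        x = np.array([1.0, p])
        d = x @ theta_true + noise_sd * rng.standard_normal()
        revenue += p * d
        # Conjugate Bayesian linear-regression update.
        precision += np.outer(x, x) / noise_sd**2
        shift += x * d / noise_sd**2
    return np.linalg.solve(precision, shift), revenue

def widened_cov(known_cov, n, c=1.0):
    """Inflate the prior covariance; the inflation shrinks as n grows.

    The factor 1 + c / sqrt(n) is an assumed stand-in for widening
    the prior as a function of its estimation error.
    """
    return known_cov * (1.0 + c / np.sqrt(n))

def meta_pricing(thetas, T, rng, known_cov, n_explore=3,
                 wide_mean=(1.0, -1.0), wide_cov=None):
    """Run one experiment per product, learning the prior mean online."""
    wide_cov = np.eye(2) if wide_cov is None else wide_cov
    estimates, revenues = [], []
    for i, theta in enumerate(thetas):
        if i < n_explore:
            # Meta-exploration: fall back to a prior-independent wide prior.
            mean, cov = np.asarray(wide_mean, dtype=float), wide_cov
        else:
            # Meta-exploitation: estimated prior mean, widened covariance.
            mean = np.mean(estimates, axis=0)
            cov = widened_cov(known_cov, i)
        est, rev = run_experiment(theta, mean, cov, T, rng)
        estimates.append(est)
        revenues.append(rev)
    return np.array(estimates), np.array(revenues)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    true_cov = 0.01 * np.eye(2)
    thetas = rng.multivariate_normal([1.0, -0.5], true_cov, size=20)
    est, rev = meta_pricing(thetas, T=50, rng=rng, known_cov=true_cov)
    print("learned prior mean:", est.mean(axis=0))
    print("mean revenue per experiment:", rev.mean())
```

The first `n_explore` experiments play the role of meta-exploration; later experiments reuse what was learned. A more faithful version would replace the posterior-mean estimates with estimates that do not depend on the prior used in each experiment.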

research
02/17/2019

Context-Based Dynamic Pricing with Online Clustering

We consider a context-based dynamic pricing problem of online products w...
research
04/25/2016

Dynamic Pricing with Demand Covariates

We consider a firm that sells products over T periods without knowing th...
research
10/19/2021

Dynamic pricing and assortment under a contextual MNL demand

We consider dynamic multi-product pricing and assortment problems under ...
research
08/29/2017

Learning to Price with Reference Effects

As a firm varies the price of a product, consumers exhibit reference eff...
research
03/04/2019

Hedging the Drift: Learning to Optimize under Non-Stationarity

We introduce general data-driven decision-making algorithms that achieve...
research
02/11/2021

Meta-Thompson Sampling

Efficient exploration in multi-armed bandits is a fundamental online lea...
research
02/20/2021

Logarithmic Regret in Feature-based Dynamic Pricing

Feature-based dynamic pricing is an increasingly popular model of settin...
