Semi-parametric dynamic contextual pricing
We consider a canonical revenue maximization problem where customers arrive sequentially; each customer is interested in buying one product, and the customer purchases the product if her valuation for it exceeds the price set by the seller. The valuations of customers are not observed by the seller; however, the seller can leverage contextual information available to her in the form of noisy covariate vectors describing the customer's history and the product's type to set prices. The seller can learn the relationship between covariates and customer valuations by experimenting with prices and observing transaction outcomes. We consider a semi-parametric model where the relationship between the expectation of the log of valuation and the covariates is linear (hence parametric) and the residual uncertainty distribution, i.e., the noise distribution, is non-parametric. We develop a pricing policy, DEEP-C, which learns this relationship with minimal exploration and in turn achieves optimal regret asymptotically in the time horizon.
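The demand model described above can be sketched as a small simulation. This is only an illustration of the purchase rule and the log-linear valuation model, not an implementation of DEEP-C; the names `simulate_sale`, `average_revenue`, `uniform_noise`, the two-dimensional covariate layout, and the uniform noise distribution are all assumptions made for the example.

```python
import math
import random


def uniform_noise(rng):
    # Hypothetical noise distribution: the model only requires that the
    # seller does not know its form; uniform on [-0.5, 0.5] is one choice.
    return rng.uniform(-0.5, 0.5)


def simulate_sale(theta, x, price, noise_sampler, rng):
    """Return 1 if the customer buys at `price`, else 0.

    Assumed model: log(valuation) = <theta, x> + noise.
    """
    log_valuation = sum(t * xi for t, xi in zip(theta, x)) + noise_sampler(rng)
    valuation = math.exp(log_valuation)
    # The customer purchases only if her valuation exceeds the posted price.
    return 1 if valuation > price else 0


def average_revenue(theta, price_fn, noise_sampler, n_customers=2000, seed=0):
    """Average per-customer revenue of a pricing rule under the assumed model."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_customers):
        # Hypothetical covariates: an intercept plus one feature in [0, 1].
        x = [1.0, rng.uniform(0.0, 1.0)]
        price = price_fn(x)
        # The seller sees only the binary outcome, never the valuation.
        total += price * simulate_sale(theta, x, price, noise_sampler, rng)
    return total / n_customers
```

For instance, `average_revenue([0.2, 1.0], lambda x: 1.5, uniform_noise)` estimates the revenue of a fixed price of 1.5. A learning policy would instead choose `price_fn` from past covariates and sale outcomes, which are the only feedback the seller receives.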