Online Allocation and Pricing: Constant Regret via Bellman Inequalities

06/14/2019
by Alberto Vera, et al.

We develop a framework for designing tractable heuristics for Markov Decision Processes (MDPs), and use it to obtain constant-regret policies for a variety of online allocation problems, including online packing, budget-constrained probing, dynamic pricing, and online contextual bandits with knapsacks. Our approach is based on adaptively constructing a benchmark for the value function, which we then use to select our actions. The centerpiece of our framework is the Bellman Inequalities, which allow us to create benchmarks that both have access to future information and can violate the one-step optimality conditions (i.e., the Bellman equations). The flexibility of balancing these two relaxations lets us obtain policies that are both tractable and have strong performance guarantees; in particular, our constant-regret policies require only solving an LP to select each action.
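To make the "solve an LP to select each action" idea concrete, below is a minimal sketch of a re-solving heuristic for a single-resource online packing problem. Everything in it is an illustrative assumption rather than the paper's exact algorithm: the instance parameters are made up, the fluid LP is a standard expected-consumption relaxation, the threshold rounding of the LP solution is a simplification of the paper's policy, and it uses scipy.optimize.linprog as an off-the-shelf solver.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical single-resource online packing instance (illustrative
# numbers, not from the paper): a type-j item arrives each period with
# probability p[j]; type j yields reward v[j] and consumes w[j] units
# of a budget B over a horizon of T periods.
rng = np.random.default_rng(0)
T, B = 1000, 300.0
p = np.array([0.5, 0.3, 0.2])
v = np.array([1.0, 2.0, 5.0])
w = np.array([1.0, 1.5, 2.0])

def lp_accept_fractions(budget, remaining):
    """Fluid-relaxation LP: choose acceptance fractions x[j] in [0, 1]
    maximizing expected reward, subject to the expected budget spend
    over the remaining horizon staying within the current budget."""
    exp_arrivals = p * remaining
    res = linprog(
        c=-(v * exp_arrivals),                  # linprog minimizes, so negate
        A_ub=(w * exp_arrivals).reshape(1, -1),
        b_ub=[budget],
        bounds=[(0.0, 1.0)] * len(p),
        method="highs",
    )
    return res.x

total_reward, budget = 0.0, B
for t in range(T):
    j = rng.choice(len(p), p=p)                 # observe the arriving type
    if budget < w[j]:
        continue                                # cannot afford this item
    x = lp_accept_fractions(budget, T - t)      # re-solve the LP each step
    if x[j] > 0.5:                              # round the LP's plan for type j
        total_reward += v[j]
        budget -= w[j]

print(f"reward: {total_reward:.1f}, leftover budget: {budget:.1f}")
```

The sketch only illustrates the per-step cost the abstract claims, namely one LP solve per action; the paper's constant-regret guarantees come from the adaptive benchmark built via the Bellman Inequalities, which this simplified rounding rule does not capture.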

Related research

05/04/2020
Stateful Posted Pricing with Vanishing Regret via Dynamic Deterministic Markov Decision Processes
In this paper, a rather general online problem called dynamic resource a...

01/15/2019
The Bayesian Prophet: A Low-Regret Framework for Online Decision Making
Motivated by the success of using black-box predictive algorithms as sub...

05/04/2020
No-Regret Stateful Posted Pricing
In this paper, a rather general online problem called dynamic resource a...

12/28/2020
Blackwell Online Learning for Markov Decision Processes
This work provides a novel interpretation of Markov Decision Processes (...

07/10/2020
Improved Analysis of UCRL2 with Empirical Bernstein Inequality
We consider the problem of exploration-exploitation in communicating Mar...

10/24/2022
Conditionally Risk-Averse Contextual Bandits
We desire to apply contextual bandits to scenarios where average-case st...
