No-Regret Stateful Posted Pricing
This paper introduces a general online problem, dynamic resource allocation with capacity constraints (DRACC), and studies it in the realm of posted price mechanisms. The problem subsumes several applications of stateful pricing, including but not limited to posted prices for online job scheduling. Because existing online learning techniques do not yield no-regret mechanisms for this problem, we develop a new online learning framework defined over deterministic Markov decision processes with dynamic state transition and reward functions. We then prove that the online learning problem can be solved with vanishing regret if the Markov decision process is guaranteed to admit a dominant state in each round and there exists an oracle that can switch the internal state with bounded loss, a condition that is satisfied in the DRACC problem. Our proof technique is based on a reduction to full-information online learning with switching cost (Kalai and Vempala, 2005), in which an online decision maker incurs an extra cost every time she switches from one arm to another. We demonstrate this connection formally and further show how DRACC can be used in our proposed applications of stateful pricing.
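To make the switching-cost setting concrete, the following is a minimal sketch of a perturbed-leader learner in the spirit of Kalai and Vempala (2005). It is not the mechanism developed in this paper. The function name `fpl_fixed_perturbation`, the parameters `epsilon` and `switch_cost`, and the cost format are assumptions made for illustration. The perturbation is drawn once up front, which is meant to keep the leader from changing often when costs are fixed in advance.

```python
import random

def fpl_fixed_perturbation(cost_rounds, epsilon, switch_cost=1.0, seed=0):
    """Illustrative perturbed-leader learner with a switching cost.

    cost_rounds[t][i] is the cost of arm i in round t (full information:
    all costs of a round are revealed after the learner commits).
    Returns (total_cost, num_switches, choices).
    """
    rng = random.Random(seed)
    n = len(cost_rounds[0])
    # Assumption: one exponential perturbation per arm, sampled a single time.
    perturb = [rng.expovariate(epsilon) for _ in range(n)]
    cumulative = [0.0] * n   # cumulative observed cost of each arm
    choices = []
    total = 0.0
    switches = 0
    prev = None
    for costs in cost_rounds:
        # Play the arm with the smallest perturbed cumulative cost so far.
        arm = min(range(n), key=lambda i: cumulative[i] - perturb[i])
        if prev is not None and arm != prev:
            switches += 1
            total += switch_cost   # extra cost charged for changing arms
        total += costs[arm]
        choices.append(arm)
        prev = arm
        # Full-information feedback: update every arm's cumulative cost.
        for i in range(n):
            cumulative[i] += costs[i]
    return total, switches, choices
```

For example, `fpl_fixed_perturbation([[1, 0]] * 200, epsilon=0.5)` runs the learner on two arms where the second arm is always cheaper; the learner changes arms at most once because the gap in cumulative cost only grows.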