The Symmetry between Arms and Knapsacks: A Primal-Dual Approach for Bandits with Knapsacks

02/12/2021
by Xiaocheng Li, et al.

In this paper, we study the bandits with knapsacks (BwK) problem and develop a primal-dual based algorithm that achieves a problem-dependent logarithmic regret bound. The BwK problem extends the multi-armed bandit (MAB) problem to model the resource consumption associated with playing each arm, and the existing BwK literature has mainly focused on deriving asymptotically optimal distribution-free regret bounds. We first study the primal and dual linear programs underlying the BwK problem. From this primal-dual perspective, we discover a symmetry between arms and knapsacks, and then propose a new sub-optimality measure for the BwK problem. The sub-optimality measure highlights the important role of knapsacks in determining algorithm regret and inspires the design of our two-phase algorithm. In the first phase, the algorithm identifies the optimal arms and the binding knapsacks; in the second phase, it exhausts the binding knapsacks by playing the optimal arms through an adaptive procedure. Our regret upper bound involves the proposed sub-optimality measure and has a logarithmic dependence on the length of the horizon T and a polynomial dependence on m (the number of arms) and d (the number of knapsacks). To the best of our knowledge, this is the first problem-dependent logarithmic regret bound for solving the general BwK problem.
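
As a rough illustration of the primal-dual perspective mentioned above, the following is a sketch of the standard linear programming relaxation commonly used for BwK, not necessarily the exact formulation in the full text. The symbols are assumptions introduced here for illustration: \mu is the vector of mean rewards of the m arms, C is the d-by-m matrix of mean resource consumption, B is the vector of the d knapsack budgets, x is the primal variable (expected number of plays per arm), and y is the dual variable (price per knapsack).

```latex
% Assumed notation: \mu \in \mathbb{R}^m, C \in \mathbb{R}^{d \times m}, B \in \mathbb{R}^d
\begin{align*}
\text{(Primal)}\quad & \max_{x \in \mathbb{R}^m} \ \mu^\top x
    \quad \text{s.t.}\quad C x \le B,\ x \ge 0, \\
\text{(Dual)}\quad & \min_{y \in \mathbb{R}^d} \ B^\top y
    \quad \text{s.t.}\quad C^\top y \ge \mu,\ y \ge 0.
\end{align*}
% Complementary slackness at an optimal pair (x^*, y^*):
\begin{align*}
x_i^* \,\bigl(C^\top y^* - \mu\bigr)_i &= 0 \quad \text{for each arm } i, \\
y_j^* \,\bigl(B - C x^*\bigr)_j &= 0 \quad \text{for each knapsack } j.
\end{align*}
```

Under this reading, an arm with x_i^* > 0 plays the same role in the primal that a knapsack with y_j^* > 0 plays in the dual: the former must have zero reduced cost, and the latter must be binding (fully consumed). This is one plausible way to interpret the symmetry between arms and knapsacks described in the abstract.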


research
06/20/2019

Stochastic One-Sided Full-Information Bandit

In this paper, we study the stochastic version of the one-sided full inf...
research
10/23/2020

An Asymptotically Optimal Primal-Dual Incremental Algorithm for Contextual Linear Bandits

In the contextual linear bandit setting, algorithms built on the optimis...
research
01/24/2020

Ballooning Multi-Armed Bandits

In this paper, we introduce Ballooning Multi-Armed Bandits (BL-MAB), a n...
research
06/15/2020

Crush Optimism with Pessimism: Structured Bandits Beyond Asymptotic Optimality

We study stochastic structured bandits for minimizing regret. The fact t...
research
10/15/2021

Almost Optimal Batch-Regret Tradeoff for Batch Linear Contextual Bandits

We study the optimal batch-regret tradeoff for batch linear contextual b...
research
12/14/2022

Invariant Lipschitz Bandits: A Side Observation Approach

Symmetry arises in many optimization and decision-making problems, and h...
research
11/18/2021

From Optimality to Robustness: Dirichlet Sampling Strategies in Stochastic Bandits

The stochastic multi-arm bandit problem has been extensively studied und...
