Unifying the stochastic and the adversarial Bandits with Knapsack

10/23/2018
by Anshuka Rangi, et al.

This paper investigates the adversarial Bandits with Knapsack (BwK) online learning problem, in which a player repeatedly chooses an action, pays the corresponding cost, and receives the reward associated with that action. The player is constrained by a maximum budget B that can be spent on actions, and the rewards and costs of the actions are assigned by an adversary. This problem has previously been studied only in the restricted setting where the reward of an action is greater than its cost; we provide a solution in the general setting. Namely, we propose EXP3.BwK, a novel algorithm that achieves order-optimal regret. We also propose EXP3++.BwK, which is order-optimal in the adversarial BwK setup and incurs an almost optimal expected regret, with an additional factor of log(B), in the stochastic BwK setup. Finally, we investigate the case of large action costs (i.e., costs comparable to the budget B) and show that, in the adversarial setting, the achievable regret bounds can be significantly worse than when costs are bounded by a constant, which is a common assumption in the BwK literature.
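To make the interaction protocol in the abstract concrete, the following is a minimal sketch of a BwK game loop. It is not the paper's EXP3.BwK or EXP3++.BwK: the learner is a textbook EXP3 update on rewards that ignores costs when updating weights, used here only as a placeholder. The function name `run_bwk`, the parameter `gamma`, the assumption that rewards and costs lie in [0, 1], and the stopping rule (the game ends when the chosen action can no longer be afforded) are all illustrative assumptions rather than details taken from the paper.

```python
import math
import random


def run_bwk(rewards, costs, budget, gamma=0.1, seed=0):
    """Play one BwK game with a placeholder EXP3 learner (NOT EXP3.BwK).

    rewards[t][a] and costs[t][a] are assumed to lie in [0, 1] and are fixed
    in advance, standing in for an oblivious adversary (an assumption).
    Returns (total_reward, budget_spent, rounds_played).
    """
    rng = random.Random(seed)
    k = len(rewards[0])
    weights = [1.0] * k
    total_reward, spent, rounds = 0.0, 0.0, 0

    for r_t, c_t in zip(rewards, costs):
        # Mix the weight distribution with uniform exploration.
        total_w = sum(weights)
        probs = [(1 - gamma) * w / total_w + gamma / k for w in weights]
        arm = rng.choices(range(k), weights=probs)[0]

        # Assumed stopping rule: stop once the chosen action is unaffordable.
        if spent + c_t[arm] > budget:
            break

        spent += c_t[arm]
        total_reward += r_t[arm]
        rounds += 1

        # Importance-weighted reward estimate for the played arm only.
        estimate = r_t[arm] / probs[arm]
        weights[arm] *= math.exp(gamma * estimate / k)

        # Rescale so the weights stay numerically bounded.
        top = max(weights)
        weights = [w / top for w in weights]

    return total_reward, spent, rounds
```

Called with per-round reward and cost tables, for example `run_bwk(rewards, costs, budget=10.0)`, the loop shows the role of the budget B: the number of rounds played is determined by spending rather than by a fixed horizon. A cost-aware algorithm would also use the observed costs in its update, which this placeholder does not.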

