On Kernelized Multi-Armed Bandits with Constraints

03/29/2022
by Xingyu Zhou, et al.

We study a stochastic bandit problem with a general unknown reward function and a general unknown constraint function. Both functions can be non-linear (even non-convex) and are assumed to lie in a reproducing kernel Hilbert space (RKHS) with a bounded norm. This kernelized bandit setup strictly generalizes standard multi-armed bandits and linear bandits. In contrast to safety-type hard constraints studied in prior works, we consider soft constraints that may be violated in any round as long as the cumulative violations are small, which is motivated by various practical applications. Our ultimate goal is to study how to utilize the nature of soft constraints to attain a finer complexity-regret-constraint trade-off in the kernelized bandit setting. To this end, leveraging primal-dual optimization, we propose a general framework for both algorithm design and performance analysis. This framework builds upon a novel sufficient condition, which not only is satisfied under general exploration strategies, including upper confidence bound (UCB), Thompson sampling (TS), and new ones based on random exploration, but also enables a unified analysis for showing both sublinear regret and sublinear or even zero constraint violation. We demonstrate the superior performance of our proposed algorithms via numerical experiments based on both synthetic and real-world datasets. Along the way, we also make the first detailed comparison between two popular methods for analyzing constrained bandits and Markov decision processes (MDPs) by discussing the key difference and some subtleties in the analysis, which could be of independent interest to the communities.
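
The abstract stays high-level, so the sketch below is only a loose illustration of the primal-dual idea it describes, not the authors' algorithm. It runs a generic GP-UCB-style loop over a discrete arm set: the primal step picks the arm maximizing an optimistic Lagrangian (reward UCB minus the dual price times an optimistic constraint estimate), and the dual step raises the price whenever the observed constraint value is positive, which is the soft-violation mechanism the abstract alludes to. Every concrete detail here (the arm set, the synthetic reward f and constraint g, the RBF kernel, the constant confidence width beta, and the step sizes) is a made-up assumption for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- hypothetical problem instance (not from the paper) ---
X = np.linspace(0.0, 1.0, 50)           # discrete arm set
f = lambda x: np.sin(3 * x)             # unknown reward function
g = lambda x: np.cos(3 * x) - 0.3       # unknown constraint; want g(x) <= 0
noise = 0.05

def rbf(a, b, ls=0.2):
    """RBF kernel matrix between two 1-D arrays of arms."""
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ls ** 2)

def gp_posterior(xs, ys, Xq):
    """Standard GP posterior mean/std at query points Xq."""
    K = rbf(xs, xs) + noise ** 2 * np.eye(len(xs))
    Ks = rbf(Xq, xs)
    Kinv = np.linalg.inv(K)
    mu = Ks @ Kinv @ ys
    var = 1.0 - np.einsum('ij,jk,ik->i', Ks, Kinv, Ks)
    return mu, np.sqrt(np.clip(var, 1e-12, None))

lam, eta, beta = 0.0, 0.5, 2.0          # dual variable, dual step size, confidence width
hist_x, hist_f, hist_g = [], [], []

for t in range(100):
    if not hist_x:                      # start with a random arm
        x = rng.choice(X)
    else:
        xs = np.array(hist_x)
        mu_f, sd_f = gp_posterior(xs, np.array(hist_f), X)
        mu_g, sd_g = gp_posterior(xs, np.array(hist_g), X)
        ucb_f = mu_f + beta * sd_f      # optimistic reward estimate
        lcb_g = mu_g - beta * sd_g      # optimistic (low) constraint estimate
        # primal step: maximize the Lagrangian under the current dual price
        x = X[np.argmax(ucb_f - lam * lcb_g)]
    yf = f(x) + noise * rng.standard_normal()
    yg = g(x) + noise * rng.standard_normal()
    # dual step: raise the price when the constraint is violated
    lam = max(0.0, lam + eta * yg)
    hist_x.append(x); hist_f.append(yf); hist_g.append(yg)

print(f"final dual variable: {lam:.3f}, best arm seen: {hist_x[np.argmax(hist_f)]:.3f}")
```

Swapping the UCB/LCB indices for Thompson-sampled posterior draws gives the TS variant of the same primal-dual loop; that interchangeability is the kind of unified treatment the framework in the paper is claimed to provide.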


Related research

06/17/2020 · Stochastic Bandits with Linear Constraints
We study a constrained contextual linear bandit setting, where the goal ...

04/20/2020 · Thompson Sampling for Linearly Constrained Bandits
We address multi-armed bandits (MAB) where the objective is to maximize ...

06/30/2020 · Continuous-Time Multi-Armed Bandits with Controlled Restarts
Time-constrained decision processes have been ubiquitous in many fundame...

01/30/2023 · A Framework for Adapting Offline Algorithms to Solve Combinatorial Multi-Armed Bandit Problems with Bandit Feedback
We investigate the problem of stochastic, combinatorial multi-armed band...

02/10/2021 · An Efficient Pessimistic-Optimistic Algorithm for Constrained Linear Bandits
This paper considers stochastic linear bandits with general constraints....

08/24/2023 · Master-slave Deep Architecture for Top-K Multi-armed Bandits with Non-linear Bandit Feedback and Diversity Constraints
We propose a novel master-slave architecture to solve the top-K combinat...

11/27/2022 · Rectified Pessimistic-Optimistic Learning for Stochastic Continuum-armed Bandit with Constraints
This paper studies the problem of stochastic continuum-armed bandit with...
