Factored Bandits

07/04/2018
by Julian Zimmert et al.

We introduce the factored bandits model, a framework for learning with limited (bandit) feedback in which actions decompose into a Cartesian product of atomic actions. Factored bandits include rank-1 bandits as a special case, but significantly relax the assumptions on the form of the reward function. We provide an anytime algorithm for stochastic factored bandits, together with upper and lower regret bounds that match up to constant factors. Furthermore, we show that with a slight modification the proposed algorithm can be applied to utility-based dueling bandits, where it improves the additive terms of the regret bound over state-of-the-art algorithms (these additive terms dominate up to time horizons exponential in the number of arms).
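To make the action decomposition concrete, here is a minimal sketch of a factored bandit with two atomic action sets, where the learner picks one atomic action per factor and observes a reward for the full combination. The per-factor UCB1 learner below is an illustrative baseline, not the algorithm from the paper; the reward means and function names are hypothetical.

```python
import math
import random

def make_reward(means):
    """Bernoulli reward for a combination (i, j) with mean means[i][j] (toy example)."""
    def pull(i, j):
        return 1.0 if random.random() < means[i][j] else 0.0
    return pull

def factored_ucb(pull, sizes, horizon, seed=0):
    """Run one atomic UCB1 learner per factor; the joint action is their product.

    `sizes` gives the number of atomic actions in each factor, so the
    joint action space has prod(sizes) combinations while the learner
    only maintains sum(sizes) statistics -- the point of factoring.
    """
    random.seed(seed)
    counts = [[0] * k for k in sizes]    # pulls per atomic action, per factor
    sums = [[0.0] * k for k in sizes]    # reward totals per atomic action
    total = 0.0
    for t in range(1, horizon + 1):
        choice = []
        for f, k in enumerate(sizes):
            best_val, best_idx = -1.0, 0
            for a in range(k):
                if counts[f][a] == 0:    # play each atomic action once first
                    best_idx = a
                    break
                idx = sums[f][a] / counts[f][a] + math.sqrt(
                    2 * math.log(t) / counts[f][a])
                if idx > best_val:
                    best_val, best_idx = idx, a
            choice.append(best_idx)
        r = pull(*choice)                # reward observed only for the combination
        total += r
        for f, a in enumerate(choice):   # credit the reward to each atomic action
            counts[f][a] += 1
            sums[f][a] += r
        del t
    return total / horizon
```

For instance, with means `[[0.1, 0.2], [0.3, 0.9]]` the best combination is `(1, 1)`, and the average reward of the sketch approaches 0.9 as the horizon grows.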


research
12/06/2019

Solving Bernoulli Rank-One Bandits with Unimodal Thompson Sampling

Stochastic Rank-One Bandits (Katariya et al. (2017a,b)) are a simple fram...
research
09/26/2013

Finite-Time Analysis of Kernelised Contextual Bandits

We tackle the problem of online reward maximisation over a large finite ...
research
06/18/2019

Simple Algorithms for Dueling Bandits

In this paper, we present simple algorithms for Dueling Bandits. We prov...
research
02/16/2023

Linear Bandits with Memory: from Rotting to Rising

Nonstationary phenomena, such as satiation effects in recommendation, ar...
research
02/11/2013

Adaptive-treed bandits

We describe a novel algorithm for noisy global optimisation and continuu...
research
01/28/2023

(Private) Kernelized Bandits with Distributed Biased Feedback

In this paper, we study kernelized bandits with distributed biased feedb...
research
12/06/2021

Nonstochastic Bandits with Composite Anonymous Feedback

We investigate a nonstochastic bandit setting in which the loss of an ac...
