Online Stochastic Optimization under Correlated Bandit Feedback

02/04/2014
by Mohammad Gheshlaghi Azar, et al.

In this paper we consider the problem of online stochastic optimization of a locally smooth function under bandit feedback. We introduce the high-confidence tree (HCT) algorithm, a novel anytime X-armed bandit algorithm, and derive regret bounds that match the existing state of the art in terms of dependence on the number of steps and the smoothness factor. The main advantage of HCT is that it handles the challenging case of correlated rewards, whereas existing methods require that the reward-generating process of each arm be an independent and identically distributed (i.i.d.) random process. HCT also improves on the state of the art in terms of its memory requirement, and it requires a weaker smoothness assumption on the mean-reward function than previous anytime algorithms. Finally, we discuss how HCT can be applied to the problem of policy search in reinforcement learning and report preliminary empirical results.
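To make the X-armed bandit setting concrete, the short Python sketch below runs an optimistic tree search over a noisy black-box function on [0, 1], in the spirit of hierarchical bandit methods such as HOO and HCT. The confidence width, the smoothness bonus nu * rho^depth, and the fixed node-expansion threshold are simplified assumptions chosen for illustration; they are not the paper's actual HCT selection or expansion rules.

import math
import random

class Node:
    """A cell [lo, hi] of the 1-D search space with empirical statistics."""
    def __init__(self, lo, hi, depth):
        self.lo, self.hi, self.depth = lo, hi, depth
        self.count = 0
        self.mean = 0.0
        self.children = []

    def midpoint(self):
        return 0.5 * (self.lo + self.hi)

def b_value(node, t, nu=1.0, rho=0.5, c=1.0):
    # Optimistic index: empirical mean + confidence width + smoothness bonus.
    # nu * rho**depth is an assumed bound on the variation of the mean reward
    # inside the cell (a simplified local-smoothness term, not the paper's).
    if node.count == 0:
        return float("inf")
    conf = c * math.sqrt(math.log(t + 1) / node.count)
    return node.mean + conf + nu * rho ** node.depth

def hierarchical_bandit(f, horizon, lo=0.0, hi=1.0):
    """Optimistic tree search for a noisy black-box function f on [lo, hi]."""
    root = Node(lo, hi, 0)
    for t in range(1, horizon + 1):
        # Walk down the tree, always following the most optimistic child.
        node = root
        while node.children:
            node = max(node.children, key=lambda ch: b_value(ch, t))
        # Pull the arm at the cell's midpoint and observe a noisy reward.
        reward = f(node.midpoint())
        node.count += 1
        node.mean += (reward - node.mean) / node.count
        # Expand the cell once it has been sampled enough times
        # (a fixed threshold here; HCT uses a confidence-based rule).
        if not node.children and node.count >= 2 ** node.depth:
            mid = node.midpoint()
            node.children = [Node(node.lo, mid, node.depth + 1),
                             Node(mid, node.hi, node.depth + 1)]
    # Report the midpoint of the most-sampled leaf as the estimated maximizer.
    best = root
    while best.children:
        best = max(best.children, key=lambda ch: ch.count)
    return best.midpoint()

if __name__ == "__main__":
    noisy = lambda x: math.sin(13 * x) * math.sin(27 * x) + random.gauss(0, 0.1)
    print(hierarchical_bandit(noisy, 5000))

The key design choice shared with HCT-style algorithms is that rewards observed in a cell inform estimates for the whole cell, so correlated feedback across nearby arms is exploited rather than treated as a nuisance; the exact confidence and expansion schedules above are placeholders.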
