A Note on KL-UCB+ Policy for the Stochastic Bandit

03/19/2019
by Junya Honda, et al.

This note considers a classic setting of the stochastic K-armed bandit problem. In this setting, the KL-UCB policy is known to achieve the asymptotically optimal regret bound, and the KL-UCB+ policy empirically outperforms KL-UCB, although a regret bound for the original form of KL-UCB+ has been unknown. This note demonstrates that a simple proof of the asymptotic optimality of the KL-UCB+ policy can be obtained by the same technique used in the analyses of other known policies.
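To make the difference between the two policies concrete, below is a minimal Python sketch of KL-UCB+ as it is commonly described in the bandit literature: the arm index is the largest q with N_a(t) * kl(mu_hat_a, q) <= f(t), where KL-UCB uses f(t) = log t and KL-UCB+ replaces it with log(t / N_a(t)). The Bernoulli reward model, function names, and parameters here are illustrative assumptions, not taken from the note itself.

```python
import math
import random

def bernoulli_kl(p, q):
    """KL divergence between Bernoulli(p) and Bernoulli(q)."""
    eps = 1e-12
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def kl_ucb_index(mean, pulls, threshold, tol=1e-6):
    """Largest q >= mean with pulls * kl(mean, q) <= threshold, by bisection."""
    lo, hi = mean, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if pulls * bernoulli_kl(mean, mid) <= threshold:
            lo = mid
        else:
            hi = mid
    return lo

def kl_ucb_plus(arm_means, horizon, seed=0):
    """Run KL-UCB+ on Bernoulli arms (illustrative sketch, not the note's code).

    KL-UCB would use the threshold log(t); KL-UCB+ uses log(t / N_a(t)).
    """
    rng = random.Random(seed)
    k = len(arm_means)
    pulls = [0] * k
    sums = [0.0] * k
    for a in range(k):  # initialization: pull each arm once
        sums[a] += float(rng.random() < arm_means[a])
        pulls[a] += 1
    for t in range(k + 1, horizon + 1):
        indices = [
            kl_ucb_index(sums[a] / pulls[a], pulls[a],
                         max(math.log(t / pulls[a]), 0.0))
            for a in range(k)
        ]
        a = max(range(k), key=indices.__getitem__)
        sums[a] += float(rng.random() < arm_means[a])
        pulls[a] += 1
    return pulls

if __name__ == "__main__":
    # Most pulls should concentrate on the best arm (mean 0.6).
    print(kl_ucb_plus([0.1, 0.5, 0.6], horizon=5000))
```

Bisection is valid here because kl(mu, q) is increasing in q for q >= mu, so the index is well defined; the max(..., 0.0) guard simply keeps the threshold nonnegative.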


Related research

04/28/2023  Kullback-Leibler Maillard Sampling for Multi-armed Bandits with Bounded Rewards
We study K-armed bandit problems where the reward distributions of the a...

04/16/2018  UCBoost: A Boosting Approach to Tame Complexity and Optimality for Stochastic Bandits
In this work, we address the open problem of finding low-complexity near...

01/06/2016  On Bayesian index policies for sequential resource allocation
This paper is about index policies for minimizing (frequentist) regret i...

02/23/2017  A minimax and asymptotically optimal algorithm for stochastic bandits
We propose the kl-UCB++ algorithm for regret minimization in stochastic...

02/11/2021  Optimization Issues in KL-Constrained Approximate Policy Iteration
Many reinforcement learning algorithms can be seen as versions of approx...

11/03/2018  Stochastic Neighbor Embedding under f-divergences
The t-distributed Stochastic Neighbor Embedding (t-SNE) is a powerful an...

04/06/2021  On the Optimality of Batch Policy Optimization Algorithms
Batch policy optimization considers leveraging existing data for policy ...
