Regret Analysis for Continuous Dueling Bandit

11/21/2017
by Wataru Kumagai, et al.

The dueling bandit is a learning framework in which the feedback available to the learner is restricted to a noisy comparison between a pair of actions. In this research, we address a dueling bandit problem based on a cost function over a continuous space. We propose a stochastic mirror descent algorithm and show that it achieves an O(√(T log T)) regret bound under strong convexity and smoothness assumptions on the cost function. We then clarify the equivalence between regret minimization in the dueling bandit and convex optimization of the cost function. Moreover, by considering a lower bound in convex optimization, we show that our algorithm achieves the optimal convergence rate in convex optimization and the optimal regret in the dueling bandit, up to a logarithmic factor.
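To make the feedback model concrete, the following is a minimal sketch of learning from noisy pairwise comparisons over a continuous action space. It is not the stochastic mirror descent algorithm of the paper: it uses a plain projected gradient step with a one-point, comparison-based gradient estimate. The quadratic cost, the logistic link between cost difference and win probability, and all names and parameter values (`cost`, `win_probability`, `duel`, `comparison_descent`, `delta`, `eta`, `radius`) are assumptions made for illustration only.

```python
import numpy as np


def cost(a):
    # Hypothetical strongly convex, smooth cost with its minimizer at the origin.
    return float(np.dot(a, a))


def win_probability(a, b):
    # Assumed logistic link: the action with the lower cost wins more often.
    return 1.0 / (1.0 + np.exp(-(cost(b) - cost(a))))


def duel(a, b, rng):
    # Noisy comparison feedback: 1 if action a wins the duel, 0 if b wins.
    return int(rng.random() < win_probability(a, b))


def project_to_ball(a, radius):
    # Euclidean projection onto the ball of the given radius.
    norm = np.linalg.norm(a)
    return a if norm <= radius else a * (radius / norm)


def comparison_descent(a0, steps, delta=0.5, eta=0.5, radius=2.0, seed=0):
    rng = np.random.default_rng(seed)
    a = np.asarray(a0, dtype=float)
    d = a.size
    for t in range(1, steps + 1):
        # Random unit direction for the perturbed action.
        u = rng.normal(size=d)
        u /= np.linalg.norm(u)
        # The perturbed action may leave the ball by up to delta (ignored here).
        b = a + delta * u
        won = duel(a, b, rng)
        # If a beats a + delta*u, the cost likely increases along u,
        # so the estimate points along u and the step moves against it.
        g = (d / delta) * (won - 0.5) * u
        a = project_to_ball(a - (eta / t) * g, radius)
    return a


if __name__ == "__main__":
    final = comparison_descent([1.5, 0.0], steps=2000)
    print("final action:", final, "cost:", cost(final))
```

The learner never observes `cost` directly; only the binary outcome of each duel enters the update. Centering the outcome at 0.5 leaves the expected update direction unchanged, because the random direction has zero mean, and it reduces the variance of the estimate.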

research
07/11/2016

Kernel-based methods for bandit convex optimization

We consider the adversarial convex bandit problem and we build the first...
research
10/01/2018

Risk-Averse Stochastic Convex Bandit

Motivated by applications in clinical trials and finance, we study the p...
research
06/13/2018

Minimizing Regret in Bandit Online Optimization in Unconstrained and Constrained Action Spaces

We consider online convex optimization with zeroth-order feedback settin...
research
07/31/2015

An Optimal Algorithm for Bandit and Zero-Order Convex Optimization with Two-Point Feedback

We consider the closely related problems of bandit convex optimization w...
research
03/16/2021

Taming Wild Price Fluctuations: Monotone Stochastic Convex Optimization with Bandit Feedback

Prices generated by automated price experimentation algorithms often dis...
research
05/18/2018

Projection-Free Bandit Convex Optimization

In this paper, we propose the first computationally efficient projection...
research
12/08/2017

Stochastic Dual Coordinate Descent with Bandit Sampling

Coordinate descent methods minimize a cost function by updating a single...