Kiefer Wolfowitz Algorithm is Asymptotically Optimal for a Class of Non-Stationary Bandit Problems

02/26/2017
by Rahul Singh, et al.

We consider the problem of designing an allocation rule, or "online learning algorithm", for a class of bandit problems in which the set of control actions available at each time s is a convex, compact subset of R^d. Upon choosing an action x at time s, the algorithm obtains a noisy value of the unknown, time-varying function f_s evaluated at x. The "regret" of an algorithm is the gap between its expected reward and the reward earned by a strategy that knows the function f_s at each time s and hence chooses the action x_s that maximizes f_s. For this non-stationary bandit setup, we consider two variants of the Kiefer Wolfowitz (KW) algorithm: (i) KW with fixed step size β, and (ii) KW with a sliding window of length L. We show that if the number of times the function f_s changes over a time horizon of length T is o(T), and if the learning rates of the proposed algorithms are chosen "optimally", then the regret of the proposed algorithms is o(T), and hence the algorithms are asymptotically efficient.
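
To make the fixed step-size variant concrete, the following is a minimal sketch of a Kiefer Wolfowitz style update with constant step size β, not the authors' exact procedure. Everything beyond the abstract is an assumption made for illustration: the action set is the box [-1, 1]^d, the gradient is estimated by coordinate-wise two-sided finite differences with a hypothetical probe width `c`, each iteration spends two noisy evaluations per coordinate, and f_s is a concave quadratic whose maximizer jumps once. The names `kw_fixed_step`, `noisy_f`, and `maximizer` are hypothetical.

```python
import numpy as np


def kw_fixed_step(noisy_f, x0, beta, c, n_iters, lo=-1.0, hi=1.0):
    """Finite-difference ascent with constant step size beta on the box [lo, hi]^d.

    noisy_f(s, x) returns a noisy evaluation of f_s at x (assumed interface).
    Iterates are kept in [lo + c, hi - c]^d so that every probe point stays feasible.
    """
    x = np.clip(np.asarray(x0, dtype=float), lo + c, hi - c)
    d = x.size
    path = np.empty((n_iters, d))
    for s in range(n_iters):
        grad = np.empty(d)
        for i in range(d):
            e = np.zeros(d)
            e[i] = c
            # Two-sided finite difference built from two noisy evaluations of f_s.
            grad[i] = (noisy_f(s, x + e) - noisy_f(s, x - e)) / (2.0 * c)
        # Ascent step with fixed step size, then projection back onto the shrunken box.
        x = np.clip(x + beta * grad, lo + c, hi - c)
        path[s] = x
    return path


rng = np.random.default_rng(0)


def maximizer(s):
    # Assumed non-stationarity: the maximizer of f_s jumps once, at s = 500.
    return np.array([0.5, -0.5]) if s < 500 else np.array([-0.5, 0.5])


def noisy_f(s, x):
    # Assumed reward: concave quadratic plus Gaussian observation noise.
    return -np.sum((x - maximizer(s)) ** 2) + 0.05 * rng.normal()


path = kw_fixed_step(noisy_f, x0=np.zeros(2), beta=0.1, c=0.1, n_iters=1000)
print(path[499], path[-1])  # iterates just before the jump and at the end
```

Because β does not decay, the iterate keeps moving and can re-track the maximizer after the jump, at the cost of a persistent noise-driven fluctuation around it. A smaller β reduces that fluctuation but slows re-tracking, which is the trade-off behind choosing the learning rate "optimally".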


