Minimizing Regret in Bandit Online Optimization in Unconstrained and Constrained Action Spaces

06/13/2018
by Tatiana Tatarenko, et al.

We consider online convex optimization in the zeroth-order feedback setting. The decision maker knows neither the explicit representation of the time-varying cost functions nor their gradients. At each time step, she observes only the value of the current cost function at her chosen action (zeroth-order information). The objective is to minimize the regret, that is, the difference between the sum of the costs she accumulates and the total cost of the optimal action had she known the sequence of cost functions a priori. We present a novel algorithm to minimize regret in both unconstrained and constrained action spaces. Our algorithm hinges on a one-point estimation of the gradients of the cost functions based on their observed values. Moreover, we adapt the presented algorithm to the setting with two-point estimations and demonstrate that the adapted procedure achieves the theoretical lower bound.
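To make the feedback model concrete, the following is a minimal sketch of bandit gradient descent with a one-point gradient estimate. It is not the algorithm from the paper: it swaps in the classic estimator based on a random direction on the unit sphere, and it assumes the constrained action space is a Euclidean ball. All names (`one_point_gradient`, `unit_direction`, `run_bandit_gd`, `regret`) and the parameters `step`, `delta`, and `radius` are hypothetical choices made for illustration.

```python
import numpy as np


def one_point_gradient(cost_value, direction, delta):
    """One-point estimate (d / delta) * f(x + delta * u) * u.

    Uses a single observed cost value; `direction` is the unit vector u
    that was used to perturb the action.
    """
    d = direction.shape[0]
    return (d / delta) * cost_value * direction


def unit_direction(rng, d):
    """Draw a direction uniformly from the unit sphere in R^d."""
    u = rng.standard_normal(d)
    return u / np.linalg.norm(u)


def run_bandit_gd(costs, d, step=0.01, delta=0.2, radius=None, seed=0):
    """Play one perturbed action per round, observing only its cost.

    If `radius` is given (assumed constraint set: a Euclidean ball), the
    iterate is projected onto a ball of radius `radius - delta`, so that
    every played action x + delta * u stays feasible.
    Returns the played actions and the incurred costs.
    """
    if radius is not None and radius <= delta:
        raise ValueError("radius must exceed delta")
    rng = np.random.default_rng(seed)
    x = np.zeros(d)
    actions, incurred = [], []
    for f in costs:
        u = unit_direction(rng, d)
        y = x + delta * u          # action actually played
        value = f(y)               # zeroth-order feedback only
        actions.append(y)
        incurred.append(value)
        x = x - step * one_point_gradient(value, u, delta)
        if radius is not None:
            inner = radius - delta
            norm = np.linalg.norm(x)
            if norm > inner:
                x = x * (inner / norm)  # Euclidean projection onto the ball
    return np.array(actions), np.array(incurred)


def regret(incurred, costs, comparator):
    """Accumulated cost minus the total cost of a fixed comparator action."""
    return float(np.sum(incurred) - sum(f(comparator) for f in costs))
```

Calling `run_bandit_gd(costs, d=2, radius=1.0)` with a list of callables `costs` returns the played actions and their costs, and `regret(incurred, costs, comparator)` then compares the accumulated cost against any fixed action of interest. The step size and perturbation width here are arbitrary; the sketch makes no claim about matching the regret rates stated in the abstract.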

research
11/21/2017

Regret Analysis for Continuous Dueling Bandit

The dueling bandit is a learning framework wherein the feedback informat...
research
06/21/2018

Online Saddle Point Problem with Applications to Constrained Online Convex Optimization

We study an online saddle point problem where at each iteration a pair o...
research
03/16/2022

Risk-Averse No-Regret Learning in Online Convex Games

We consider an online stochastic game with risk-averse agents whose goal...
research
03/03/2023

Nature's Cost Function: Simulating Physics by Minimizing the Action

In physics, there is a scalar function called the action which behaves l...
research
01/24/2023

On Dynamic Regret and Constraint Violations in Constrained Online Convex Optimization

A constrained version of the online convex optimization (OCO) problem is...
research
01/19/2022

PDE-Based Optimal Strategy for Unconstrained Online Learning

Unconstrained Online Linear Optimization (OLO) is a practical problem se...
research
07/11/2016

Kernel-based methods for bandit convex optimization

We consider the adversarial convex bandit problem and we build the first...
