Regret in Online Combinatorial Optimization

04/20/2012
by   Jean-Yves Audibert, et al.
0

We address online linear optimization problems when the possible actions of the decision maker are represented by binary vectors. The regret of the decision maker is the difference between her realized loss and the best loss she would have achieved by picking, in hindsight, the best possible action. Our goal is to understand the magnitude of the best possible (minimax) regret. We study the problem under three different assumptions for the feedback the decision maker receives: full information, and the partial information models of the so-called "semi-bandit" and "bandit" problems. Combining the Mirror Descent algorithm and the INF (Implicitely Normalized Forecaster) strategy, we are able to prove optimal bounds for the semi-bandit case. We also recover the optimal bounds for the full information setting. In the bandit case we discuss existing results in light of a new lower bound, and suggest a conjecture on the optimal regret in that case. Finally we also prove that the standard exponentially weighted average forecaster is provably suboptimal in the setting of online combinatorial optimization.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
05/24/2011

Minimax Policies for Combinatorial Prediction Games

We address the online linear optimization problem when the actions of th...
research
02/02/2019

First-Order Regret Analysis of Thompson Sampling

We address online combinatorial optimization when the player has a prior...
research
04/18/2019

Semi-bandit Optimization in the Dispersed Setting

In this work, we study the problem of online optimization of piecewise L...
research
10/05/2020

An Efficient Algorithm for Cooperative Semi-Bandits

We consider the problem of asynchronous online combinatorial optimizatio...
research
03/17/2015

Importance weighting without importance weights: An efficient algorithm for combinatorial semi-bandits

We propose a sample-efficient alternative for importance weighting for s...
research
02/08/2019

Bandit Principal Component Analysis

We consider a partial-feedback variant of the well-studied online PCA pr...
research
04/23/2022

Smoothed Online Combinatorial Optimization Using Imperfect Predictions

Smoothed online combinatorial optimization considers a learner who repea...

Please sign up or login with your details

Forgot password? Click here to reset