Regret Lower Bound and Optimal Algorithm in Finite Stochastic Partial Monitoring

09/30/2015
by Junpei Komiyama, et al.

Partial monitoring is a general model for sequential learning with limited feedback, formalized as a game between two players. In each round, the learner chooses an action and, simultaneously, the opponent chooses an outcome; the learner then suffers a loss and receives a feedback signal. The goal of the learner is to minimize the total loss. In this paper, we study partial monitoring with finite actions and stochastic outcomes. We derive a logarithmic distribution-dependent regret lower bound that characterizes the hardness of the problem. Inspired by the DMED algorithm (Honda and Takemura, 2010) for the multi-armed bandit problem, we propose PM-DMED, an algorithm that minimizes the distribution-dependent regret. PM-DMED significantly outperforms state-of-the-art algorithms in numerical experiments. To show the optimality of PM-DMED with respect to the regret bound, we slightly modify the algorithm by introducing a hinge function (PM-DMED-Hinge). We then derive an asymptotically optimal regret upper bound for PM-DMED-Hinge that matches the lower bound.
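
As a rough illustration of the game described above, the following Python sketch simulates finite stochastic partial monitoring. The toy loss matrix `LOSS`, feedback matrix `FEEDBACK`, the outcome distribution, and the helpers `expected_loss`, `play`, and `explore_then_commit` are all hypothetical and not taken from the paper; the placeholder policy is a naive explore-then-commit rule, not PM-DMED. What the sketch is meant to show is the protocol itself: outcomes are drawn i.i.d. from a fixed distribution, the learner observes only a feedback symbol (never its loss), and regret is measured as the summed gap between the expected loss of the chosen action and that of the best action.

```python
import random

# Hypothetical toy game (label-efficient-prediction style): 3 actions, 2 outcomes.
# LOSS[i][j] is the loss of action i when outcome j occurs.
LOSS = [
    [0.0, 1.0],  # action 0: predict outcome 0
    [1.0, 0.0],  # action 1: predict outcome 1
    [1.0, 1.0],  # action 2: query (always costly, but informative)
]
# FEEDBACK[i][j] is the symbol the learner observes; the loss itself is hidden.
FEEDBACK = [
    [0, 0],  # predicting reveals nothing about the outcome
    [0, 0],
    [1, 2],  # querying reveals the outcome (symbol 1 -> outcome 0, 2 -> outcome 1)
]


def expected_loss(i, p):
    """Expected loss of action i under the outcome distribution p."""
    return sum(loss * q for loss, q in zip(LOSS[i], p))


def play(policy, p, T, seed=0):
    """Run T rounds with outcomes drawn i.i.d. from p; return the pseudo-regret."""
    rng = random.Random(seed)
    history = []  # (action, feedback symbol) pairs: all the learner ever sees
    best = min(expected_loss(i, p) for i in range(len(LOSS)))
    regret = 0.0
    for t in range(T):
        i = policy(t, history)
        j = rng.choices(range(len(p)), weights=p)[0]  # opponent's stochastic outcome
        history.append((i, FEEDBACK[i][j]))
        regret += expected_loss(i, p) - best  # gap to the best action in hindsight
    return regret


def explore_then_commit(k):
    """Hypothetical placeholder policy: query k times, then predict the majority outcome."""
    def policy(t, history):
        if t < k:
            return 2  # explore with the informative (but costly) action
        zeros = sum(1 for a, s in history if a == 2 and s == 1)
        ones = sum(1 for a, s in history if a == 2 and s == 2)
        return 1 if ones > zeros else 0
    return policy


if __name__ == "__main__":
    p_star = [0.7, 0.3]  # hypothetical outcome distribution, unknown to the learner
    print(play(explore_then_commit(20), p_star, T=1000))
```

Exploring with a fixed budget like this is only a baseline for exercising the protocol; the distribution-dependent algorithms discussed here (DMED, PM-DMED) instead decide adaptively how often to play informative actions, with the aim of matching the logarithmic lower bound.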
