Bandit Principal Component Analysis

02/08/2019
by Wojciech Kotłowski, et al.

We consider a partial-feedback variant of the well-studied online PCA problem, in which a learner attempts to predict a sequence of d-dimensional vectors under a quadratic loss while receiving only limited feedback about the environment's choices. We focus on a natural notion of bandit feedback in which the learner observes only the loss associated with its own prediction. Based on the classical observation that this decision-making problem can be lifted to the space of density matrices, we propose an algorithm that achieves a regret of O(d^{3/2} √T) after T rounds in the worst case. We also prove data-dependent bounds that improve on the basic result when the loss matrices of the environment have bounded rank or the loss of the best action is bounded. One version of our algorithm runs in O(d) time per trial, which massively improves over every previously known online PCA method. We complement these results with a lower bound of Ω(d √T).
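
The lifting to density matrices mentioned above can be illustrated with a small numerical check. The sketch below is not the paper's algorithm; it assumes the quadratic loss of a unit vector w against a symmetric loss matrix L is written as w^T L w, and all names and numeric values (quad_loss, mixture, the 2x2 matrix L, the probabilities) are hypothetical. It shows that the expected loss of a randomized prediction equals the linear quantity tr(W L), where W is a mixture of rank-one projections with trace 1.

```python
import math

def quad_loss(w, L):
    """Quadratic loss w^T L w (assumed form) for vector w and symmetric L."""
    d = len(w)
    return sum(w[i] * L[i][j] * w[j] for i in range(d) for j in range(d))

def outer(w):
    """Rank-one matrix w w^T as a nested list."""
    return [[wi * wj for wj in w] for wi in w]

def trace_product(A, B):
    """tr(A B) for square matrices given as nested lists."""
    d = len(A)
    return sum(A[i][j] * B[j][i] for i in range(d) for j in range(d))

def mixture(probs, vectors):
    """Density matrix W = sum_i p_i u_i u_i^T for unit vectors u_i."""
    d = len(vectors[0])
    W = [[0.0] * d for _ in range(d)]
    for p, u in zip(probs, vectors):
        for i in range(d):
            for j in range(d):
                W[i][j] += p * u[i] * u[j]
    return W

# Hypothetical 2-dimensional instance (illustrative values only).
c, s = math.cos(0.3), math.sin(0.3)
vectors = [[c, s], [-s, c]]        # two orthonormal unit vectors
probs = [0.7, 0.3]                 # probability of playing each vector
L = [[0.5, 0.2], [0.2, 0.1]]       # symmetric loss matrix

W = mixture(probs, vectors)
trace_W = W[0][0] + W[1][1]        # a density matrix has trace 1

# Expected loss of the randomized prediction ...
expected_loss = sum(p * quad_loss(u, L) for p, u in zip(probs, vectors))
# ... equals the loss of the lifted point W, which is linear in W.
lifted_loss = trace_product(W, L)
```

Because the loss becomes linear in W, the learner can reason over density matrices and then sample a unit vector whose expected outer product is W; under bandit feedback only the scalar quad_loss of the sampled vector would be revealed.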

research
02/23/2015

First-order regret bounds for combinatorial semi-bandits

We consider the problem of online combinatorial optimization under semi-...
research
04/27/2022

Bounded Memory Adversarial Bandits with Composite Anonymous Delayed Feedback

We study the adversarial bandit problem with composite anonymous delayed...
research
09/30/2015

Regret Lower Bound and Optimal Algorithm in Finite Stochastic Partial Monitoring

Partial monitoring is a general model for sequential learning with limit...
research
04/20/2012

Regret in Online Combinatorial Optimization

We address online linear optimization problems when the possible actions...
research
05/01/2023

First- and Second-Order Bounds for Adversarial Linear Contextual Bandits

We consider the adversarial linear contextual bandit setting, which allo...
research
01/18/2021

A note on the price of bandit feedback for mistake-bounded online learning

The standard model and the bandit model are two generalizations of the m...
research
06/29/2021

Exponential Weights Algorithms for Selective Learning

We study the selective learning problem introduced by Qiao and Valiant (...
