Bandit Phase Retrieval

06/03/2021
by Tor Lattimore, et al.

We study a bandit version of phase retrieval where the learner chooses actions (A_t)_{t=1}^n in the d-dimensional unit ball and the expected reward is ⟨A_t, θ_⋆⟩², where θ_⋆ ∈ ℝ^d is an unknown parameter vector. We prove that the minimax cumulative regret in this problem is Θ̃(d√n), which improves on the best known bounds by a factor of √d. We also show that the minimax simple regret is Θ̃(d/√n) and that this is only achievable by an adaptive algorithm. Our analysis shows that an apparently convincing heuristic for guessing lower bounds can be misleading, and that uniform bounds on the information ratio for information-directed sampling are not sufficient for optimal regret.
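To make the problem setup concrete, the following is a minimal sketch of the bandit phase retrieval environment described in the abstract: the learner plays actions in the unit ball and observes a noisy version of ⟨A_t, θ_⋆⟩². The uniform-exploration baseline shown here is purely illustrative and is not the paper's algorithm; the noise model, noise level, and the choice ‖θ_⋆‖ = 1 are assumptions for the example.

```python
import numpy as np

def make_env(theta_star, noise_std=0.1, rng=None):
    """Bandit phase retrieval reward: noisy <a, theta_star>^2 for action a."""
    if rng is None:
        rng = np.random.default_rng(0)
    def pull(a):
        # Expected reward is <a, theta_star>^2; Gaussian noise is an assumption.
        return float(np.dot(a, theta_star) ** 2 + noise_std * rng.normal())
    return pull

d, n = 5, 2000
rng = np.random.default_rng(42)
theta_star = rng.normal(size=d)
theta_star /= np.linalg.norm(theta_star)  # normalize so max expected reward is 1

pull = make_env(theta_star, rng=rng)

# Naive uniform-exploration baseline (NOT the adaptive algorithm from the paper):
# the optimal action is theta_star / ||theta_star||, with expected reward 1 here.
regret = 0.0
for t in range(n):
    a = rng.normal(size=d)
    a /= np.linalg.norm(a)  # uniform on the unit sphere
    regret += 1.0 - np.dot(a, theta_star) ** 2  # expected-reward regret per step

print(regret)
```

Since a uniform action on the sphere satisfies E[⟨a, θ_⋆⟩²] = 1/d, this baseline suffers regret linear in n (roughly n(1 − 1/d)), which is exactly why nontrivial strategies achieving the Θ̃(d√n) cumulative regret are needed.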

Related research

02/14/2012: Towards minimax policies for online linear optimization with bandit feedback
We address the online linear optimization problem with bandit feedback. ...

02/18/2021: A Bit Better? Quantifying Information for Bandit Learning
The information ratio offers an approach to assessing the efficacy with ...

03/30/2021: Optimal Stochastic Nonconvex Optimization with Bandit Feedback
In this paper, we analyze the continuous armed bandit problems for nonco...

06/16/2020: Corralling Stochastic Bandit Algorithms
We study the problem of corralling stochastic bandit algorithms, that is...

06/16/2023: Understanding the Role of Feedback in Online Learning with Switching Costs
In this paper, we study the role of feedback in online learning with swi...

02/09/2017: Efficient Policy Learning
We consider the problem of using observational data to learn treatment a...

11/18/2015: Online learning in repeated auctions
Motivated by online advertising auctions, we consider repeated Vickrey a...
