Stochastic Linear Bandits with Protected Subspace

11/02/2020
by   Advait Parulekar, et al.

We study a variant of the stochastic linear bandit problem in which the learner optimizes a linear objective function, but rewards accrue only in directions orthogonal to an unknown subspace, which we interpret as a protected space. The learner has only zero-order stochastic oracle access to both the objective and the protected subspace. In particular, at each round the learner must choose an action and also choose whether to query the objective or the protected subspace. Our algorithm, derived from the OFUL principle, uses some of the queries to estimate the protected space and, in almost all rounds, plays optimistically with respect to a confidence set for this space. We provide an Õ(sd√T) regret upper bound for the case where the action space is the complete unit ball in ℝ^d, where s < d is the dimension of the protected subspace and T is the time horizon. Moreover, we demonstrate that a discrete action space can lead to linear regret with an optimistic algorithm, reinforcing the sub-optimality of optimism in certain settings. We also show that, in certain settings, protection constraints imply that no consistent algorithm can have regret smaller than Ω(T^{3/4}). Finally, we empirically validate our results on synthetic and real datasets.
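
To make the setting concrete, the following is a minimal sketch of the benchmark a learner would compete against when the action space is the unit ball. It assumes, purely for illustration, that both the objective vector and a spanning set for the protected subspace are known exactly; in the problem described above both are unknown and can only be queried through a noisy zero-order oracle. The names `theta`, `protected_vectors`, `orthonormalize`, `project_out`, and `best_action` are hypothetical and do not come from the paper.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def orthonormalize(vectors, tol=1e-12):
    """Gram-Schmidt: return an orthonormal basis for span(vectors)."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            c = dot(w, b)
            w = [wi - c * bi for wi, bi in zip(w, b)]
        norm = math.sqrt(dot(w, w))
        if norm > tol:  # skip vectors that are (numerically) dependent
            basis.append([wi / norm for wi in w])
    return basis

def project_out(theta, basis):
    """Remove from theta its component inside span(basis).

    `basis` is assumed orthonormal, so the result is orthogonal
    to the protected subspace.
    """
    w = list(theta)
    for b in basis:
        c = dot(w, b)
        w = [wi - c * bi for wi, bi in zip(w, b)]
    return w

def best_action(theta, protected_vectors):
    """Unit-ball action maximizing the reward orthogonal to the subspace."""
    basis = orthonormalize(protected_vectors)
    residual = project_out(theta, basis)
    norm = math.sqrt(dot(residual, residual))
    if norm == 0.0:
        # Every direction earns zero reward; return the origin.
        return [0.0] * len(theta), 0.0
    action = [r / norm for r in residual]
    return action, dot(residual, action)

# Illustrative values (assumed): d = 3, protected subspace of dimension s = 1.
theta = [3.0, 4.0, 0.0]
protected_vectors = [[1.0, 0.0, 0.0]]
action, value = best_action(theta, protected_vectors)
```

Under these assumptions the best action is the normalized component of `theta` orthogonal to the protected subspace, and its value is the norm of that component. A learning algorithm would have to estimate both quantities from queries, which is where the confidence sets mentioned in the abstract come in.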

research
02/10/2021

An Efficient Pessimistic-Optimistic Algorithm for Constrained Linear Bandits

This paper considers stochastic linear bandits with general constraints....
research
05/05/2019

Learning to Control in Metric Space with Optimal Regret

We study online reinforcement learning for finite-horizon deterministic ...
research
07/02/2020

Multi-Agent Low-Dimensional Linear Bandits

We study a multi-agent stochastic linear bandit with side information, p...
research
10/12/2019

What You See May Not Be What You Get: UCB Bandit Algorithms Robust to ε-Contamination

Motivated by applications of bandit algorithms in education, we consider...
research
07/24/2023

Contextual Bandits and Imitation Learning via Preference-Based Active Queries

We consider the problem of contextual bandits and imitation learning, wh...
research
09/30/2020

Stage-wise Conservative Linear Bandits

We study stage-wise conservative linear stochastic bandits: an instance ...
research
09/10/2019

Optimality of the Subgradient Algorithm in the Stochastic Setting

Recently Jaouad Mourtada and Stéphane Gaïffas showed the anytime hedge a...
