Orthogonal Projection in Linear Bandits

06/26/2019
by Qiyu Kang, et al.

The expected reward in a linear stochastic bandit model is an unknown linear function of the chosen decision vector. In this paper, we consider the case where the expected reward is an unknown linear function of a projection of the decision vector onto a subspace. We call this the projection reward. Unlike the classical linear bandit problem, we assume that the projection reward is unobservable. Instead, the observed "reward" at each time step is the projection reward corrupted by another linear function of the decision vector projected onto a subspace orthogonal to the first. Such a model is useful in recommendation applications where the observed reward is corrupted by each individual's biases. In the case where there are finitely many decision vectors, we develop a strategy to achieve O(log T) regret, where T is the number of time steps. In the case where the decision vector is chosen from an infinite compact set, our strategy achieves O(T^{2/3} (log T)^{1/2}) regret. Simulations verify the efficiency of our strategy.
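The observation model described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the names `project`, `rewards`, `theta`, `phi`, and `noise_std` are hypothetical, the subspace is assumed to be given by an orthonormal basis, and the use of a separate parameter vector `phi` for the corrupting term and of Gaussian noise are assumptions made only for illustration.

```python
import random


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(ai * bi for ai, bi in zip(a, b))


def project(x, basis):
    """Orthogonally project x onto span(basis).

    Assumption: the vectors in `basis` are orthonormal.
    """
    out = [0.0] * len(x)
    for u in basis:
        coeff = dot(u, x)
        out = [o + coeff * ui for o, ui in zip(out, u)]
    return out


def rewards(x, basis, theta, phi, noise_std=0.0, rng=random):
    """Return (projection_reward, observed_reward) for decision vector x.

    projection_reward: linear in the projection of x onto the subspace
                       (unobservable to the learner in this model).
    observed_reward:   projection reward plus a linear function of the
                       component of x orthogonal to the subspace, plus
                       zero-mean noise (Gaussian here, as an assumption).
    """
    x_par = project(x, basis)                         # part inside the subspace
    x_perp = [xi - pi for xi, pi in zip(x, x_par)]    # orthogonal remainder
    projection_reward = dot(theta, x_par)
    corruption = dot(phi, x_perp)
    observed = projection_reward + corruption + rng.gauss(0.0, noise_std)
    return projection_reward, observed


# Hypothetical example: the subspace is the first two coordinates of R^3.
basis = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
theta = [0.5, -1.0, 2.0]   # illustrative parameter for the projection reward
phi = [0.0, 0.0, 3.0]      # illustrative parameter for the corrupting term
x = [1.0, 2.0, 4.0]
proj_r, obs_r = rewards(x, basis, theta, phi)
```

With `noise_std=0.0` the gap between the two returned values is exactly the corrupting term, which is the quantity a learner in this setting cannot separate out from a single observation.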


