Bilinear Bandits with Low-rank Structure

01/08/2019
by Kwang-Sung Jun, et al.

We introduce the bilinear bandit problem with low-rank structure, in which an action is a pair of arms from two different entity types and the reward is a bilinear function of the arms' known feature vectors. The problem is motivated by numerous applications in which the learner must recommend a pair of entities of two different types as one action, such as a male/female pair in an online dating service. The unknown in the problem is a $d_1 \times d_2$ matrix $\Theta^*$ with rank $r \ll \min\{d_1, d_2\}$ that governs reward generation. Learning $\Theta^*$ while exploiting its low-rank structure poses a significant challenge in finding the right exploration-exploitation tradeoff. In this work, we propose a new two-stage algorithm called "Explore-Subspace-Then-Refine" (ESTR). The first stage is an explicit subspace exploration, while the second stage is a linear bandit algorithm called "almost-low-dimensional OFUL" (LowOFUL) that exploits and further refines the estimated subspace via a regularization technique. We show that the regret of ESTR is $\tilde{O}((d_1+d_2)^{3/2}\sqrt{rT})$ (where $\tilde{O}$ hides logarithmic factors), which improves upon the $\tilde{O}(d_1 d_2 \sqrt{T})$ regret of a naive linear bandit reduction. We conjecture that the regret bound of ESTR is unimprovable up to polylogarithmic factors.
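The ingredients named in the abstract can be made concrete with a small sketch. The pure-Python code below is illustrative only, and every function name in it is hypothetical rather than taken from the paper. It checks the identity $x^\top \Theta z = \langle \mathrm{vec}(\Theta), \mathrm{vec}(x z^\top) \rangle$, which is why a bilinear bandit can be naively reduced to a linear bandit in $d_1 d_2$ dimensions. It also builds a diagonal penalty pattern of the kind LowOFUL might use. That pattern rests on an assumption: the features are taken to be already rotated into the estimated singular subspaces from stage 1, and the block lying in both estimated complements is taken to receive a larger penalty $\lambda_\perp$. Stage 1 itself is not implemented, and the paper should be consulted for the exact construction and constants. Finally, the code evaluates the two regret rates, ignoring constants and log factors.

```python
def bilinear_reward(x, theta, z):
    """Expected reward x^T Theta z for a left arm x (length d1) and right arm z (length d2)."""
    return sum(x[i] * theta[i][j] * z[j] for i in range(len(x)) for j in range(len(z)))


def vec(matrix):
    """Row-major flattening of a d1 x d2 matrix into a list of length d1*d2."""
    return [entry for row in matrix for entry in row]


def outer(x, z):
    """Outer product x z^T as a d1 x d2 matrix."""
    return [[xi * zj for zj in z] for xi in x]


def naive_linear_feature(x, z):
    """Feature of the arm pair (x, z) in the naive d1*d2-dimensional linear bandit."""
    return vec(outer(x, z))


def dot(a, b):
    """Plain inner product of two equal-length lists."""
    return sum(ai * bi for ai, bi in zip(a, b))


def lowoful_penalties(d1, d2, r, lam, lam_perp):
    """Hypothetical diagonal ridge penalties on vec of a rotated d1 x d2 parameter.

    ASSUMPTION: features were rotated by [U_hat, U_hat_perp] and [V_hat, V_hat_perp],
    where U_hat, V_hat are rank-r subspace estimates from stage 1 (not computed here).
    Entry (i, j) with i >= r and j >= r then lies in both estimated complements, is
    expected to be near zero, and gets the larger penalty lam_perp.
    """
    return [lam_perp if (i >= r and j >= r) else lam
            for i in range(d1) for j in range(d2)]


def regret_rates(d1, d2, r, T):
    """Leading terms of the ESTR and naive regret bounds (no constants, no log factors)."""
    estr = (d1 + d2) ** 1.5 * (r * T) ** 0.5
    naive = d1 * d2 * T ** 0.5
    return estr, naive


if __name__ == "__main__":
    x, z = [1.0, 0.0, 2.0], [0.5, -1.0]
    theta = [[1, 2], [3, 4], [5, 6]]
    # Both views of the same expected reward agree.
    print(bilinear_reward(x, theta, z), dot(vec(theta), naive_linear_feature(x, z)))
    print(lowoful_penalties(3, 2, 1, 1.0, 100.0))
    print(regret_rates(100, 100, 2, 10**6))
```

With the small penalty, `lowoful_penalties` leaves $r(d_1+d_2) - r^2$ coordinates "cheap". This count is the effective dimension the sketch assumes LowOFUL concentrates on. For $d_1 = d_2 = 100$, $r = 2$ and $T = 10^6$, the two leading terms are about $4 \times 10^6$ and $10^7$. More generally, for fixed $r$ the ESTR rate grows like $d^{3/2}$ rather than $d^2$ when $d_1 = d_2 = d$.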

research
06/04/2020

Low-Rank Generalized Linear Bandit Problems

In a low-rank linear bandit problem, the reward of an action (represente...
research
01/28/2019

Stochastic Linear Bandits with Hidden Low Rank Structure

High-dimensional representations often have a lower dimensional underlyi...
research
06/28/2023

Pure exploration in multi-armed bandits with low rank structure using oblivious sampler

In this paper, we consider the low rank structure of the reward sequence...
research
12/14/2020

Best Arm Identification in Graphical Bilinear Bandits

We introduce a new graphical bilinear bandit problem where a learner (or...
research
02/18/2021

A Simple Unified Framework for High Dimensional Bandit Problems

Stochastic high dimensional bandit problems with low dimensional structu...
research
09/08/2022

Online Low Rank Matrix Completion

We study the problem of online low-rank matrix completion with 𝖬 users, ...
research
06/01/2022

An α-No-Regret Algorithm For Graphical Bilinear Bandits

We propose the first regret-based approach to the Graphical Bilinear Ban...