Multi-Objective Generalized Linear Bandits

05/30/2019
by Shiyin Lu, et al.

In this paper, we study the multi-objective bandits (MOB) problem, where a learner repeatedly selects one arm to play and then receives a reward vector consisting of multiple objectives. MOB has found many real-world applications as varied as online recommendation and network routing. However, these applications typically contain contextual information that can guide the learning process, which is ignored by most existing work. To utilize this information, we associate each arm with a context vector and assume the reward follows the generalized linear model (GLM). We adopt the notion of Pareto regret to evaluate the learner's performance and develop a novel algorithm for minimizing it. The essential idea is to apply a variant of the online Newton step to estimate the model parameters, based on which we use the upper confidence bound (UCB) policy to construct an approximation of the Pareto front, and then choose one arm uniformly at random from this approximate Pareto front. Theoretical analysis shows that the proposed algorithm achieves an Õ(d√T) Pareto regret, where T is the time horizon and d is the dimension of the contexts, which matches the optimal result for the single-objective contextual bandit problem. Numerical experiments demonstrate the effectiveness of our method.
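To make the per-round selection rule concrete, below is a minimal Python sketch of the "UCB scores, then approximate Pareto front, then uniform draw" step described in the abstract. It is an illustration, not the authors' implementation: it assumes a logistic link as the GLM, treats the confidence-width scale alpha and the matrix A_inv (which the paper maintains via its online Newton step variant) as given inputs, and the function name pareto_ucb_select and its arguments are hypothetical.

```python
import numpy as np

def pareto_ucb_select(contexts, thetas, alpha, A_inv, rng):
    """Choose an arm uniformly at random from an approximate Pareto front.

    contexts : (K, d) array, one context vector per arm
    thetas   : (m, d) array, estimated GLM parameters, one row per objective
    alpha    : confidence-width scale (stand-in for the paper's bound)
    A_inv    : (d, d) inverse of the data matrix used for the exploration bonus
    rng      : numpy random Generator
    """
    # Optimistic (UCB) estimate of each arm's reward vector:
    # link(x^T theta_i) plus a shared exploration bonus ||x||_{A^{-1}}.
    mu = 1.0 / (1.0 + np.exp(-contexts @ thetas.T))   # logistic link, as one example GLM
    width = alpha * np.sqrt(np.einsum("ki,ij,kj->k", contexts, A_inv, contexts))
    ucb = mu + width[:, None]                          # shape (K, m)

    # Approximate Pareto front: keep arms whose UCB vector is not dominated,
    # i.e. no other arm is >= in every objective and > in at least one.
    K = ucb.shape[0]
    front = [
        k for k in range(K)
        if not any(
            np.all(ucb[j] >= ucb[k]) and np.any(ucb[j] > ucb[k])
            for j in range(K) if j != k
        )
    ]

    # Uniform random choice among the non-dominated arms.
    return int(rng.choice(front))
```

The uniform draw over the non-dominated set is what ties the selection step to the Pareto regret notion: any arm on the approximate front is an acceptable play, so the sketch does not rank objectives against each other.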


Related research

03/11/2018  Multi-objective Contextual Bandit Problem with Similarity Information
In this paper we propose the multi-objective contextual bandit problem w...

10/16/2021  On the Pareto Frontier of Regret Minimization and Best Arm Identification in Stochastic Bandits
We study the Pareto frontier of two archetypal objectives in stochastic ...

05/31/2023  Pareto Front Identification with Regret Minimization
We consider Pareto front identification for linear bandits (PFILin) wher...

04/02/2020  Hierarchical Adaptive Contextual Bandits for Resource Constraint based Recommendation
Contextual multi-armed bandit (MAB) achieves cutting-edge performance on...

02/10/2023  Piecewise-Stationary Multi-Objective Multi-Armed Bandit with Application to Joint Communications and Sensing
We study a multi-objective multi-armed bandit problem in a dynamic envir...

04/28/2020  Nearly Optimal Regret for Stochastic Linear Bandits with Heavy-Tailed Payoffs
In this paper, we study the problem of stochastic linear bandits with fi...

07/06/2023  Optimal Scalarizations for Sublinear Hypervolume Regret
Scalarization is a general technique that can be deployed in any multiob...
