Optimal Scalarizations for Sublinear Hypervolume Regret

07/06/2023
by Qiuyi Zhang, et al.

Scalarization is a general technique that can be deployed in any multiobjective setting to reduce multiple objectives into a single one, for example in RLHF for training reward models that align with human preferences. Yet some have dismissed this classical approach because linear scalarizations are known to miss concave regions of the Pareto frontier. To that end, we aim to find simple non-linear scalarizations that can explore a diverse set of k objectives on the Pareto frontier, as measured by the dominated hypervolume. We show that hypervolume scalarizations with uniformly random weights are surprisingly optimal for provably minimizing hypervolume regret, achieving an optimal sublinear regret bound of O(T^-1/k), with matching lower bounds that preclude any algorithm from doing asymptotically better. As a theoretical case study, we consider the multiobjective stochastic linear bandits problem and demonstrate that, by exploiting the sublinear regret bounds of the hypervolume scalarizations, we can derive a novel non-Euclidean analysis that yields improved hypervolume regret bounds of Õ(d T^-1/2 + T^-1/k). We support our theory with strong empirical performance of simple hypervolume scalarizations, which consistently outperform both the linear and Chebyshev scalarizations, as well as standard multiobjective algorithms in Bayesian optimization such as EHVI.
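The abstract does not spell out the scalarization formula, so the following is a minimal sketch assuming the hypervolume scalarization form from prior work (Zhang and Golovin, 2020), s_λ(y) = min_i max(0, (y_i - z_i)/λ_i)^k with weights λ drawn uniformly from the positive unit sphere, whose expectation over λ is proportional to the dominated hypervolume. The reference point z, the constant c_k, and the toy objective vectors below are illustrative, and the linear and Chebyshev baselines are included only for comparison.

```python
import numpy as np
from math import gamma, pi

def sample_weights(k, rng):
    # Uniform sample from the positive part of the unit sphere S_+^{k-1}:
    # normalize a standard Gaussian vector and take absolute values.
    lam = np.abs(rng.standard_normal(k))
    return lam / np.linalg.norm(lam)

def hypervolume_scalarization(y, lam, z=None):
    # s_lambda(y) = min_i max(0, (y_i - z_i) / lambda_i)^k  (form assumed from prior work);
    # its expectation over uniform lambda is proportional to the dominated hypervolume.
    k = len(y)
    z = np.zeros(k) if z is None else z
    return float(np.min(np.maximum(0.0, (y - z) / lam)) ** k)

def linear_scalarization(y, lam):
    # Baseline: known to miss concave regions of the Pareto frontier.
    return float(np.dot(lam, y))

def chebyshev_scalarization(y, lam, z=None):
    # Baseline: weighted min, for maximization problems.
    z = np.zeros_like(y) if z is None else z
    return float(np.min(lam * (y - z)))

def mc_hypervolume(Y, n_samples, rng, z=None):
    # Monte Carlo estimate of the hypervolume dominated by the rows of Y w.r.t. z,
    # using c_k * E_lambda[max_y s_lambda(y)] with c_k = pi^(k/2) / (2^k * Gamma(k/2 + 1)).
    k = Y.shape[1]
    c_k = pi ** (k / 2) / (2 ** k * gamma(k / 2 + 1))
    total = 0.0
    for _ in range(n_samples):
        lam = sample_weights(k, rng)  # one shared weight vector per draw
        total += max(hypervolume_scalarization(y, lam, z) for y in Y)
    return c_k * total / n_samples

rng = np.random.default_rng(0)
Y = np.array([[0.9, 0.2], [0.6, 0.6], [0.2, 0.9]])  # toy objective vectors, k = 2
print(mc_hypervolume(Y, 20000, rng))  # ~0.48, the exact union-of-boxes area w.r.t. the origin
```

In the bandit or Bayesian-optimization setting described above, one would presumably draw a fresh weight vector λ each round and optimize the scalarized objective, so that the collected points maximize the dominated hypervolume on average over rounds.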


