Learning under Invariable Bayesian Safety

06/08/2020
by   Gal Bahar, et al.

A recent body of work addresses safety constraints in explore-and-exploit systems. Such constraints arise, for example, when exploration is carried out by individuals whose welfare must be balanced against overall welfare. In this paper, we adopt a model inspired by recent work on a bandit-like setting for recommendations. We contribute to this line of literature by introducing a safety constraint that must be respected in every round: the expected value in each round must exceed a given threshold. Under our model, a safe explore-and-exploit policy requires careful planning; otherwise, it leads to sub-optimal welfare. We devise an asymptotically optimal algorithm for this setting and analyze its instance-dependent convergence rate.
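To make the per-round constraint concrete, the following is a minimal sketch, not the paper's algorithm: a two-armed bandit in which the learner randomizes between a known-safe arm and an exploratory arm only to the extent that the estimated expected reward of the mixture stays above a given threshold. The arm indices, mean estimates, and mixing rule are illustrative assumptions.

```python
# Sketch of a per-round safety constraint: the (possibly randomized) action chosen
# in every round must keep the estimated expected reward above a threshold.
# This is an illustrative construction, not the algorithm from the paper.
import numpy as np

rng = np.random.default_rng(0)

def safe_exploration_prob(mu_hat, safe_arm, explore_arm, threshold):
    """Largest exploration probability p such that the mixture
    p * explore_arm + (1 - p) * safe_arm has estimated expected reward
    at or above the threshold, based on current estimates mu_hat."""
    mu_safe, mu_exp = mu_hat[safe_arm], mu_hat[explore_arm]
    if mu_exp >= threshold:
        return 1.0                      # exploring is already estimated to be safe
    if mu_safe <= threshold:
        return 0.0                      # no slack: exploit only
    # Solve p * mu_exp + (1 - p) * mu_safe >= threshold for the largest feasible p.
    return (mu_safe - threshold) / (mu_safe - mu_exp)

# Toy usage: arm 0 looks safe (estimated mean 0.7), arm 1 is unexplored (0.3),
# and the safety threshold is 0.5, so at most half the probability mass may explore.
mu_hat = np.array([0.7, 0.3])
p = safe_exploration_prob(mu_hat, safe_arm=0, explore_arm=1, threshold=0.5)
arm = 1 if rng.random() < p else 0      # randomize while respecting the constraint
```

The point of the sketch is that exploration is throttled by the slack between the safe arm's estimated value and the threshold; when that slack is small, the constraint forces near-pure exploitation, which is why a naive policy can end up with sub-optimal welfare.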


11/06/2019

Safe Linear Thompson Sampling

The design and performance analysis of bandit algorithms in the presence...
08/01/2019

The Constrained Round Robin Algorithm for Fair and Efficient Allocation

We consider a multi-agent resource allocation setting that models the as...
07/01/2021

Asymptotically Optimal Welfare of Posted Pricing for Multiple Items with MHR Distributions

We consider the problem of posting prices for unit-demand buyers if all ...
05/05/2020

Regret Bounds for Safe Gaussian Process Bandit Optimization

Many applications require a learner to make sequential decisions given u...
05/27/2022

Safety Aware Changepoint Detection for Piecewise i.i.d. Bandits

In this paper, we consider the setting of piecewise i.i.d. bandits under...
07/13/2021

Contextual Games: Multi-Agent Learning with Side Information

We formulate the novel class of contextual games, a type of repeated gam...