Learning under Invariable Bayesian Safety

by Gal Bahar, et al.

A recent body of work addresses safety constraints in explore-and-exploit systems. Such constraints arise, for example, when exploration is carried out by individuals whose welfare must be balanced against overall welfare. In this paper, we adopt a model inspired by recent work on a bandit-like setting for recommendations. We contribute to this line of literature by introducing a safety constraint that must be respected in every round: the expected reward in each round must stay above a given threshold. Under our model, a safe explore-and-exploit policy requires careful planning; otherwise, it leads to sub-optimal welfare. We devise an asymptotically optimal algorithm for this setting and analyze its instance-dependent convergence rate.
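To make the per-round constraint concrete, the following is a minimal sketch (not the paper's algorithm) of a Thompson-sampling-style loop over Bernoulli arms in which, each round, play is restricted to arms whose posterior expected reward clears a safety threshold, falling back to a known-safe arm otherwise. The function name, the fallback rule, and all parameters are illustrative assumptions.

```python
import random

def safe_bayesian_bandit(true_means, safe_arm, threshold, horizon, seed=0):
    """Illustrative sketch only: Thompson-style exploration restricted to
    arms whose posterior mean reward meets a per-round safety threshold.
    This is an assumed toy setup, not the algorithm from the paper."""
    rng = random.Random(seed)
    n = len(true_means)
    # Beta(1, 1) priors over Bernoulli reward probabilities.
    alpha = [1] * n
    beta = [1] * n
    rewards = []
    for _ in range(horizon):
        # Posterior mean of each arm under its Beta posterior.
        post_mean = [alpha[i] / (alpha[i] + beta[i]) for i in range(n)]
        # Safety filter: keep only arms whose expected reward clears the bar.
        safe_set = [i for i in range(n) if post_mean[i] >= threshold]
        if safe_set:
            # Thompson sampling among the safe arms.
            samples = {i: rng.betavariate(alpha[i], beta[i]) for i in safe_set}
            arm = max(samples, key=samples.get)
        else:
            # No arm certified safe: fall back to the known-safe arm.
            arm = safe_arm
        r = 1 if rng.random() < true_means[arm] else 0
        alpha[arm] += r
        beta[arm] += 1 - r
        rewards.append(r)
    return sum(rewards) / horizon
```

The point of the sketch is the tension the abstract describes: filtering by the posterior mean each round can starve exploration of arms that look unsafe under the prior, which is why the safe policy needs careful planning rather than greedy filtering.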



