Regret Minimization with Performative Feedback

02/01/2022
by Meena Jagadeesan, et al.

In performative prediction, the deployment of a predictive model triggers a shift in the data distribution. As these shifts are typically unknown ahead of time, the learner needs to deploy a model to get feedback about the distribution it induces. We study the problem of finding near-optimal models under performativity while maintaining low regret. On the surface, this problem might seem equivalent to a bandit problem. However, it exhibits a fundamentally richer feedback structure that we refer to as performative feedback: after every deployment, the learner receives samples from the shifted distribution rather than only bandit feedback about the reward. Our main contribution is regret bounds that scale only with the complexity of the distribution shifts and not that of the reward function. The key algorithmic idea is careful exploration of the distribution shifts that informs a novel construction of confidence bounds on the risk of unexplored models. The construction only relies on smoothness of the shifts and does not assume convexity. More broadly, our work establishes a conceptual approach for leveraging tools from the bandits literature for the purpose of regret minimization with performative feedback.
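The idea of bounding the risk of a model that has not been deployed can be illustrated with a toy example. The following is a minimal sketch, not the authors' algorithm: it assumes a hypothetical one-dimensional environment in which deploying `theta` shifts the data mean by `eps * theta`, a loss that is 1-Lipschitz in the data point, and a known sensitivity `eps`. All function names and parameters are illustrative assumptions. Samples collected under one deployed model are reused to estimate the loss of a different model, and the smoothness of the shift gives the slack around that estimate. Finite-sample error is ignored here for brevity.

```python
import random


def sample_shifted(theta, n, eps=0.5, base_mean=1.0, rng=None):
    """Hypothetical environment: deploying theta moves the data mean by eps * theta."""
    rng = rng or random.Random(0)
    return [rng.gauss(base_mean + eps * theta, 1.0) for _ in range(n)]


def loss(z, theta):
    """Absolute loss, which is 1-Lipschitz in the data point z."""
    return abs(z - theta)


def risk_interval(samples, theta_deployed, theta_query, eps=0.5, lipschitz_z=1.0):
    """Interval for the risk of theta_query, using samples drawn under theta_deployed.

    The empirical loss of theta_query on the observed samples is off by at most
    lipschitz_z * eps * |theta_deployed - theta_query| because of the shift
    (assumed smoothness of the distribution map). Sampling error is not included.
    """
    estimate = sum(loss(z, theta_query) for z in samples) / len(samples)
    slack = lipschitz_z * eps * abs(theta_deployed - theta_query)
    return estimate - slack, estimate + slack


if __name__ == "__main__":
    # Deploy theta = 0.0 once, then bound the risk of the undeployed theta = 1.0.
    observed = sample_shifted(0.0, 20000, rng=random.Random(1))
    low, high = risk_interval(observed, theta_deployed=0.0, theta_query=1.0)
    print(f"risk of theta=1.0 lies in [{low:.3f}, {high:.3f}] (up to sampling error)")
```

The interval shrinks as the queried model gets closer to a deployed one, which is why samples from a few well-chosen deployments can rule out many models without deploying them. A pure bandit learner would only see the scalar loss of the deployed model and could not evaluate other models on the same samples.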

research
10/17/2018

Simple Regret Minimization for Contextual Bandits

There are two variants of the classical multi-armed bandit (MAB) problem...
research
09/20/2017

Bandits with Delayed Anonymous Feedback

We study the bandits with delayed anonymous feedback problem, a variant ...
research
06/14/2020

Combinatorial Pure Exploration with Partial or Full-Bandit Linear Feedback

In this paper, we propose the novel model of combinatorial pure explorat...
research
07/05/2018

Contextual Bandits under Delayed Feedback

Delayed feedback is a ubiquitous problem in many industrial systems emp...
research
06/07/2021

Beyond Bandit Feedback in Online Multiclass Classification

We study the problem of online multiclass classification in a setting wh...
research
02/23/2023

Reward Learning as Doubly Nonparametric Bandits: Optimal Design and Scaling Laws

Specifying reward functions for complex tasks like object manipulation o...
research
05/30/2023

Plug-in Performative Optimization

When predictions are performative, the choice of which predictor to depl...
