Safe Optimal Design with Applications in Policy Learning

11/08/2021
by Ruihao Zhu, et al.

Motivated by practical needs in online experimentation and off-policy learning, we study the problem of safe optimal design, where we develop a data logging policy that explores efficiently while achieving rewards competitive with those of a baseline production policy. We first show, perhaps surprisingly, that the common practice of mixing the production policy with uniform exploration, despite being safe, is sub-optimal in maximizing information gain. We then propose a safe optimal logging policy for the case when no side information about the actions' expected rewards is available. We improve upon this design by incorporating side information, and we extend both approaches to a large number of actions with a linear reward model. We analyze how our data logging policies impact errors in off-policy learning. Finally, we empirically validate the benefits of our designs through extensive experiments.
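
To make the common practice mentioned in the abstract concrete, here is a minimal sketch of mixing a production policy with uniform exploration. The action probabilities, the mean rewards, the mixing weight `epsilon`, and both function names are hypothetical and chosen only for illustration; none of them come from the paper, and the sketch does not implement the paper's optimal design.

```python
def mix_with_uniform(baseline, epsilon):
    """Return (1 - epsilon) * baseline + epsilon * uniform over the same actions.

    `baseline` is a list of action probabilities (assumed to sum to 1).
    `epsilon` is a hypothetical exploration weight in [0, 1].
    """
    if not 0.0 <= epsilon <= 1.0:
        raise ValueError("epsilon must lie in [0, 1]")
    num_actions = len(baseline)
    return [(1.0 - epsilon) * p + epsilon / num_actions for p in baseline]


def expected_reward(policy, mean_rewards):
    """Expected reward of a policy, given each action's mean reward."""
    return sum(p * r for p, r in zip(policy, mean_rewards))


# Illustrative numbers only (assumptions, not taken from the paper).
baseline = [0.7, 0.2, 0.1]        # production policy over 3 actions
mean_rewards = [1.0, 0.5, 0.2]    # unknown in practice; fixed here for the demo

logging = mix_with_uniform(baseline, epsilon=0.3)

print("logging policy:", logging)
print("baseline value:", expected_reward(baseline, mean_rewards))
print("logging value: ", expected_reward(logging, mean_rewards))
```

Every action receives probability at least `epsilon / K` under the mixture, so the logged data covers all `K` actions, and the mixture's expected reward is a convex combination of the baseline's value and the uniform policy's value. The abstract's point is that this mixture, although it can be kept safe by choosing `epsilon` small enough, is not the most informative logging policy available under a safety constraint.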

research
02/22/2018

Diverse Exploration for Fast and Safe Policy Improvement

We study an important yet under-addressed problem of quickly and safely ...
research
12/19/2017

Safe Policy Improvement with Baseline Bootstrapping

A common goal in Reinforcement Learning is to derive a good strategy giv...
research
02/26/2022

Safe Exploration for Efficient Policy Evaluation and Comparison

High-quality data plays a central role in ensuring the accuracy of polic...
research
05/17/2023

Scalable and Safe Remediation of Defective Actions in Self-Learning Conversational Systems

Off-Policy reinforcement learning has been a driving force for the state...
research
08/01/2022

Safe Policy Improvement Approaches and their Limitations

Safe Policy Improvement (SPI) is an important technique for offline rein...
research
03/23/2011

Doubly Robust Policy Evaluation and Learning

We study decision making in environments where the reward is only partia...
research
07/25/2017

Dynamic Policies for Cooperative Networked Systems

A set of economic entities embedded in a network graph collaborate by op...
