Safe Exploration for Efficient Policy Evaluation and Comparison

02/26/2022
by Runzhe Wan et al.

High-quality data plays a central role in ensuring the accuracy of policy evaluation. This paper initiates the study of efficient and safe data collection for bandit policy evaluation. We formulate the problem and investigate several representative variants. For each variant, we analyze its statistical properties, derive the corresponding optimal exploration policy, and design an efficient algorithm for computing it. Both theoretical analysis and experiments support the usefulness of the proposed methods.
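To make the evaluation setting concrete, the following sketch shows off-policy evaluation in a multi-armed bandit with a standard importance-weighted (inverse propensity) estimator. The arm means, the behavior (exploration) policy, and the target policy here are all hypothetical values chosen for illustration; the paper's contribution is how to design the exploration policy, which this sketch does not implement.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 3-armed bandit: true mean rewards (unknown to the evaluator).
true_means = np.array([0.2, 0.5, 0.8])
n_arms = len(true_means)

# Behavior (exploration) policy used to collect data, and the target
# policy whose value we want to estimate; both are distributions over arms.
behavior = np.array([1 / 3, 1 / 3, 1 / 3])
target = np.array([0.1, 0.2, 0.7])

# Log data by sampling arms from the behavior policy.
n = 100_000
arms = rng.choice(n_arms, size=n, p=behavior)
rewards = rng.normal(loc=true_means[arms], scale=0.1)

# Importance-weighted estimate of the target policy's value:
# V(pi) ~ mean of (pi(a_i) / b(a_i)) * r_i over the logged samples.
weights = target[arms] / behavior[arms]
ips_estimate = np.mean(weights * rewards)

true_value = target @ true_means  # ground-truth value, for comparison only
print(f"IPS estimate: {ips_estimate:.4f}  (true value: {true_value:.4f})")
```

The variance of this estimator depends on how well the behavior policy covers the arms the target policy favors, which is exactly why the choice of exploration policy matters for evaluation efficiency.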


Related research

- SPEED: Experimental Design for Policy Evaluation in Linear Heteroscedastic Bandits (01/29/2023)
  In this paper, we study the problem of optimal data collection for polic...

- Safe Policy Improvement with Baseline Bootstrapping (12/19/2017)
  A common goal in Reinforcement Learning is to derive a good strategy giv...

- Diverse Exploration for Fast and Safe Policy Improvement (02/22/2018)
  We study an important yet under-addressed problem of quickly and safely ...

- Safe Optimal Design with Applications in Policy Learning (11/08/2021)
  Motivated by practical needs in online experimentation and off-policy le...

- Near-optimal Conservative Exploration in Reinforcement Learning under Episode-wise Constraints (06/09/2023)
  This paper investigates conservative exploration in reinforcement learni...

- SafePILCO: a software tool for safe and data-efficient policy synthesis (08/07/2020)
  SafePILCO is a software tool for safe and data-efficient policy search w...

- Spectral Decomposition Representation for Reinforcement Learning (08/19/2022)
  Representation learning often plays a critical role in reinforcement lea...
