Diverse Exploration for Fast and Safe Policy Improvement

02/22/2018
by   Andrew Cohen, et al.
0

We study an important yet under-addressed problem of quickly and safely improving policies in online reinforcement learning domains. As its solution, we propose a novel exploration strategy - diverse exploration (DE), which learns and deploys a diverse set of safe policies to explore the environment. We provide DE theory explaining why diversity in behavior policies enables effective exploration without sacrificing exploitation. Our empirical study shows that an online policy improvement algorithm framework implementing the DE strategy can achieve both fast policy improvement and safe online performance.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
02/10/2019

Diverse Exploration via Conjugate Policies for Policy Gradient Methods

We address the challenge of effective exploration while maintaining good...
research
11/08/2021

Safe Optimal Design with Applications in Policy Learning

Motivated by practical needs in online experimentation and off-policy le...
research
09/26/2020

Neurosymbolic Reinforcement Learning with Formally Verified Exploration

We present Revel, a partially neural reinforcement learning (RL) framewo...
research
02/02/2016

Better safe than sorry: Risky function exploitation through safe optimization

Exploration-exploitation of functions, that is learning and optimizing a...
research
02/26/2022

Safe Exploration for Efficient Policy Evaluation and Comparison

High-quality data plays a central role in ensuring the accuracy of polic...
research
05/20/2018

Safe Policy Learning from Observations

In this paper, we consider the problem of learning a policy by observing...
research
01/06/2023

Centralized Cooperative Exploration Policy for Continuous Control Tasks

The deep reinforcement learning (DRL) algorithm works brilliantly on sol...

Please sign up or login with your details

Forgot password? Click here to reset