Guided Exploration in Reinforcement Learning via Monte Carlo Critic Optimization

06/25/2022
by Igor Kuznetsov, et al.

Deep deterministic off-policy algorithms are effective at solving challenging continuous control problems. However, current approaches commonly explore by adding random noise, which has several weaknesses: the noise must be tuned manually for each task, and the amount of exploration is not calibrated during training. We address these challenges by proposing a novel guided exploration method that uses a differential directional controller to incorporate scalable exploratory action correction. The controller is an ensemble of Monte Carlo critics that provides the exploratory direction. The proposed method improves on the traditional exploration scheme by adjusting exploration dynamically. We then present a novel algorithm that uses the proposed directional controller to modify both the policy and the critic. The presented algorithm outperforms modern reinforcement learning algorithms across a variety of problems from the DMControl suite.
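To make the idea of a directional exploratory correction concrete, here is a minimal sketch in Python with NumPy. It is not the authors' implementation, and the abstract does not specify the exact update rule. The sketch assumes that the direction is the normalized action-gradient of the ensemble-averaged critic and that a scalar `scale` controls the size of the correction. The quadratic critics, the function names, and the action bounds are all hypothetical stand-ins for learned networks.

```python
import numpy as np


def make_quadratic_critic_grad(center):
    """Hypothetical stand-in for a learned critic.

    Models Q(s, a) = -||a - center||^2 and returns its gradient with
    respect to the action, which is -2 * (a - center).
    """
    center = np.asarray(center, dtype=float)

    def grad_wrt_action(state, action):
        return -2.0 * (np.asarray(action, dtype=float) - center)

    return grad_wrt_action


def guided_action(state, policy_action, critic_grads, scale, low=-1.0, high=1.0):
    """Shift the policy action along the ensemble's ascent direction.

    Assumption: the exploratory direction is the mean action-gradient
    over the critic ensemble, normalized to unit length.
    """
    policy_action = np.asarray(policy_action, dtype=float)
    mean_grad = np.mean([g(state, policy_action) for g in critic_grads], axis=0)
    norm = np.linalg.norm(mean_grad)
    # With a zero gradient there is no preferred direction, so no correction.
    direction = mean_grad / norm if norm > 0 else np.zeros_like(mean_grad)
    return np.clip(policy_action + scale * direction, low, high)


# Usage: two critics that disagree slightly about the best action.
ensemble = [
    make_quadratic_critic_grad([0.6, 0.2]),
    make_quadratic_critic_grad([0.6, -0.2]),
]
state = np.zeros(3)  # placeholder observation, unused by the toy critics
explore_action = guided_action(state, [0.0, 0.0], ensemble, scale=0.3)
```

Unlike additive random noise, the correction here has a direction supplied by the critics, and `scale` can be changed over the course of training, which is one way to read "changing exploration dynamically".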


research
02/13/2018

Diversity-Driven Exploration Strategy for Deep Reinforcement Learning

Efficient exploration remains a challenging research problem in reinforc...
research
06/23/2022

CGAR: Critic Guided Action Redistribution in Reinforcement Leaning

Training a game-playing reinforcement learning agent requires multiple i...
research
11/21/2019

Accelerating Reinforcement Learning with Suboptimal Guidance

Reinforcement Learning in domains with sparse rewards is a difficult pro...
research
03/22/2021

Improving Actor-Critic Reinforcement Learning via Hamiltonian Policy

Approximating optimal policies in reinforcement learning (RL) is often n...
research
02/10/2020

On the Convergence of the Monte Carlo Exploring Starts Algorithm for Reinforcement Learning

A simple and natural algorithm for reinforcement learning is Monte Carlo...
research
12/03/1998

Training Reinforcement Neurocontrollers Using the Polytope Algorithm

A new training algorithm is presented for delayed reinforcement learning...
research
05/25/2014

HEPGAME and the Simplification of Expressions

Advances in high energy physics have created the need to increase comput...
