Off-Policy Deep Reinforcement Learning with Analogous Disentangled Exploration

02/25/2020
by Anji Liu, et al.

Off-policy reinforcement learning (RL) is concerned with learning a rewarding policy by executing another policy that gathers samples of experience. The former (the target policy) is rewarding but inexpressive (in most cases, deterministic), whereas gathering experience well requires an expressive policy (the behavior policy) that offers guided and effective exploration. In contrast to most methods, which trade off optimality against expressiveness, disentangled frameworks explicitly decouple the two objectives and handle each with a separate policy. Although this allows the two policies to be designed and optimized freely with respect to their own objectives, naively disentangling them can lead to inefficient learning or stability issues. To mitigate this problem, our proposed method, Analogous Disentangled Actor-Critic (ADAC), designs analogous pairs of actors and critics. Specifically, ADAC leverages a key property of Stein variational gradient descent (SVGD) to constrain the expressive energy-based behavior policy with respect to the target policy for effective exploration. Additionally, an analogous critic pair is introduced to incorporate intrinsic rewards in a principled manner, with theoretical guarantees on the overall learning stability and effectiveness. We empirically evaluate environment-reward-only ADAC on 14 continuous-control tasks and report state-of-the-art results on 10 of them. We further demonstrate that ADAC, when paired with intrinsic rewards, outperforms alternatives in exploration-challenging tasks.
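For readers unfamiliar with SVGD, the following is a minimal sketch of a generic SVGD particle update in one dimension. It is not ADAC itself and does not show how ADAC constrains the behavior policy. The function names (`rbf_kernel`, `svgd_step`, `run_demo`), the fixed RBF bandwidth, the step size, and the unit-variance Gaussian target are all assumptions chosen for illustration. Each particle is pulled toward high-density regions of the target by a kernel-weighted gradient term and pushed away from its neighbors by a kernel-gradient term, which is what keeps the particle set spread out rather than collapsed onto a single mode.

```python
import math
import random


def rbf_kernel(a, b, bandwidth):
    """RBF kernel k(a, b) = exp(-(a - b)^2 / (2 * bandwidth^2))."""
    return math.exp(-((a - b) ** 2) / (2.0 * bandwidth ** 2))


def svgd_step(particles, grad_log_p, step_size=0.1, bandwidth=1.0):
    """One generic SVGD update on a list of scalar particles.

    phi(x_i) = (1/n) * sum_j [ k(x_j, x_i) * grad_log_p(x_j)
                               + d/dx_j k(x_j, x_i) ]
    """
    n = len(particles)
    grads = [grad_log_p(x) for x in particles]
    updated = []
    for x_i in particles:
        phi = 0.0
        for x_j, g_j in zip(particles, grads):
            k = rbf_kernel(x_j, x_i, bandwidth)
            # Attraction: kernel-weighted score of the target density.
            phi += k * g_j
            # Repulsion: derivative of the kernel w.r.t. x_j.
            phi += k * (x_i - x_j) / bandwidth ** 2
        updated.append(x_i + step_size * phi / n)
    return updated


def run_demo(num_particles=30, num_steps=300, target_mean=2.0, seed=0):
    """Move particles toward an assumed unit-variance Gaussian target."""
    rng = random.Random(seed)
    # Start far from the target so the transport is visible.
    particles = [rng.gauss(-3.0, 0.5) for _ in range(num_particles)]

    def grad_log_p(x):
        # Score of N(target_mean, 1): d/dx log p(x) = -(x - target_mean).
        return -(x - target_mean)

    for _ in range(num_steps):
        particles = svgd_step(particles, grad_log_p)
    return particles


if __name__ == "__main__":
    final = run_demo()
    mean = sum(final) / len(final)
    print(f"particle mean after SVGD: {mean:.3f}")
```

In this toy setting the particles drift from around -3 toward the target mean while staying dispersed. In an energy-based policy the score `grad_log_p` would come from a learned critic rather than a closed-form density, which this sketch does not attempt to model.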

