Leveraging exploration in off-policy algorithms via normalizing flows

05/16/2019
by   Bogdan Mazoure, et al.

Exploration is a crucial component for discovering approximately optimal policies in most high-dimensional reinforcement learning (RL) settings with sparse rewards. Approaches such as neural density models and continuous exploration (e.g., Go-Explore) have been instrumental in recent advances. Soft actor-critic (SAC) is a method for improving exploration that combines off-policy updates with maximization of the policy entropy. We extend SAC to a richer class of probability distributions through normalizing flows, which we show improves performance in exploration, sample complexity, and convergence. Finally, we show that the normalizing flow policy not only outperforms SAC on MuJoCo domains but is also significantly lighter, using as little as 5.6% of the original network's parameters for similar performance.
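To make the idea of a normalizing flow policy concrete, here is a minimal sketch in Python with NumPy. It is not the authors' implementation: the abstract does not say which flow family is used, so the planar flow, the diagonal Gaussian base distribution, and the names `planar_flow` and `sample_action` are all assumptions chosen for illustration. The sketch draws a Gaussian sample, pushes it through flow layers, and tracks the action's log-density with the change-of-variables formula. That log-density is the quantity an entropy-maximizing method such as SAC needs.

```python
import numpy as np


def planar_flow(z, u, w, b):
    """One planar flow layer f(z) = z + u_hat * tanh(w.z + b) (assumed flow family).

    Returns the transformed sample and log|det df/dz|.
    """
    # Reparameterize u so that w.u_hat > -1, which keeps the map invertible.
    wu = w @ u
    u_hat = u + ((-1.0 + np.logaddexp(0.0, wu)) - wu) * w / (w @ w)
    a = np.tanh(w @ z + b)
    z_new = z + u_hat * a
    # Jacobian is I + u_hat * (1 - a^2) * w^T, so the determinant is rank-one.
    log_det = np.log(np.abs(1.0 + (1.0 - a ** 2) * (w @ u_hat)))
    return z_new, log_det


def sample_action(mean, log_std, flow_params, rng):
    """Sample from a diagonal Gaussian base, then apply each flow layer.

    flow_params is a list of (u, w, b) tuples (hypothetical layout).
    Returns the action and its log-density under the flow policy.
    """
    eps = rng.standard_normal(mean.shape)
    z = mean + np.exp(log_std) * eps
    # Log-density of the base Gaussian at z.
    log_prob = -0.5 * np.sum(eps ** 2 + 2.0 * log_std + np.log(2.0 * np.pi))
    for u, w, b in flow_params:
        z, log_det = planar_flow(z, u, w, b)
        # Change of variables: subtract the log-determinant of each layer.
        log_prob -= log_det
    return z, log_prob
```

In a SAC-style objective, the negative of the returned `log_prob` would serve as the sample-based entropy estimate. A practical policy would typically also bound the actions (for example with a tanh squashing step), which this sketch omits.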


