Reinforcement Learning with Deep Energy-Based Policies

02/27/2017
by Tuomas Haarnoja, et al.

We propose a method for learning expressive energy-based policies for continuous states and actions, which was previously feasible only in tabular domains. We apply our method to learning maximum entropy policies, resulting in a new algorithm, called soft Q-learning, that expresses the optimal policy via a Boltzmann distribution. We use the recently proposed amortized Stein variational gradient descent to learn a stochastic sampling network that approximates samples from this distribution. The benefits of the proposed algorithm include improved exploration and compositionality that allows transferring skills between tasks, which we confirm in simulated experiments with swimming and walking robots. We also draw a connection to actor-critic methods, which can be viewed as performing approximate inference on the corresponding energy-based model.
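To make the Boltzmann form of the policy concrete, here is a minimal sketch for a small, discrete set of actions. It is an illustration only: the method described above targets continuous actions and draws samples with a learned sampling network, which this sketch does not attempt. The function names, the temperature parameter `alpha`, and the example Q-values are assumptions introduced here, not taken from the paper.

```python
import math

def soft_value(q_values, alpha=1.0):
    """Soft value alpha * log(sum_a exp(Q(s, a) / alpha)), computed stably.

    `alpha` is a hypothetical temperature; smaller values approach max_a Q.
    """
    scaled = [q / alpha for q in q_values]
    m = max(scaled)  # subtract the max before exponentiating to avoid overflow
    return alpha * (m + math.log(sum(math.exp(z - m) for z in scaled)))

def boltzmann_policy(q_values, alpha=1.0):
    """Action probabilities proportional to exp(Q(s, a) / alpha)."""
    v = soft_value(q_values, alpha)
    # exp((Q - V) / alpha) equals exp(Q / alpha) divided by the normalizer
    return [math.exp((q - v) / alpha) for q in q_values]

# Hypothetical Q-values for three actions in a single state.
q = [1.0, 2.0, 0.5]
probs = boltzmann_policy(q, alpha=0.5)
```

In this sketch, lowering `alpha` concentrates probability on the highest-valued action, while raising it moves the policy toward uniform, which is one way to see how an entropy-seeking policy keeps exploring lower-valued actions.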


