Autoregressive Policies for Continuous Control Deep Reinforcement Learning

03/27/2019
by Dmytro Korenkevych, et al.

Reinforcement learning algorithms rely on exploration to discover new behaviors, which is typically achieved by following a stochastic policy. In continuous control tasks, policies with a Gaussian distribution have been widely adopted. Gaussian exploration, however, does not result in smooth trajectories, which generally correspond to safe and rewarding behaviors in practical tasks. In addition, Gaussian policies do not explore an environment effectively and become increasingly inefficient as the action rate increases. This contributes to the low sample efficiency often observed in learning continuous control tasks. We introduce a family of stationary autoregressive (AR) stochastic processes to facilitate exploration in continuous control domains. We show that the proposed processes possess two desirable features: subsequent process observations are temporally coherent with a continuously adjustable degree of coherence, and the stationary distribution of the process is standard normal. We derive an autoregressive policy (ARP) that implements such processes while maintaining the standard agent-environment interface. We show how ARPs can be easily used with existing off-the-shelf learning algorithms. Empirically, we demonstrate that using ARPs results in improved exploration and sample efficiency in both simulated and real-world domains and, furthermore, provides smooth exploration trajectories that enable safe operation of robotic hardware.
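The two properties named in the abstract (adjustable temporal coherence and a standard normal stationary distribution) can be illustrated with a first-order AR process. This is only a minimal sketch: the abstract describes a more general family and does not give its coefficients, so the AR(1) form, the function name `ar1_noise`, and the parameter `alpha` are assumptions made for illustration. Scaling the innovation by sqrt(1 - alpha^2) keeps the marginal variance at 1 for any `alpha` in [0, 1).

```python
import numpy as np


def ar1_noise(num_steps, alpha, dim=1, seed=0):
    """Sample a stationary AR(1) process with standard normal marginals.

    Hypothetical helper, not the paper's ARP. `alpha` in [0, 1) sets the
    temporal coherence: 0 gives white Gaussian noise, and values close
    to 1 give smooth, slowly varying trajectories.
    """
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must lie in [0, 1)")
    rng = np.random.default_rng(seed)
    # Innovation scale chosen so that Var[x_t] stays equal to 1.
    scale = np.sqrt(1.0 - alpha ** 2)
    x = np.empty((num_steps, dim))
    # Start from the stationary distribution so every step is N(0, 1).
    x[0] = rng.standard_normal(dim)
    for t in range(1, num_steps):
        x[t] = alpha * x[t - 1] + scale * rng.standard_normal(dim)
    return x


if __name__ == "__main__":
    smooth = ar1_noise(1000, alpha=0.95)[:, 0]
    white = ar1_noise(1000, alpha=0.0)[:, 0]
    # Mean absolute step size: smaller for the more coherent process.
    print(np.abs(np.diff(smooth)).mean(), np.abs(np.diff(white)).mean())
```

In a policy, such a sequence could stand in for the independent Gaussian samples that perturb the action mean at each step; how the paper actually wires this into the agent-environment interface is not stated in the abstract.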

research
12/21/2018

NADPEx: An on-policy temporally consistent exploration method for deep reinforcement learning

Reinforcement learning agents need exploratory behaviors to escape from ...
research
02/14/2018

GEP-PG: Decoupling Exploration and Exploitation in Deep Reinforcement Learning Algorithms

In continuous action domains, standard deep reinforcement learning algor...
research
03/06/2019

Safety-Guided Deep Reinforcement Learning via Online Gaussian Process Estimation

An important facet of reinforcement learning (RL) has to do with how the...
research
02/04/2014

Safe Exploration of State and Action Spaces in Reinforcement Learning

In this paper, we consider the important problem of safe exploration in ...
research
05/08/2021

Generative Actor-Critic: An Off-policy Algorithm Using the Push-forward Model

Model-free deep reinforcement learning has achieved great success in man...
research
05/30/2023

Temporally Layered Architecture for Efficient Continuous Control

We present a temporally layered architecture (TLA) for temporally adapti...
research
03/13/2018

Policy Search in Continuous Action Domains: an Overview

Continuous action policy search, the search for efficient policies in co...
