Recurrent Off-policy Baselines for Memory-based Continuous Control

10/25/2021
by Zhihan Yang, et al.

When the environment is partially observable (PO), a deep reinforcement learning (RL) agent must learn a suitable temporal representation of the entire history in addition to a control strategy. This problem is not new, and both model-free and model-based algorithms have been proposed for it. However, inspired by the recent success of model-free image-based RL, we noticed the absence of a model-free baseline for history-based RL that (1) uses the full history and (2) incorporates recent advances in off-policy continuous control. Therefore, in this work we implement recurrent versions of DDPG, TD3, and SAC (RDPG, RTD3, and RSAC), evaluate them on short-term and long-term PO domains, and investigate key design choices. Our experiments show that RDPG and RTD3 can, surprisingly, fail on some domains, and that RSAC is the most reliable, reaching near-optimal performance on nearly all domains. However, one task that requires systematic exploration still proved difficult, even for RSAC. These results show that model-free RL can learn good temporal representations using only reward signals; the primary difficulties seem to be computational cost and exploration. To facilitate future research, we have made our PyTorch implementation publicly available at https://github.com/zhihanyang2022/off-policy-continuous-control.
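
To make the idea of acting on a "temporal representation of the entire history" concrete, here is a minimal sketch in plain Python. It is not the authors' PyTorch implementation: the class `RecurrentActor`, the helper `act_on_history`, the Elman-style update, and the random untrained weights are all assumptions chosen for illustration. It only shows how a recurrent state lets the action depend on past observations rather than on the current one alone.

```python
import math
import random


class RecurrentActor:
    """Hypothetical toy recurrent policy (Elman-style cell, untrained weights).

    h_t = tanh(W_obs o_t + W_hid h_{t-1}),  a_t = tanh(W_act h_t)
    """

    def __init__(self, obs_dim, hidden_dim, act_dim, seed=0):
        rng = random.Random(seed)

        def mat(rows, cols):
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

        self.w_obs = mat(hidden_dim, obs_dim)
        self.w_hid = mat(hidden_dim, hidden_dim)
        self.w_act = mat(act_dim, hidden_dim)
        self.hidden_dim = hidden_dim

    def initial_state(self):
        # An empty history is summarized by an all-zero hidden state.
        return [0.0] * self.hidden_dim

    def step(self, obs, hidden):
        # Fold the newest observation into the running summary of the history.
        new_hidden = [
            math.tanh(
                sum(w * o for w, o in zip(row_o, obs))
                + sum(w * h for w, h in zip(row_h, hidden))
            )
            for row_o, row_h in zip(self.w_obs, self.w_hid)
        ]
        # Bounded continuous action computed from the summary only.
        action = [math.tanh(sum(w * h for w, h in zip(row, new_hidden))) for row in self.w_act]
        return action, new_hidden


def act_on_history(actor, observations):
    """Feed a full observation history through the actor; return the last action."""
    hidden = actor.initial_state()
    action = None
    for obs in observations:
        action, hidden = actor.step(obs, hidden)
    return action


actor = RecurrentActor(obs_dim=2, hidden_dim=4, act_dim=1)
# Two histories that end in the same observation but differ one step earlier.
history_a = [[1.0, 0.0], [0.0, 0.0]]
history_b = [[-1.0, 0.0], [0.0, 0.0]]
action_a = act_on_history(actor, history_a)
action_b = act_on_history(actor, history_b)
```

The two histories share their final observation, so a memoryless policy would have to pick the same action for both; the recurrent state is what allows `action_a` and `action_b` to differ. In an off-policy method the critic would need a similar history summary, which this sketch leaves out.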

research
10/11/2021

Recurrent Model-Free RL is a Strong Baseline for Many POMDPs

Many problems in RL, such as meta RL, robust RL, and generalization in R...
research
08/31/2018

Directed Exploration in PAC Model-Free Reinforcement Learning

We study an exploration method for model-free RL that generalizes the co...
research
10/02/2019

Improving Sample Efficiency in Model-Free Reinforcement Learning from Images

Training an agent to solve control tasks directly from high-dimensional ...
research
08/11/2023

A Deep Recurrent-Reinforcement Learning Method for Intelligent AutoScaling of Serverless Functions

Function-as-a-Service (FaaS) introduces a lightweight, function-based cl...
research
08/22/2022

Event-Triggered Model Predictive Control with Deep Reinforcement Learning for Autonomous Driving

Event-triggered model predictive control (eMPC) is a popular optimal con...
research
07/21/2021

Bayesian Controller Fusion: Leveraging Control Priors in Deep Reinforcement Learning for Robotics

We present Bayesian Controller Fusion (BCF): a hybrid control strategy t...
research
04/07/2022

Temporal Alignment for History Representation in Reinforcement Learning

Environments in Reinforcement Learning are usually only partially observ...