Recurrent Model-Free RL is a Strong Baseline for Many POMDPs

10/11/2021
by Tianwei Ni, et al.

Many problems in RL, such as meta RL, robust RL, and generalization in RL, can be cast as POMDPs. In theory, simply augmenting model-free RL with memory, such as recurrent neural networks, provides a general approach to solving all types of POMDPs. However, prior work has found that such recurrent model-free RL methods tend to perform worse than more specialized algorithms that are designed for specific types of POMDPs. This paper revisits this claim. We find that careful architecture and hyperparameter decisions yield a recurrent model-free implementation that performs on par with (and occasionally substantially better than) more sophisticated recent techniques in their respective domains. We also release a simple and efficient implementation of recurrent model-free RL for future work to use as a baseline for POMDPs. Code is available at https://github.com/twni2016/pomdp-baselines
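The idea of "augmenting model-free RL with memory" can be illustrated with a minimal sketch. The code below is not the authors' implementation (see the linked repository for that). It is an untrained, Elman-style recurrent policy written with the Python standard library only. The class name `RecurrentPolicy`, the helper `act_on_history`, the layer sizes, and the choice to feed the previous action and reward alongside the observation are all assumptions made for illustration. The point it shows is that the hidden state summarizes the history, so the same current observation can lead to different actions after different pasts, which is what a POMDP requires.

```python
import math
import random


def _dot(weights, values):
    """Inner product of two equal-length lists."""
    return sum(w * v for w, v in zip(weights, values))


class RecurrentPolicy:
    """Tiny Elman-style recurrent policy (hypothetical, untrained)."""

    def __init__(self, obs_dim, act_dim, hidden_dim, seed=0):
        rng = random.Random(seed)
        # Assumed input: observation, previous action, previous reward.
        in_dim = obs_dim + act_dim + 1
        self.hidden_dim = hidden_dim
        self.w_in = [[rng.uniform(-0.5, 0.5) for _ in range(in_dim)]
                     for _ in range(hidden_dim)]
        self.w_rec = [[rng.uniform(-0.5, 0.5) for _ in range(hidden_dim)]
                      for _ in range(hidden_dim)]
        self.w_out = [[rng.uniform(-0.5, 0.5) for _ in range(hidden_dim)]
                      for _ in range(act_dim)]

    def initial_hidden(self):
        return [0.0] * self.hidden_dim

    def step(self, obs, prev_action, prev_reward, hidden):
        """Update the memory with one transition, then pick an action."""
        x = list(obs) + list(prev_action) + [prev_reward]
        new_hidden = [
            math.tanh(_dot(self.w_in[i], x) + _dot(self.w_rec[i], hidden))
            for i in range(self.hidden_dim)
        ]
        # Actions are squashed into [-1, 1], as in many continuous-control setups.
        action = [math.tanh(_dot(row, new_hidden)) for row in self.w_out]
        return action, new_hidden


def act_on_history(policy, history):
    """Roll the policy over a list of (obs, prev_action, prev_reward) tuples."""
    hidden = policy.initial_hidden()
    action = None
    for obs, prev_action, prev_reward in history:
        action, hidden = policy.step(obs, prev_action, prev_reward, hidden)
    return action


policy = RecurrentPolicy(obs_dim=2, act_dim=1, hidden_dim=4)
same_obs = [0.3, -0.1]
# Two histories that end in the same observation but differ earlier on.
history_a = [([1.0, 0.0], [0.0], 0.0), (same_obs, [0.5], 1.0)]
history_b = [([-1.0, 0.5], [0.0], 0.0), (same_obs, [0.5], 1.0)]
action_a = act_on_history(policy, history_a)
action_b = act_on_history(policy, history_b)
```

In this sketch `action_a` and `action_b` differ even though the final observation is identical, because the hidden state carries information from the earlier step. A memoryless policy that maps only the current observation to an action could not make that distinction. In a real agent the weights would be trained by a model-free RL algorithm rather than left at their random initial values.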


