Perception-Prediction-Reaction Agents for Deep Reinforcement Learning

06/26/2020
by Adam Stooke, et al.

We introduce a new recurrent agent architecture and associated auxiliary losses which improve reinforcement learning in partially observable tasks requiring long-term memory. We employ a temporal hierarchy, using a slow-ticking recurrent core to allow information to flow more easily over long time spans, and three fast-ticking recurrent cores with connections designed to create an information asymmetry. The reaction core incorporates new observations with input from the slow core to produce the agent's policy; the perception core accesses only short-term observations and informs the slow core; lastly, the prediction core accesses only long-term memory. An auxiliary loss regularizes policies drawn from all three cores against each other, enacting the prior that the policy should be expressible from either recent or long-term memory. We present the resulting Perception-Prediction-Reaction (PPR) agent and demonstrate its improved performance over a strong LSTM-agent baseline in DMLab-30, particularly in tasks requiring long-term memory. We further show significant improvements in Capture the Flag, an environment requiring agents to acquire a complicated mixture of skills over long time scales. In a series of ablation experiments, we probe the importance of each component of the PPR agent, establishing that the entire, novel combination is necessary for this intriguing result.
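The information asymmetry described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the plain tanh cells (standing in for the recurrent cores), the sizes, the tick interval `PERIOD`, the choice to reset the fast cores at every slow tick, the constant input to the prediction core, and the symmetric pairwise KL form of the auxiliary loss are all assumptions made for illustration. The abstract specifies only which core sees which information and that the three policies are regularized against each other.

```python
import numpy as np

rng = np.random.default_rng(0)
OBS, HID, ACT, PERIOD = 4, 8, 3, 5  # hypothetical sizes; PERIOD = slow-core tick interval


def make_cell(in_dim):
    """Weights of a plain tanh RNN cell, standing in for a recurrent core."""
    return rng.normal(0, 0.3, (in_dim, HID)), rng.normal(0, 0.3, (HID, HID))


def step(cell, x, h):
    """One recurrent update: new hidden state from input x and old state h."""
    w_in, w_h = cell
    return np.tanh(x @ w_in + h @ w_h)


def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()


def kl(p, q):
    """KL divergence between two categorical policies."""
    return float(np.sum(p * (np.log(p) - np.log(q))))


cells = {
    "slow": make_cell(HID),        # input: summary from the perception core
    "perception": make_cell(OBS),  # input: observations only
    "prediction": make_cell(1),    # input: a constant, never observations
    "reaction": make_cell(OBS),    # input: observations, state seeded by slow core
}
FAST = ("perception", "prediction", "reaction")
heads = {name: rng.normal(0, 0.3, (HID, ACT)) for name in FAST}


def rollout(observations):
    """Run all cores over a sequence; return per-core policies and the aux loss."""
    h_slow = np.zeros(HID)
    h = {}
    out = {name: [] for name in FAST}
    aux_loss = 0.0
    for t, obs in enumerate(observations):
        if t % PERIOD == 0:
            if t > 0:
                # Slow tick: the perception core informs long-term memory.
                h_slow = step(cells["slow"], h["perception"], h_slow)
            # Assumed reset: perception starts blank (short-term only);
            # prediction and reaction start from the slow core's state.
            h = {
                "perception": np.zeros(HID),
                "prediction": h_slow.copy(),
                "reaction": h_slow.copy(),
            }
        inputs = {"perception": obs, "prediction": np.ones(1), "reaction": obs}
        for name in FAST:
            h[name] = step(cells[name], inputs[name], h[name])
            out[name].append(softmax(h[name] @ heads[name]))
        # Assumed auxiliary loss: KL over every ordered pair of the three policies.
        pi = [out[name][-1] for name in FAST]
        aux_loss += sum(kl(p, q) for p in pi for q in pi if p is not q)
    out["aux_loss"] = aux_loss
    return out
```

In this sketch only `out["reaction"]` would drive behavior; the other two policies exist solely to feed the auxiliary loss. Because the prediction core never receives observations, its policy within a slow-tick interval depends only on what the slow core held at the start of that interval.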

research
10/24/2022

Evaluating Long-Term Memory in 3D Mazes

Intelligent agents need to remember salient information to reason in par...
research
08/20/2022

MemoNav: Selecting Informative Memories for Visual Navigation

Image-goal navigation is a challenging task, as it requires the agent to...
research
09/10/2015

Recurrent Reinforcement Learning: A Hybrid Approach

Successful applications of reinforcement learning in real-world problems...
research
10/25/2021

Learning What to Memorize: Using Intrinsic Motivation to Form Useful Memory in Partially Observable Reinforcement Learning

Reinforcement Learning faces an important challenge in partial observabl...
research
11/18/2019

Influence-aware Memory for Deep Reinforcement Learning

Making the right decisions when some of the state variables are hidden, ...
research
03/09/2019

Scene Memory Transformer for Embodied Agents in Long-Horizon Tasks

Many robotic applications require the agent to perform long-horizon task...
research
09/03/2020

Grounded Language Learning Fast and Slow

Recent work has shown that large text-based neural language models, trai...
