Deep Transformer Q-Networks for Partially Observable Reinforcement Learning

06/02/2022
by Kevin Esslinger, et al.

Real-world reinforcement learning tasks often involve partial observability, where observations give only a partial or noisy view of the true state of the world. Such tasks typically require some form of memory, in which the agent has access to multiple past observations, in order to perform well. One popular way to incorporate memory is to use a recurrent neural network to access the agent's history. However, recurrent neural networks in reinforcement learning are often fragile and difficult to train, are susceptible to catastrophic forgetting, and sometimes fail completely as a result. In this work, we propose Deep Transformer Q-Networks (DTQN), a novel architecture that uses transformers and self-attention to encode an agent's history. DTQN is designed modularly, and we compare results against several modifications to our base model. Our experiments demonstrate that the transformer can solve partially observable tasks faster and more stably than previous recurrent approaches.
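
To make the idea of encoding an agent's history with self-attention more concrete, here is a minimal sketch in plain Python. It is not the authors' DTQN implementation: the class name `TinyHistoryQNet`, the single attention head, the single layer, the random untrained weights, and the omission of positional encodings and training code are all assumptions made for illustration. It only shows how a causally masked attention pass over past observations can yield Q-values that depend on the whole history rather than on the latest observation alone.

```python
import math
import random


def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows, cols, rng):
    """Small random weights; a stand-in for learned parameters."""
    scale = 1.0 / math.sqrt(cols)
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class TinyHistoryQNet:
    """Hypothetical one-layer, one-head causal self-attention Q-network."""

    def __init__(self, obs_dim, embed_dim, num_actions, seed=0):
        rng = random.Random(seed)
        self.embed_dim = embed_dim
        self.W_embed = random_matrix(embed_dim, obs_dim, rng)
        self.W_q = random_matrix(embed_dim, embed_dim, rng)
        self.W_k = random_matrix(embed_dim, embed_dim, rng)
        self.W_v = random_matrix(embed_dim, embed_dim, rng)
        self.W_out = random_matrix(num_actions, embed_dim, rng)

    def q_values(self, history):
        """Return one Q-value vector per timestep of the observation history."""
        embedded = [matvec(self.W_embed, obs) for obs in history]
        queries = [matvec(self.W_q, e) for e in embedded]
        keys = [matvec(self.W_k, e) for e in embedded]
        values = [matvec(self.W_v, e) for e in embedded]
        norm = math.sqrt(self.embed_dim)
        outputs = []
        for t, q in enumerate(queries):
            # Causal mask: step t attends only to steps 0..t.
            scores = [
                sum(a * b for a, b in zip(q, keys[s])) / norm
                for s in range(t + 1)
            ]
            weights = softmax(scores)
            context = [
                sum(w * values[s][d] for s, w in enumerate(weights))
                for d in range(self.embed_dim)
            ]
            # Residual connection, then a linear head producing Q-values.
            hidden = [e + c for e, c in zip(embedded[t], context)]
            outputs.append(matvec(self.W_out, hidden))
        return outputs

    def act(self, history):
        """Greedy action from the Q-values at the most recent timestep."""
        q_last = self.q_values(history)[-1]
        return max(range(len(q_last)), key=q_last.__getitem__)


# Usage: three one-hot observations, two possible actions.
net = TinyHistoryQNet(obs_dim=3, embed_dim=4, num_actions=2, seed=0)
history = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
action = net.act(history)
```

In this sketch the attention weights play the role that the hidden state plays in a recurrent network: each step reads directly from all earlier observations instead of from a compressed running summary. Returning a Q-value vector for every timestep, rather than only the last one, is a design choice of this sketch.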

research
06/06/2018

Deep Variational Reinforcement Learning for POMDPs

Many real-world sequential decision making problems are partially observ...
research
03/10/2021

Hard Attention Control By Mutual Information Maximization

Biological agents have adopted the principle of attention to limit the r...
research
10/31/2017

Regret Minimization for Partially Observable Deep Reinforcement Learning

Deep reinforcement learning algorithms that estimate state and state-act...
research
06/15/2023

Recurrent Memory Decision Transformer

Transformative models, originally developed for natural language problem...
research
03/13/2023

Transformer-based World Models Are Happy With 100k Interactions

Deep neural networks have been successful in many reinforcement learning...
research
05/24/2022

History Compression via Language Models in Reinforcement Learning

In a partially observable Markov decision process (POMDP), an agent typi...
research
11/18/2019

Influence-aware Memory for Deep Reinforcement Learning

Making the right decisions when some of the state variables are hidden, ...