History Compression via Language Models in Reinforcement Learning

05/24/2022
by Fabian Paischer, et al.

In a partially observable Markov decision process (POMDP), an agent typically uses a representation of the past to approximate the underlying MDP. We propose to utilize a frozen Pretrained Language Transformer (PLT) for history representation and compression to improve sample efficiency. To avoid training the Transformer, we introduce FrozenHopfield, which automatically associates observations with original token embeddings. To form these associations, a modern Hopfield network stores the original token embeddings, which are retrieved by queries obtained from a random but fixed projection of observations. Our new method, HELM, enables actor-critic network architectures that contain a pretrained language Transformer for history representation as a memory module. Since a representation of the past need not be learned, HELM is much more sample efficient than competitors. On Minigrid and Procgen environments, HELM achieves new state-of-the-art results. Our code is available at https://github.com/ml-jku/helm.
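The abstract describes FrozenHopfield as a modern Hopfield retrieval over the frozen token embeddings of the language model, queried by a random but fixed projection of the observation. Below is a minimal PyTorch sketch of that idea, not the authors' implementation: the class name, tensor shapes, and the inverse-temperature parameter `beta` are illustrative assumptions; only the mechanism (fixed random projection followed by softmax retrieval over stored embeddings) follows the description above.

```python
import torch
import torch.nn as nn


class FrozenHopfield(nn.Module):
    """Sketch of the FrozenHopfield mechanism described in the abstract.

    A random but fixed projection maps an observation into the token-embedding
    space; a modern Hopfield retrieval (softmax attention over the frozen
    token embeddings) returns a convex combination of the original embeddings.
    """

    def __init__(self, token_embeddings: torch.Tensor, obs_dim: int, beta: float = 1.0):
        super().__init__()
        d_embed = token_embeddings.shape[1]
        # Stored patterns: the pretrained LM's token embeddings, kept frozen.
        self.register_buffer("E", token_embeddings)          # (vocab, d_embed)
        # Random but fixed (untrained) projection from observation space
        # to the embedding space; scaling choice is an assumption.
        self.register_buffer("P", torch.randn(d_embed, obs_dim) / obs_dim ** 0.5)
        self.beta = beta  # inverse temperature of the Hopfield retrieval

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        # obs: (batch, obs_dim) flattened observation
        q = obs @ self.P.T                                    # query, (batch, d_embed)
        attn = torch.softmax(self.beta * q @ self.E.T, dim=-1)  # (batch, vocab)
        return attn @ self.E                                  # retrieved embedding, (batch, d_embed)
```

In a HELM-style setup, such retrieved embeddings would be fed step by step into the frozen pretrained language Transformer, whose hidden states then serve as the compressed history representation consumed by the actor-critic heads; since neither the projection nor the Transformer is trained, only the policy and value networks require gradient updates.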

