Pretraining the Vision Transformer using self-supervised methods for vision-based Deep Reinforcement Learning

09/22/2022
by Manuel Goulão, et al.

The Vision Transformer architecture has been shown to be competitive in computer vision (CV), where it has dethroned convolution-based networks on several benchmarks. Nevertheless, Convolutional Neural Networks (CNN) remain the preferred architecture for the representation module in Reinforcement Learning. In this work, we study pretraining a Vision Transformer with several state-of-the-art self-supervised methods and assess the data-efficiency gains from this training framework. We propose a new self-supervised learning method, TOV-VICReg, which extends VICReg to better capture temporal relations between observations by adding a temporal order verification task. Furthermore, we evaluate the resulting encoders on Atari games in a sample-efficiency regime. Our results show that the Vision Transformer, when pretrained with TOV-VICReg, outperforms the other self-supervised methods but still struggles to overcome a CNN; nevertheless, it does outperform a CNN in two of the ten games in our 100k-step evaluation. Ultimately, we believe such approaches in Deep Reinforcement Learning (DRL) may be key to achieving new levels of performance, as seen in natural language processing and computer vision. Source code will be available at: https://github.com/mgoulao/TOV-VICReg
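To make the core idea concrete, below is a minimal PyTorch sketch of how a standard VICReg objective could be combined with a temporal order verification task. This is an illustration under stated assumptions, not the paper's implementation: the order-verification head architecture, the loss weights, and the use of embeddings of consecutive frames as the two VICReg views are all hypothetical choices; see the linked repository for the authors' actual code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def vicreg_loss(z_a, z_b, sim_w=25.0, var_w=25.0, cov_w=1.0):
    """Standard VICReg objective: invariance + variance + covariance terms."""
    n, d = z_a.shape
    # Invariance: pull the two embeddings together.
    sim = F.mse_loss(z_a, z_b)
    # Variance: hinge loss on the per-dimension std to prevent collapse.
    std_a = torch.sqrt(z_a.var(dim=0) + 1e-4)
    std_b = torch.sqrt(z_b.var(dim=0) + 1e-4)
    var = F.relu(1.0 - std_a).mean() + F.relu(1.0 - std_b).mean()
    # Covariance: push off-diagonal covariance entries toward zero
    # so embedding dimensions stay decorrelated.
    z_a = z_a - z_a.mean(dim=0)
    z_b = z_b - z_b.mean(dim=0)
    cov_a = (z_a.T @ z_a) / (n - 1)
    cov_b = (z_b.T @ z_b) / (n - 1)

    def off_diag(m):
        # Extract the off-diagonal elements of a square matrix.
        return m.flatten()[:-1].view(d - 1, d + 1)[:, 1:].flatten()

    cov = off_diag(cov_a).pow(2).sum() / d + off_diag(cov_b).pow(2).sum() / d
    return sim_w * sim + var_w * var + cov_w * cov

def temporal_order_loss(order_head, z_t, z_t1):
    """Temporal order verification: binary classification of whether a
    concatenated pair of embeddings is in the correct temporal order."""
    pos = order_head(torch.cat([z_t, z_t1], dim=1))  # correct order -> label 1
    neg = order_head(torch.cat([z_t1, z_t], dim=1))  # swapped order -> label 0
    logits = torch.cat([pos, neg], dim=0).squeeze(1)
    labels = torch.cat([torch.ones(len(pos)),
                        torch.zeros(len(neg))]).to(logits.device)
    return F.binary_cross_entropy_with_logits(logits, labels)

# Hypothetical order-verification head: a small MLP over two concatenated
# embeddings. Dimensions and depth are illustrative assumptions.
embed_dim = 256
order_head = nn.Sequential(nn.Linear(2 * embed_dim, 128),
                           nn.ReLU(),
                           nn.Linear(128, 1))

# z_t, z_t1: encoder outputs for consecutive observations, shape (batch, embed_dim).
z_t, z_t1 = torch.randn(32, embed_dim), torch.randn(32, embed_dim)
loss = vicreg_loss(z_t, z_t1) + 1.0 * temporal_order_loss(order_head, z_t, z_t1)
```

Note that the order head sees the same pair of embeddings in both orders, so it can only succeed by encoding temporal information; that is precisely the property the abstract claims TOV-VICReg adds on top of plain VICReg.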


Related research

05/11/2022 · An Empirical Study Of Self-supervised Learning Approaches For Object Detection With Transformers
Self-supervised learning (SSL) methods such as masked language modeling ...

04/11/2022 · Evaluating Vision Transformer Methods for Deep Reinforcement Learning from Pixels
Vision Transformers (ViT) have recently demonstrated the significant pot...

03/13/2023 · Three Guidelines You Should Know for Universally Slimmable Self-Supervised Learning
We propose universally slimmable self-supervised learning (dubbed as US3...

06/09/2021 · Pretraining Representations for Data-Efficient Reinforcement Learning
Data efficiency is a key challenge for deep reinforcement learning. We a...

10/19/2020 · D2RL: Deep Dense Architectures in Reinforcement Learning
While improvements in deep learning architectures have played a crucial ...

02/01/2022 · Improving Sample Efficiency of Value Based Models Using Attention and Vision Transformers
Much of recent Deep Reinforcement Learning success is owed to the neural...

07/30/2018 · Improving Spatiotemporal Self-Supervision by Deep Reinforcement Learning
Self-supervised learning of convolutional neural networks can harness la...
