Evaluating Vision Transformer Methods for Deep Reinforcement Learning from Pixels

04/11/2022
by Tianxin Tao, et al.

Vision Transformers (ViT) have recently demonstrated the significant potential of transformer architectures for computer vision. To what extent can image-based deep reinforcement learning also benefit from ViT architectures, as compared to standard convolutional neural network (CNN) architectures? To answer this question, we evaluate ViT training methods for image-based reinforcement learning (RL) control tasks and compare the results to RAD, a leading method built on a convolutional-network architecture. For training the ViT encoder, we consider several recently proposed self-supervised losses that are treated as auxiliary tasks, as well as a baseline with no additional loss terms. We find that the CNN architectures trained using RAD still generally provide superior performance. For the ViT methods, all three types of auxiliary tasks that we consider provide a benefit over plain ViT training. Furthermore, ViT masking-based tasks are found to significantly outperform ViT contrastive learning.

research
09/22/2022

Pretraining the Vision Transformer using self-supervised methods for vision based Deep Reinforcement Learning

The Vision Transformer architecture has shown to be competitive in the c...
research
10/19/2020

D2RL: Deep Dense Architectures in Reinforcement Learning

While improvements in deep learning architectures have played a crucial ...
research
07/03/2022

Stabilizing Off-Policy Deep Reinforcement Learning from Pixels

Off-policy reinforcement learning (RL) from pixel observations is notori...
research
06/30/2022

Deep Reinforcement Learning with Swin Transformer

Transformers are neural network models that utilize multiple layers of s...
research
08/09/2022

Object Detection with Deep Reinforcement Learning

Object localization has been a crucial task in computer vision field. Me...
research
10/15/2020

Masked Contrastive Representation Learning for Reinforcement Learning

Improving sample efficiency is a key research problem in reinforcement l...
research
11/13/2020

ROLL: Visual Self-Supervised Reinforcement Learning with Object Reasoning

Current image-based reinforcement learning (RL) algorithms typically ope...
