DeepAI AI Chat
Log In Sign Up

Evaluating Vision Transformer Methods for Deep Reinforcement Learning from Pixels

04/11/2022
by   Tianxin Tao, et al.
0

Vision Transformers (ViT) have recently demonstrated the significant potential of transformer architectures for computer vision. To what extent can image-based deep reinforcement learning also benefit from ViT architectures, as compared to standard convolutional neural network (CNN) architectures? To answer this question, we evaluate ViT training methods for image-based reinforcement learning (RL) control tasks and compare these results to a leading convolutional-network architecture method, RAD. For training the ViT encoder, we consider several recently-proposed self-supervised losses that are treated as auxiliary tasks, as well as a baseline with no additional loss terms. We find that the CNN architectures trained using RAD still generally provide superior performance. For the ViT methods, all three types of auxiliary tasks that we consider provide a benefit over plain ViT training. Furthermore, ViT masking-based tasks are found to significantly outperform ViT contrastive-learning.

READ FULL TEXT
09/22/2022

Pretraining the Vision Transformer using self-supervised methods for vision based Deep Reinforcement Learning

The Vision Transformer architecture has shown to be competitive in the c...
10/19/2020

D2RL: Deep Dense Architectures in Reinforcement Learning

While improvements in deep learning architectures have played a crucial ...
07/03/2022

Stabilizing Off-Policy Deep Reinforcement Learning from Pixels

Off-policy reinforcement learning (RL) from pixel observations is notori...
06/30/2022

Deep Reinforcement Learning with Swin Transformer

Transformers are neural network models that utilize multiple layers of s...
08/09/2022

Object Detection with Deep Reinforcement Learning

Object localization has been a crucial task in computer vision field. Me...
10/15/2020

Masked Contrastive Representation Learning for Reinforcement Learning

Improving sample efficiency is a key research problem in reinforcement l...
11/13/2020

ROLL: Visual Self-Supervised Reinforcement Learning with Object Reasoning

Current image-based reinforcement learning (RL) algorithms typically ope...