Vision Transformer for Learning Driving Policies in Complex Multi-Agent Environments

09/14/2021
by Eshagh Kargar, et al.

Driving in a complex urban environment is a difficult task that requires a complex decision policy. To make informed decisions, an agent needs to understand the long-range context of the scene and the relative importance of other vehicles. In this work, we propose using a Vision Transformer (ViT) to learn a driving policy in urban settings from bird's-eye-view (BEV) input images. The ViT network learns the global context of the scene more effectively than previously proposed Convolutional Neural Networks (ConvNets). Furthermore, ViT's attention mechanism learns an attention map over the scene, which allows the ego car to determine which surrounding cars are important to its next decision. We demonstrate that a DQN agent with a ViT backbone outperforms baseline algorithms with ConvNet backbones pre-trained in various ways. In particular, the proposed method helps reinforcement learning algorithms learn faster, reaching higher performance with less data than the baselines.
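To make the idea of a ViT backbone feeding a DQN head more concrete, the following is a minimal sketch, not the authors' implementation. It assumes a 64x64x3 BEV image, 8x8 patches, a single attention layer with one head, a learned class token, and five discrete actions; none of these values come from the abstract, and the class name `TinyViTQNet` and helper `patchify` are hypothetical. Weights are random, so the output only illustrates shapes and data flow.

```python
import numpy as np


def patchify(bev, patch):
    """Split an (H, W, C) image into flattened, non-overlapping patches."""
    h, w, c = bev.shape
    assert h % patch == 0 and w % patch == 0
    gh, gw = h // patch, w // patch
    x = bev.reshape(gh, patch, gw, patch, c).transpose(0, 2, 1, 3, 4)
    return x.reshape(gh * gw, patch * patch * c)


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


class TinyViTQNet:
    """One-layer, single-head ViT-style encoder with a linear Q-value head.

    All sizes are illustrative assumptions, not values from the paper.
    """

    def __init__(self, patch=8, channels=3, dim=32, n_actions=5, grid=8, seed=0):
        rng = np.random.default_rng(seed)
        in_dim = patch * patch * channels
        self.patch, self.dim = patch, dim
        self.w_embed = rng.normal(0, 0.02, (in_dim, dim))      # patch embedding
        self.cls = rng.normal(0, 0.02, (1, dim))               # class token
        self.pos = rng.normal(0, 0.02, (grid * grid + 1, dim))  # position embedding
        self.w_q, self.w_k, self.w_v = (
            rng.normal(0, dim ** -0.5, (dim, dim)) for _ in range(3)
        )
        self.w_head = rng.normal(0, 0.02, (dim, n_actions))    # Q-value head

    def forward(self, bev):
        tokens = patchify(bev, self.patch) @ self.w_embed       # (N, dim)
        tokens = np.concatenate([self.cls, tokens], axis=0) + self.pos
        q, k, v = tokens @ self.w_q, tokens @ self.w_k, tokens @ self.w_v
        attn = softmax(q @ k.T / np.sqrt(self.dim))             # (N+1, N+1)
        out = tokens + attn @ v                                  # residual connection
        q_values = out[0] @ self.w_head                          # read from class token
        return q_values, attn[0, 1:]                             # class-to-patch attention


# Hypothetical usage: a random array stands in for a rendered BEV frame.
bev = np.random.default_rng(1).random((64, 64, 3))
net = TinyViTQNet()
q_values, cls_attn = net.forward(bev)
action = int(np.argmax(q_values))          # greedy action, as in DQN evaluation
attn_map = cls_attn.reshape(8, 8)          # one attention weight per BEV patch
```

Because every patch token attends to every other token in a single layer, the class token can draw on distant parts of the scene directly, whereas a ConvNet widens its receptive field only gradually with depth. Reshaping the class-to-patch weights back onto the patch grid gives the kind of attention map the abstract refers to.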


research
12/22/2021

Evaluating the Robustness of Deep Reinforcement Learning for Autonomous and Adversarial Policies in a Multi-agent Urban Driving Environment

Deep reinforcement learning is actively used for training autonomous dri...
research
03/26/2021

Increasing the Efficiency of Policy Learning for Autonomous Vehicles by Multi-Task Representation Learning

Driving in a dynamic, multi-agent, and complex urban environment is a di...
research
03/02/2020

Efficient Latent Representations using Multiple Tasks for Autonomous Driving

Driving in the dynamic, multi-agent, and complex urban environment is a ...
research
07/19/2019

An Actor-Critic-Attention Mechanism for Deep Reinforcement Learning in Multi-view Environments

In reinforcement learning algorithms, leveraging multiple views of the e...
research
08/17/2020

MIDAS: Multi-agent Interaction-aware Decision-making with Adaptive Strategies for Urban Autonomous Navigation

Autonomous navigation in crowded, complex urban environments requires in...
research
05/10/2019

Attention-based Deep Reinforcement Learning for Multi-view Environments

In reinforcement learning algorithms, it is a common practice to account...
research
06/15/2023

Neural World Models for Computer Vision

Humans navigate in their environment by learning a mental model of the w...
