Can you text what is happening? Integrating pre-trained language encoders into trajectory prediction models for autonomous driving

09/11/2023
by Ali Keysan, et al.

In autonomous driving tasks, scene understanding is the first step towards predicting the future behavior of the surrounding traffic participants. Yet how to represent a given scene and how to extract its features are still open research questions. In this study, we propose a novel text-based representation of traffic scenes and process it with a pre-trained language encoder. First, we show that text-based representations, combined with classical rasterized image representations, lead to descriptive scene embeddings. Second, we benchmark our predictions on the nuScenes dataset and show significant improvements compared to baselines. Third, we show in an ablation study that a joint encoder of text and rasterized images outperforms the individual encoders, confirming that the two representations have complementary strengths.
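
To make the idea of a text-based scene representation concrete, here is a minimal sketch in Python. It is not the authors' method: the abstract does not specify the text format, the language encoder, or how the text and image embeddings are combined. The `Agent` fields, the sentence template in `scene_to_text`, and the concatenation in `fuse` are all assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Agent:
    """Hypothetical agent record; field names are illustrative only."""
    kind: str          # e.g. "car", "pedestrian"
    distance_m: float  # distance from the target vehicle, in meters
    speed_mps: float   # current speed, in meters per second


def scene_to_text(target_speed_mps: float, agents: List[Agent]) -> str:
    """Render a traffic scene as a plain-text description (assumed template)."""
    parts = [f"The target vehicle is driving at {target_speed_mps:.1f} m/s."]
    # Describe the nearest agents first.
    for a in sorted(agents, key=lambda a: a.distance_m):
        parts.append(
            f"A {a.kind} is {a.distance_m:.1f} m away, "
            f"moving at {a.speed_mps:.1f} m/s."
        )
    return " ".join(parts)


def fuse(text_emb: List[float], image_emb: List[float]) -> List[float]:
    """Join two embeddings by concatenation (an assumed fusion choice)."""
    return list(text_emb) + list(image_emb)
```

In a full pipeline, the string returned by `scene_to_text` would be passed to a pre-trained language encoder, and the resulting embedding would be combined with the embedding of the rasterized image before trajectory decoding. Those components are omitted here because the abstract gives no details about them.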


