TAMFormer: Multi-Modal Transformer with Learned Attention Mask for Early Intent Prediction

10/26/2022
by   Nada Osman, et al.

Human intention prediction is a growing area of research in which a vision-based system must anticipate an activity in a video. To do so, the model builds a representation of the past and then produces hypotheses about upcoming scenarios. In this work, we focus on early intention prediction for pedestrians: from a current observation of an urban scene, the model predicts the future activity of pedestrians approaching the street. Our method is based on a multi-modal transformer that encodes past observations and produces multiple predictions at different anticipation times. Moreover, we propose to learn the attention masks of our transformer-based model, TAMFormer (Temporal Adaptive Mask Transformer), in order to weight present and past temporal dependencies differently. We evaluate our method on several public benchmarks for early intention prediction, improving prediction performance at different anticipation times compared to previous work.
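
To make the idea of a learned attention mask concrete, the following is a minimal NumPy sketch of scaled dot-product attention over time steps with a soft mask. It is not the TAMFormer architecture: the abstract does not specify how the mask is parametrised, so the sigmoid-of-logits form, the function name `masked_attention`, and the `mask_logits` array are all assumptions made for illustration. In a real model the logits would be trained by gradient descent; here they are set by hand.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def masked_attention(q, k, v, mask_logits):
    """Scaled dot-product attention with a soft mask (illustrative only).

    q, k, v:      arrays of shape (T, d), one row per time step.
    mask_logits:  array of shape (T, T); hypothetical trainable parameters.
    """
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)
    # Assumption: the mask is a sigmoid of free logits, so each entry lies
    # in (0, 1). Adding log(mask) to the scores multiplies the unnormalised
    # attention weight of that (query, key) pair by the mask value.
    soft_mask = 1.0 / (1.0 + np.exp(-mask_logits))
    weights = softmax(scores + np.log(soft_mask), axis=-1)
    return weights @ v, weights


rng = np.random.default_rng(0)
T, d = 4, 8  # 4 observed time steps, 8-dimensional features
q, k, v = (rng.standard_normal((T, d)) for _ in range(3))

mask_logits = np.zeros((T, T))
mask_logits[:, 0] = -20.0  # hand-set: almost fully suppress the oldest step

out, weights = masked_attention(q, k, v, mask_logits)
```

With these hand-set logits, every query places almost no weight on the oldest time step, while the remaining weights still sum to one per row. A learned mask would instead let training decide how strongly each past step contributes.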
