
Is attention to bounding boxes all you need for pedestrian action prediction?

by Lina Achaji, et al.

Human drivers are no longer the only ones who must cope with the complexity of driving scenarios; autonomous vehicles (AVs) increasingly face it too. The deployment of AVs in urban areas raises essential safety concerns for vulnerable road users (VRUs) such as pedestrians. To make roads safer, it is therefore critical to classify and predict their future behavior. In this paper, we present a framework based on multiple variations of the Transformer model to reason attentively about the dynamic evolution of a pedestrian's past trajectory and predict the pedestrian's future action of crossing or not crossing the street. We show that, using only bounding boxes as input, our model can outperform the previous state-of-the-art models and reach a prediction accuracy of 91% up to two seconds ahead in the future. In addition, we introduce a large simulated dataset (CP2A), generated with CARLA, for action prediction. Our model similarly reaches high accuracy (91%) on this dataset. Interestingly, we show that pre-training our Transformer model on the simulated dataset and then fine-tuning it on the real dataset can be very effective for the action prediction task.
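To make the idea of attending over a bounding-box history concrete, here is a minimal, dependency-free sketch. It is not the authors' architecture: it assumes a single self-attention layer, sinusoidal positional encodings, mean pooling, and a sigmoid output, and it uses untrained random weights, so the probability it returns carries no meaning. All names (`init_params`, `predict_crossing`, `d_model`) are hypothetical and chosen for illustration only.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def rand_matrix(rows, cols, rng):
    """Small random matrix; stands in for learned weights (assumption)."""
    scale = 1.0 / math.sqrt(cols)
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def positional_encoding(t, d_model):
    """Standard sinusoidal encoding for time step t."""
    return [
        math.sin(t / 10000 ** ((c - c % 2) / d_model)) if c % 2 == 0
        else math.cos(t / 10000 ** ((c - c % 2) / d_model))
        for c in range(d_model)
    ]


def init_params(d_model, seed=0):
    """Hypothetical parameter set: input embedding, Q/K/V maps, output head."""
    rng = random.Random(seed)
    return {
        "W_in": rand_matrix(d_model, 4, rng),
        "W_q": rand_matrix(d_model, d_model, rng),
        "W_k": rand_matrix(d_model, d_model, rng),
        "W_v": rand_matrix(d_model, d_model, rng),
        "w_out": [rng.uniform(-1.0, 1.0) for _ in range(d_model)],
        "b_out": 0.0,
    }


def predict_crossing(boxes, params):
    """Return a crossing probability from a sequence of (x1, y1, x2, y2) boxes."""
    d_model = len(params["W_in"])
    # Embed each box and add its positional encoding so frame order matters.
    h = []
    for t, box in enumerate(boxes):
        emb = matvec(params["W_in"], list(box))
        pe = positional_encoding(t, d_model)
        h.append([e + p for e, p in zip(emb, pe)])
    q = [matvec(params["W_q"], v) for v in h]
    k = [matvec(params["W_k"], v) for v in h]
    v = [matvec(params["W_v"], x) for x in h]
    # Scaled dot-product self-attention: every frame attends to all frames.
    attended = []
    for qi in q:
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d_model) for kj in k]
        weights = softmax(scores)
        attended.append(
            [sum(weights[j] * v[j][c] for j in range(len(v))) for c in range(d_model)]
        )
    # Mean-pool over time, then apply a linear head and a sigmoid.
    pooled = [sum(row[c] for row in attended) / len(attended) for c in range(d_model)]
    logit = sum(w * p for w, p in zip(params["w_out"], pooled)) + params["b_out"]
    return 1.0 / (1.0 + math.exp(-logit))


# Usage: 16 frames of a box drifting to the right, coordinates normalized to [0, 1].
boxes = [(0.1 + 0.01 * t, 0.4, 0.2 + 0.01 * t, 0.8) for t in range(16)]
probability = predict_crossing(boxes, init_params(d_model=8, seed=0))
```

In a real system the weights would be learned from labeled crossing and non-crossing sequences, and the single attention layer would typically be replaced by a stacked, multi-head encoder from a deep learning framework.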


Looking Ahead: Anticipating Pedestrians Crossing with Future Frames Prediction

In this paper, we present an end-to-end future-prediction model that foc...

Pedestrian 3D Bounding Box Prediction

Safety is still the main issue of autonomous driving, and in order to be...

Pedestrian Action Anticipation using Contextual Feature Fusion in Stacked RNNs

One of the major challenges for autonomous vehicles in urban environment...

Predicting Action Tubes

In this work, we present a method to predict an entire `action tube' (a ...

VRUNet: Multi-Task Learning Model for Intent Prediction of Vulnerable Road Users

Advanced perception and path planning are at the core for any self-drivi...

Analysis over vision-based models for pedestrian action anticipation

Anticipating human actions in front of autonomous vehicles is a challeng...

TAMFormer: Multi-Modal Transformer with Learned Attention Mask for Early Intent Prediction

Human intention prediction is a growing area of research where an activi...