IntFormer: Predicting pedestrian intention with the aid of the Transformer architecture
Understanding pedestrian crossing behavior is an essential goal in intelligent vehicle development, leading to an improvement in their security and traffic flow. In this paper, we developed a method called IntFormer. It is based on transformer architecture and a novel convolutional video classification model called RubiksNet. Following the evaluation procedure in a recent benchmark, we show that our model reaches state-of-the-art results with good performance (≈ 40 seq. per second) and size (8×smaller than the best performing model), making it suitable for real-time usage. We also explore each of the input features, finding that ego-vehicle speed is the most important variable, possibly due to the similarity in crossing cases in PIE dataset.
READ FULL TEXT