End-to-End Spatio-Temporal Action Localisation with Video Transformers

04/24/2023
by Alexey Gritsenko, et al.

The most performant spatio-temporal action localisation models use external person proposals and complex external memory banks. We propose a fully end-to-end, purely transformer-based model that directly ingests an input video and outputs tubelets: sequences of bounding boxes together with the action classes at each frame. Our flexible model can be trained with either sparse bounding-box supervision on individual frames or full tubelet annotations, and in both cases it predicts coherent tubelets as output. Moreover, our end-to-end model requires no additional pre-processing in the form of proposals, and no post-processing in the form of non-maximal suppression. We perform extensive ablation experiments and significantly advance the state of the art on four different spatio-temporal action localisation benchmarks, covering both sparse keyframe and full tubelet annotations.
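
To make the output format concrete, the following is a minimal Python sketch of how a tubelet (one bounding box plus per-class action scores for each frame) might be represented, and how sparse keyframe annotation differs from full tubelet annotation. It is an illustration only, not the authors' implementation: the names `Tubelet`, `Box` and `supervised_frames`, the corner-based box convention, and the use of `None` for unannotated frames are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical box convention (not from the paper): (x_min, y_min, x_max, y_max).
Box = Tuple[float, float, float, float]


@dataclass
class Tubelet:
    """One predicted tubelet: a box and action-class scores for every frame."""

    boxes: List[Box]                 # one bounding box per frame
    class_scores: List[List[float]]  # per-frame scores over the action classes

    def __post_init__(self) -> None:
        if len(self.boxes) != len(self.class_scores):
            raise ValueError("boxes and class_scores must cover the same frames")

    def labels(self) -> List[int]:
        """Index of the highest-scoring action class at each frame."""
        return [max(range(len(s)), key=s.__getitem__) for s in self.class_scores]


def supervised_frames(gt_boxes: List[Optional[Box]]) -> List[int]:
    """Indices of frames that carry a ground-truth box.

    With sparse keyframe annotation most entries are None; with full
    tubelet annotation every frame has a box.
    """
    return [i for i, box in enumerate(gt_boxes) if box is not None]


# A three-frame tubelet with two action classes.
tubelet = Tubelet(
    boxes=[(0.1, 0.1, 0.5, 0.9), (0.1, 0.1, 0.5, 0.9), (0.2, 0.1, 0.6, 0.9)],
    class_scores=[[0.2, 0.8], [0.3, 0.7], [0.9, 0.1]],
)
sparse_gt = [None, (0.1, 0.1, 0.5, 0.9), None]  # only the keyframe is annotated
full_gt = list(tubelet.boxes)                   # every frame is annotated
```

Under this representation, a training loss would be evaluated only on the frames returned by `supervised_frames`, while the model still emits a box and scores for every frame of the clip.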


