Relaxed Transformer Decoders for Direct Action Proposal Generation

02/03/2021
by Jiaqi Tang, et al.

Temporal action proposal generation is an important and challenging task in video understanding that aims to detect all temporal segments containing action instances of interest. Existing proposal generation approaches are generally based on pre-defined anchor windows or heuristic bottom-up boundary matching strategies. This paper presents a simple, end-to-end learnable framework (RTD-Net) for direct action proposal generation by re-purposing a Transformer-like architecture. To tackle the essential visual difference between time and space, we make three important improvements over the original Transformer detection framework (DETR). First, to deal with the slowness prior in videos, we replace the original Transformer encoder with a boundary-attentive module to better capture temporal information. Second, because temporal boundaries are ambiguous and annotations are relatively sparse, we present a relaxed matching loss to relieve the strict criterion of a single assignment to each ground truth. Finally, we devise a three-branch head that further improves proposal confidence estimation by explicitly predicting proposal completeness. Extensive experiments on the THUMOS14 and ActivityNet-1.3 benchmarks demonstrate the effectiveness of RTD-Net on both temporal action proposal generation and temporal action detection. Moreover, owing to its simple design, RTD-Net requires no non-maximum suppression post-processing and is more efficient than previous proposal generation methods. The code will be available at <https://github.com/MCG-NJU/RTD-Action>.
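The idea of relaxing a strict single assignment can be illustrated with a small sketch. The abstract does not say how RTD-Net implements its relaxed matching loss, so everything below is an assumption made for illustration: the function names (`tiou`, `strict_match`, `relaxed_positives`), the brute-force one-to-one matcher, and the 0.5 temporal-IoU threshold are hypothetical and are not taken from the paper or its code release. The sketch first assigns exactly one proposal to each ground-truth segment, then additionally treats any proposal that overlaps a ground truth strongly enough as a positive.

```python
from itertools import permutations


def tiou(a, b):
    """Temporal IoU between two (start, end) segments."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def strict_match(proposals, gts):
    """One-to-one assignment that maximizes total tIoU.

    Brute force over permutations, so only suitable for toy sizes.
    Returns the set of proposal indices assigned to a ground truth.
    """
    best, best_score = (), -1.0
    for perm in permutations(range(len(proposals)), len(gts)):
        score = sum(tiou(proposals[p], gts[g]) for g, p in enumerate(perm))
        if score > best_score:
            best, best_score = perm, score
    return set(best)


def relaxed_positives(proposals, gts, threshold=0.5):
    """Strictly matched proposals plus any proposal whose tIoU with some
    ground truth reaches the threshold (the threshold is an assumed value)."""
    positives = strict_match(proposals, gts)
    for i, p in enumerate(proposals):
        if any(tiou(p, g) >= threshold for g in gts):
            positives.add(i)
    return positives


# Toy example: proposals 0 and 1 both cover the first ground truth well.
proposals = [(1.0, 5.0), (1.5, 5.5), (10.0, 12.0), (20.0, 21.0)]
gts = [(1.0, 5.0), (10.0, 13.0)]

print(sorted(strict_match(proposals, gts)))       # [0, 2]
print(sorted(relaxed_positives(proposals, gts)))  # [0, 1, 2]
```

Under strict matching, proposal 1 would be penalized as a negative even though it overlaps the first ground truth almost as well as proposal 0. The relaxed variant keeps it as a positive, which is the kind of ambiguity the abstract attributes to uncertain temporal boundaries.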


