PointTAD: Multi-Label Temporal Action Detection with Learnable Query Points

10/20/2022
by Jing Tan, et al.

Traditional temporal action detection (TAD) usually handles untrimmed videos containing a small number of action instances from a single label (e.g., ActivityNet, THUMOS). However, this setting may be unrealistic, as actions of different classes often co-occur in practice. In this paper, we focus on the task of multi-label temporal action detection, which aims to localize all action instances in a multi-label untrimmed video. Multi-label TAD is more challenging because it requires fine-grained class discrimination within a single video and precise localization of co-occurring instances. To address these challenges, we extend the sparse query-based detection paradigm from traditional TAD and propose PointTAD, a multi-label TAD framework. Specifically, PointTAD introduces a small set of learnable query points to represent the important frames of each action instance. This point-based representation provides a flexible mechanism to localize the discriminative frames at the boundaries as well as the important frames inside the action. Moreover, we perform action decoding with the Multi-level Interactive Module to capture both point-level and instance-level action semantics. Finally, PointTAD is an end-to-end trainable framework that takes only RGB input, which makes it easy to deploy. We evaluate the proposed method on two popular benchmarks and introduce a new metric, detection-mAP, for multi-label TAD. Our model outperforms all previous methods by a large margin under detection-mAP and also achieves promising results under segmentation-mAP. Code is available at https://github.com/MCG-NJU/PointTAD.

research
11/08/2022

SimOn: A Simple Framework for Online Temporal Action Localization

Online Temporal Action Localization (On-TAL) aims to immediately provide...
research
03/04/2021

Modeling Multi-Label Action Dependencies for Temporal Action Localization

Real-world videos contain many complex actions with inherent relationshi...
research
06/18/2021

End-to-end Temporal Action Detection with Transformer

Temporal action detection (TAD) aims to determine the semantic label and...
research
08/24/2023

HR-Pro: Point-supervised Temporal Action Localization via Hierarchical Reliability Propagation

Point-supervised Temporal Action Localization (PSTAL) is an emerging res...
research
09/05/2023

Multi-label affordance mapping from egocentric vision

Accurate affordance detection and segmentation with pixel precision is a...
research
01/21/2021

Activity Graph Transformer for Temporal Action Localization

We introduce Activity Graph Transformer, an end-to-end learnable model f...
research
08/17/2023

Eosinophils Instance Object Segmentation on Whole Slide Imaging Using Multi-label Circle Representation

Eosinophilic esophagitis (EoE) is a chronic and relapsing disease charac...
