DIR-AS: Decoupling Individual Identification and Temporal Reasoning for Action Segmentation

04/04/2023
by   Peiyao Wang, et al.

Fully supervised action segmentation performs frame-wise action recognition with dense annotations and often suffers from over-segmentation. Existing works have proposed a variety of solutions, such as boundary-aware networks, multi-stage refinement, and temporal smoothness losses. However, most of them rely on frame-wise supervision alone, which does not directly address evaluation metrics defined at different granularities. In this paper, to obtain the large receptive field the task requires, we first develop a novel local-global attention mechanism with temporal pyramid dilation and temporal pyramid pooling for efficient multi-scale attention. We then decouple two inherent goals of action segmentation, i.e., (1) individual identification, solved by frame-wise supervision, and (2) temporal reasoning, tackled by action set prediction. Afterward, an action alignment module fuses these predictions of different granularity, leading to more accurate and smoother action segmentation. We achieve state-of-the-art accuracy, e.g., 82.8 on Breakfast, which demonstrates the effectiveness of the proposed method, and we support it with extensive ablation studies. The code will be made available later.
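To make the over-segmentation issue concrete, the following minimal Python sketch contrasts a frame-level metric with a segment-level one on a toy prediction. It is an illustration only, not the authors' code: the helper names (`to_segments`, `frame_accuracy`, `edit_score`) and the toy labels are hypothetical, and the segment-level score used here (one minus the normalized Levenshtein distance between segment sequences) is an assumed stand-in, since the abstract does not name the metrics it refers to.

```python
def to_segments(frames):
    """Collapse a frame-wise label sequence into its ordered segment labels."""
    segments = []
    for label in frames:
        if not segments or segments[-1] != label:
            segments.append(label)
    return segments


def frame_accuracy(pred, truth):
    """Fraction of frames whose predicted label matches the ground truth."""
    assert len(pred) == len(truth)
    return sum(p == t for p, t in zip(pred, truth)) / len(truth)


def edit_score(pred, truth):
    """Assumed segment-level score: 1 - normalized Levenshtein distance
    between the predicted and ground-truth segment sequences."""
    a, b = to_segments(pred), to_segments(truth)
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return 1 - prev[-1] / max(len(a), len(b))


# Toy example (hypothetical labels): one wrong frame splits "pour" in two.
truth = ["pour"] * 5 + ["stir"] * 5
pred = ["pour"] * 2 + ["stir"] + ["pour"] * 2 + ["stir"] * 5

print(to_segments(pred))            # ['pour', 'stir', 'pour', 'stir']
print(frame_accuracy(pred, truth))  # 0.9
print(edit_score(pred, truth))      # 0.5
```

A single mislabeled frame leaves frame accuracy at 0.9 but halves the segment-level score, because it creates two spurious segments. This gap between granularities is the kind of mismatch that supervision on frames alone tends to leave unaddressed.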


research
03/11/2021

Temporal Action Segmentation from Timestamp Supervision

Temporal action segmentation approaches have been very successful recent...
research
09/12/2023

OTAS: Unsupervised Boundary Detection for Object-Centric Temporal Action Segmentation

Temporal action segmentation is typically achieved by discovering the dr...
research
12/14/2020

Temporal Relational Modeling with Self-Supervision for Action Segmentation

Temporal relational modeling in video is essential for human action unde...
research
12/15/2020

Point-Level Temporal Action Localization: Bridging Fully-supervised Proposals to Weakly-supervised Losses

Point-Level temporal action localization (PTAL) aims to localize actions...
research
04/07/2020

Temporal Pyramid Network for Action Recognition

Visual tempo characterizes the dynamics and the temporal scale of an act...
research
03/31/2023

Diffusion Action Segmentation

Temporal action segmentation is crucial for understanding long-form vide...
research
01/04/2021

Global2Local: Efficient Structure Search for Video Action Segmentation

Temporal receptive fields of models play an important role in action seg...
