Temporal Relational Modeling with Self-Supervision for Action Segmentation

12/14/2020
by   Dong Wang, et al.

Temporal relational modeling in video is essential for human action understanding tasks such as action recognition and action segmentation. Although Graph Convolution Networks (GCNs) have shown promising advantages in relation reasoning on many tasks, applying them effectively to long video sequences remains a challenge. The main reason is that the large number of nodes (i.e., video frames) makes it hard for GCNs to capture and model temporal relations in videos. To tackle this problem, we introduce an effective GCN module, the Dilated Temporal Graph Reasoning Module (DTGRM), designed to model temporal relations and dependencies between video frames at various time spans. In particular, we capture and model temporal relations by constructing multi-level dilated temporal graphs in which the nodes represent frames from different moments in the video. Moreover, to enhance the temporal reasoning ability of the proposed model, we propose an auxiliary self-supervised task that encourages the dilated temporal graph reasoning module to find and correct wrong temporal relations in videos. Our DTGRM model outperforms state-of-the-art action segmentation models on three challenging datasets: 50Salads, Georgia Tech Egocentric Activities (GTEA), and the Breakfast dataset. The code is available at https://github.com/redwang/DTGRM.
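
To make the idea of multi-level dilated temporal graphs more concrete, the following is a minimal sketch and not the authors' implementation (see the linked repository for that). It assumes, purely for illustration, that each frame is connected to itself and to the frames exactly `dilation` steps before and after it, that propagation is a parameter-free neighbor average, and that the self-supervised task perturbs a sequence by swapping two frames and labeling the swapped positions. The function names `dilated_temporal_adjacency`, `propagate`, and `swap_frames` are hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def dilated_temporal_adjacency(num_frames: int, dilation: int) -> Matrix:
    """Row-normalized adjacency linking frame t to t - dilation, t, t + dilation.

    Assumption: this neighbor pattern is illustrative, not taken from the paper.
    """
    adj = [[0.0] * num_frames for _ in range(num_frames)]
    for t in range(num_frames):
        # Keep only neighbors that fall inside the video.
        neighbors = [n for n in (t - dilation, t, t + dilation) if 0 <= n < num_frames]
        for n in neighbors:
            adj[t][n] = 1.0 / len(neighbors)
    return adj


def propagate(adj: Matrix, features: Matrix) -> Matrix:
    """One propagation step: each frame averages the features of its neighbors.

    A real GCN layer would also apply learned weights and a nonlinearity.
    """
    num_frames, dim = len(features), len(features[0])
    return [
        [sum(adj[t][n] * features[n][k] for n in range(num_frames)) for k in range(dim)]
        for t in range(num_frames)
    ]


def swap_frames(features: Matrix, i: int, j: int) -> Tuple[Matrix, List[int]]:
    """Exchange frames i and j; return the perturbed sequence and 0/1 labels
    marking the swapped positions (a hypothetical self-supervised target)."""
    swapped = list(features)
    swapped[i], swapped[j] = swapped[j], swapped[i]
    labels = [1 if t in (i, j) else 0 for t in range(len(features))]
    return swapped, labels


# Stack graphs with growing dilation so later levels relate frames further apart.
frames = [[float(t)] for t in range(8)]  # toy 1-D feature per frame
perturbed, swap_labels = swap_frames(frames, 2, 5)
hidden = perturbed
for dilation in (1, 2, 4):
    hidden = propagate(dilated_temporal_adjacency(len(hidden), dilation), hidden)
```

Each graph has at most three nonzero entries per row however long the video is, which is one plausible way dilation keeps reasoning tractable on long sequences while the stacked levels cover progressively longer time spans.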


research
11/22/2017

Temporal Relational Reasoning in Videos

Temporal relational reasoning, the ability to link meaningful transforma...
research
10/19/2022

Temporal Action Segmentation: An Analysis of Modern Techniques

Temporal action segmentation from videos aims at the dense labeling of v...
research
04/08/2019

Relational Action Forecasting

This paper focuses on multi-person action forecasting in videos. More pr...
research
11/22/2022

A Graph-Based Method for Soccer Action Spotting Using Unsupervised Player Classification

Action spotting in soccer videos is the task of identifying the specific...
research
01/21/2021

Activity Graph Transformer for Temporal Action Localization

We introduce Activity Graph Transformer, an end-to-end learnable model f...
research
12/13/2018

Dynamic Graph Modules for Modeling Higher-Order Interactions in Activity Recognition

Video action recognition, as a critical problem towards video understand...
research
04/04/2023

DIR-AS: Decoupling Individual Identification and Temporal Reasoning for Action Segmentation

Fully supervised action segmentation works on frame-wise action recognit...
