ReLER@ZJU Submission to the Ego4D Moment Queries Challenge 2022

11/17/2022
by Jiayi Shao, et al.

In this report, we present the ReLER@ZJU submission to the Ego4D Moment Queries Challenge at ECCV 2022. The goal of this task is to retrieve and localize all instances of possible activities in egocentric videos. The Ego4D dataset is challenging for temporal action localization because the videos are long and each video contains multiple action instances from fine-grained action classes. To address these problems, we use a multi-scale transformer to classify action categories and predict the boundaries of each instance. Moreover, to better capture long-term temporal dependencies in long videos, we propose a segment-level recurrence mechanism. Compared with feeding all video features directly into the transformer encoder, the proposed segment-level recurrence mechanism alleviates optimization difficulties and achieves better performance. Our final submission achieved a Recall@1 (tIoU=0.5) of 37.24 and an average mAP of 17.67, taking 3rd place on the leaderboard.
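The abstract does not describe how the segment-level recurrence is implemented. The sketch below is only an illustration under stated assumptions: it borrows the Transformer-XL idea of letting each segment attend to cached states from the previous segment, uses a single attention head without learned projections, and relies on NumPy. The names `segment_recurrent_encode`, `attend`, and `segment_len` are hypothetical and do not come from the authors' code.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attend(query, keys, values):
    # Single-head scaled dot-product attention (hypothetical: no learned projections).
    d = query.shape[-1]
    scores = query @ keys.T / np.sqrt(d)
    return softmax(scores, axis=-1) @ values

def segment_recurrent_encode(features, segment_len):
    """Encode a long (T, D) feature sequence one segment at a time.

    Assumption: each segment attends to itself plus the cached outputs of the
    previous segment, so context flows forward without attending over all T steps.
    """
    outputs = []
    memory = None  # encoded states cached from the previous segment
    for start in range(0, features.shape[0], segment_len):
        segment = features[start:start + segment_len]
        # Keys/values cover [memory; segment]; queries come only from the current segment.
        if memory is None:
            context = segment
        else:
            context = np.concatenate([memory, segment], axis=0)
        hidden = segment + attend(segment, context, context)  # residual connection
        outputs.append(hidden)
        # A trainable model would usually stop gradients through the memory
        # (as in Transformer-XL); NumPy has no autograd, so it is cached as-is.
        memory = hidden
    return np.concatenate(outputs, axis=0)

# Usage: 10 time steps of 4-dim features, split into segments of length 4.
rng = np.random.default_rng(0)
feats = rng.standard_normal((10, 4))
out = segment_recurrent_encode(feats, segment_len=4)
print(out.shape)  # (10, 4)
```

Because each segment attends only to itself and one cached segment, the attention cost grows with the segment length rather than the full video length, while information from earlier segments still reaches later ones through the chained memory.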
