Activity Graph Transformer for Temporal Action Localization

01/21/2021
by Megha Nawhal, et al.

We introduce Activity Graph Transformer, an end-to-end learnable model for temporal action localization, which receives a video as input and directly predicts the set of action instances that appear in it. Detecting and localizing action instances in untrimmed videos requires reasoning over multiple action instances within a video. The dominant paradigms in the literature process videos sequentially in time, either to propose action regions or to directly produce frame-level detections. However, sequential processing is problematic when action instances have non-sequential dependencies and/or non-linear temporal ordering, such as overlapping action instances or the recurrence of an action over the course of the video. In this work, we capture this non-linear temporal structure by reasoning over videos as non-sequential entities in the form of graphs. We evaluate our model on three challenging datasets: THUMOS14, Charades, and EPIC-Kitchens-100. Our results show that the proposed model outperforms the state of the art by a considerable margin.
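Predicting a set of action instances, rather than a frame-by-frame sequence, lets the output hold overlapping and recurring actions. Training or evaluating such an output, however, requires pairing each ground-truth instance with exactly one prediction. The following sketch illustrates that idea using temporal IoU and a brute-force minimum-cost one-to-one matching. The `Instance` tuple layout, the cost function, and all function names are hypothetical assumptions chosen for illustration. They are not the authors' implementation.

```python
from itertools import permutations
from typing import List, Tuple

# Hypothetical action instance layout: (start_sec, end_sec, label).
Instance = Tuple[float, float, str]


def temporal_iou(a: Instance, b: Instance) -> float:
    """Intersection-over-union of two time intervals (labels are ignored)."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def match_cost(pred: Instance, gt: Instance) -> float:
    """Assumed cost: (1 - tIoU) plus a penalty of 1 when the labels differ."""
    label_penalty = 0.0 if pred[2] == gt[2] else 1.0
    return (1.0 - temporal_iou(pred, gt)) + label_penalty


def best_matching(preds: List[Instance], gts: List[Instance]) -> List[Tuple[int, int]]:
    """Minimum-cost one-to-one matching, returned as (pred_idx, gt_idx) pairs.

    Assumes len(preds) >= len(gts). Exhaustive search is only viable for tiny sets.
    """
    best_cost, best_pairs = float("inf"), []
    # Each permutation assigns a distinct prediction index to every ground truth.
    for perm in permutations(range(len(preds)), len(gts)):
        cost = sum(match_cost(preds[p], gts[g]) for g, p in enumerate(perm))
        if cost < best_cost:
            best_cost = cost
            best_pairs = [(p, g) for g, p in enumerate(perm)]
    return best_pairs


# Toy example: "cut" overlaps "stir" and then recurs later in the video.
gts = [(0.0, 10.0, "cut"), (5.0, 12.0, "stir"), (20.0, 25.0, "cut")]
preds = [(19.0, 25.0, "cut"), (0.5, 10.0, "cut"), (5.0, 11.0, "stir"), (30.0, 31.0, "wash")]
print(best_matching(preds, gts))  # [(1, 0), (2, 1), (0, 2)]
```

In this example, prediction 3 (`wash`) is left unmatched. In DETR-style set-prediction training, such leftover predictions are typically supervised as "no action". Exhaustive search grows factorially with set size, so a practical version would use the Hungarian algorithm instead, for example `scipy.optimize.linear_sum_assignment` applied to a cost matrix.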
