Dynamic Graph Modules for Modeling Higher-Order Interactions in Activity Recognition

12/13/2018
by Hao Huang, et al.

Video action recognition, a critical problem in video understanding, has attracted increasing attention. To identify an action involving higher-order object interactions, we need to consider 1) spatial relations among objects within a single frame and 2) temporal relations among the same or different objects across multiple frames. However, previous approaches, e.g., 2D ConvNet + LSTM or 3D ConvNets, are either incapable of capturing relations between objects or unable to handle streaming videos. In this paper, we propose a novel dynamic graph module to model object interactions in videos. We also devise two instantiations of the graph module: (i) a visual graph, which captures changes in visual similarity between objects; and (ii) a location graph, which captures changes in the relative locations of objects. Unlike previous models, the proposed graph module can process streaming videos in an aggressive manner. When combined with existing 3D ConvNets for action recognition, the graph module also boosts their performance, which demonstrates its flexibility. We evaluate the graph module on the Something-Something dataset and achieve state-of-the-art performance.
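The abstract does not say how the visual and location graphs are built or updated, so what follows is only a minimal, hypothetical Python sketch of the general idea, not the authors' module. It assumes that objects are already detected and tracked with a fixed ordering across frames. It uses cosine similarity with a row-wise softmax as a stand-in for the visual graph and box IoU as a stand-in for relative location, and it omits all learned weights. Every function name here (`visual_graph`, `location_graph`, `propagate`, `stream_step`) is illustrative.

```python
import math

# Hypothetical sketch: objects are assumed to be tracked, so row i of every
# per-frame input refers to the same object throughout the stream.


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


def softmax(row):
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def visual_graph(feats):
    """Assumed visual graph: row-wise softmax over pairwise cosine similarity."""
    return [softmax([cosine(fi, fj) for fj in feats]) for fi in feats]


def iou(a, b):
    """Intersection-over-union of boxes given as (x1, y1, x2, y2)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy

    def area(r):
        return max(0.0, r[2] - r[0]) * max(0.0, r[3] - r[1])

    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0


def location_graph(boxes):
    """Assumed location graph: IoU overlap plus a self-loop, row-normalized."""
    adj = []
    for i, bi in enumerate(boxes):
        weights = [1.0 if i == j else iou(bi, bj) for j, bj in enumerate(boxes)]
        total = sum(weights)  # always >= 1 because of the self-loop
        adj.append([w / total for w in weights])
    return adj


def propagate(adj, feats):
    """One message-passing step h_i = ReLU(sum_j A_ij x_j), with no learned weights."""
    dim = len(feats[0])
    return [
        [max(0.0, sum(adj[i][j] * feats[j][k] for j in range(len(feats))))
         for k in range(dim)]
        for i in range(len(adj))
    ]


def stream_step(hidden, feats, boxes):
    """Consume one frame: fuse the previous hidden state with the new features,
    then average the outputs of the visual and location graphs."""
    nodes = [[(h + x) / 2 for h, x in zip(hi, xi)] for hi, xi in zip(hidden, feats)]
    hv = propagate(visual_graph(nodes), nodes)
    hl = propagate(location_graph(boxes), nodes)
    return [[(a + b) / 2 for a, b in zip(ra, rb)] for ra, rb in zip(hv, hl)]


if __name__ == "__main__":
    # Two toy frames with two tracked objects each (made-up values).
    frames = [
        ([[1.0, 0.0], [0.0, 1.0]], [(0, 0, 2, 2), (1, 1, 3, 3)]),
        ([[0.9, 0.1], [0.1, 0.9]], [(0, 0, 2, 2), (3, 3, 5, 5)]),
    ]
    hidden = [[0.0, 0.0], [0.0, 0.0]]
    for feats, boxes in frames:  # frames arrive one at a time
        hidden = stream_step(hidden, feats, boxes)
    print(hidden)
```

Because the loop keeps only a fixed-size hidden state per object, it can consume frames one at a time without buffering the whole clip. A real module would add learned projections and would need to handle objects entering or leaving the scene.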
