Object Level Visual Reasoning in Videos

06/16/2018
by   Fabien Baradel, et al.
0

Human activity recognition is typically addressed by training models to detect key concepts like global and local motion, features related to object classes present in the scene, as well as features related to the global context. The next open challenges in activity recognition require a level of understanding that pushes beyond this, requiring fine distinctions and a detailed comprehension of the interactions between actors and objects in a scene. We propose a model capable of learning to reason about semantically meaningful spatio-temporal interactions in videos. Key to our approach is the choice of performing this reasoning on an object level through the integration of state of the art object instance segmentation networks. This allows the model to learn detailed spatial interactions that exist at a semantic, object-interaction relevant level. We evaluated our method on three standard datasets: the Twenty-BN Something-Something dataset, the VLOG dataset and the EPIC Kitchens dataset, and achieve state of the art results on both. Finally, we also show visualizations of the interactions learned by the model, which illustrate object classes and their interactions corresponding to different activity classes.

READ FULL TEXT

page 2

page 8

page 16

research
04/19/2023

Automatic Interaction and Activity Recognition from Videos of Human Manual Demonstrations with Application to Anomaly Detection

This paper presents a new method to describe spatio-temporal relations b...
research
12/06/2022

Semantically Enhanced Global Reasoning for Semantic Segmentation

Recent advances in pixel-level tasks (e.g., segmentation) illustrate the...
research
03/08/2018

Analysis of Hand Segmentation in the Wild

A large number of works in egocentric vision have concentrated on action...
research
06/05/2015

Sentence Directed Video Object Codetection

We tackle the problem of video object codetection by leveraging the weak...
research
02/12/2015

Discovering Human Interactions in Videos with Limited Data Labeling

We present a novel approach for discovering human interactions in videos...
research
12/04/2018

Classifying Collisions with Spatio-Temporal Action Graph Networks

Events defined by the interaction of objects in a scene often are of cri...
research
11/22/2017

Temporal Relational Reasoning in Videos

Temporal relational reasoning, the ability to link meaningful transforma...

Please sign up or login with your details

Forgot password? Click here to reset