Learning Higher-order Object Interactions for Keypoint-based Video Understanding

05/16/2023
by   Yi Huang, et al.

Action recognition is an important problem that requires identifying actions in video by learning complex interactions across scene actors and objects. However, modern deep-learning-based networks often require significant computation and may capture scene context using various modalities, which further increases compute costs. Efficient methods, such as those used for AR/VR, often use only human-keypoint information but lose scene context, which hurts accuracy. In this paper, we describe an action-localization method, KeyNet, that uses only keypoint data for tracking and action recognition. Specifically, KeyNet introduces the use of object-based keypoint information to capture context in the scene. Our method illustrates how to build a structured intermediate representation that allows modeling higher-order interactions in the scene from object and human keypoints without using any RGB information. We find that KeyNet is able to track and classify human actions at just 5 FPS. More importantly, we demonstrate on the AVA action and Kinetics datasets that object keypoints can be modeled to recover any loss in context that comes from relying on keypoint information alone.
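
The abstract does not spell out what the structured intermediate representation looks like, so the following is only a minimal sketch of one way human and object keypoints could be merged into a single RGB-free token sequence. The `KeypointToken` fields, the `HUMAN`/`OBJECT` type ids, and the `build_tokens` helper are all hypothetical names introduced for illustration; they are not taken from KeyNet.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (x, y) in pixel coordinates

# Hypothetical type ids distinguishing the two keypoint sources.
HUMAN, OBJECT = 0, 1


@dataclass(frozen=True)
class KeypointToken:
    """One keypoint described only by geometry and bookkeeping, no RGB."""
    x: float     # x position normalized by frame width
    y: float     # y position normalized by frame height
    frame: int   # index of the frame the keypoint came from
    kind: int    # HUMAN or OBJECT
    index: int   # position of the keypoint within its source list


def build_tokens(
    human_kps: Sequence[Sequence[Point]],
    object_kps: Sequence[Sequence[Point]],
    width: int,
    height: int,
) -> List[KeypointToken]:
    """Flatten per-frame human and object keypoints into one token list.

    Both inputs hold one list of (x, y) points per frame. This is an
    assumed input layout, not the paper's data format.
    """
    if len(human_kps) != len(object_kps):
        raise ValueError("human_kps and object_kps must cover the same frames")

    tokens: List[KeypointToken] = []
    for frame, (humans, objects) in enumerate(zip(human_kps, object_kps)):
        for kind, points in ((HUMAN, humans), (OBJECT, objects)):
            for index, (px, py) in enumerate(points):
                tokens.append(
                    KeypointToken(px / width, py / height, frame, kind, index)
                )
    return tokens
```

A sequence model could then consume these tokens, with the `kind` field letting it tell actor keypoints from scene-object keypoints. That downstream step is left out here because the abstract gives no detail about it.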
