Compositional Structure Learning for Action Understanding

10/21/2014
by Ran Xu, et al.

The action understanding literature has focused predominantly on classification; however, many applications, such as mobile robotics and video search, demand richer action understanding, with solutions to classification, localization, and detection. In this paper, we propose a compositional model that leverages a new mid-level representation called compositional trajectories and a locally articulated spatiotemporal deformable parts model (LASTDPM) for full action understanding. Our method is advantageous in capturing the variable structure of dynamic human activity over a long range. First, the compositional trajectories capture long-range, frequently co-occurring groups of trajectories in space-time and represent them in discriminative hierarchies, where human motion is largely separated from camera motion; second, LASTDPM learns a structured model with multi-layer deformable parts to capture multiple levels of articulated motion. We implement our methods and demonstrate state-of-the-art performance on all three problems: action detection, localization, and recognition.
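To make the idea of deformable parts concrete, the following is a minimal sketch of how a generic, single-layer deformable parts model scores a hypothesis in a space-time volume: a root score plus, for each part, the best placement of that part after subtracting a quadratic deformation cost. This is not the paper's multi-layer model; the function names `part_score` and `dpm_score`, the (x, y, t) response volumes, and the quadratic cost are all assumptions chosen for illustration.

```python
import numpy as np


def part_score(response, anchor, weights):
    """Best placement score of one part (hypothetical helper).

    response: 3D array of part filter responses over an (x, y, t) grid.
    anchor:   expected (x, y, t) position of the part.
    weights:  non-negative quadratic deformation weights per axis.
    """
    # Integer coordinates of every cell, shape (3, X, Y, T).
    grid = np.indices(response.shape)
    # Displacement of each cell from the anchor position.
    offset = grid - np.asarray(anchor).reshape(3, 1, 1, 1)
    # Weighted squared displacement, summed over the three axes.
    cost = np.tensordot(np.asarray(weights, dtype=float),
                        offset.astype(float) ** 2, axes=1)
    # The part settles where response minus deformation cost is largest.
    return float(np.max(response - cost))


def dpm_score(root_response, parts):
    """Root score plus the best deformed score of each part.

    parts: iterable of (response, anchor, weights) tuples.
    """
    return root_response + sum(part_score(r, a, w) for r, a, w in parts)


# Toy example: one part with a strong response one cell away from its anchor.
response = np.zeros((3, 3, 3))
response[2, 1, 1] = 5.0
loose = dpm_score(1.0, [(response, (1, 1, 1), (1.0, 1.0, 1.0))])
stiff = dpm_score(1.0, [(response, (1, 1, 1), (10.0, 10.0, 10.0))])
```

With loose deformation weights the part moves to the strong response and pays a small cost; with stiff weights it stays at its anchor, so the total score is lower.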


research
12/11/2018

Learning Discriminative Motion Features Through Detection

Despite huge success in the image domain, modern detection models such a...
research
04/16/2021

Spatiotemporal Deformable Models for Long-Term Complex Activity Detection

Long-term complex activity recognition and localisation can be crucial f...
research
07/03/2019

Deformable Tube Network for Action Detection in Videos

We address the problem of spatio-temporal action detection in videos. Ex...
research
04/03/2017

Chained Multi-stream Networks Exploiting Pose, Motion, and Appearance for Action Classification and Detection

General human action recognition requires understanding of various visua...
research
02/01/2015

Learning Latent Spatio-Temporal Compositional Model for Human Action Recognition

Action recognition is an important problem in multimedia understanding. ...
research
04/20/2023

SINC: Spatial Composition of 3D Human Motions for Simultaneous Action Generation

Our goal is to synthesize 3D human motions given textual inputs describi...
