DAP3D-Net: Where, What and How Actions Occur in Videos?

02/10/2016
by   Li Liu, et al.

Action parsing in videos with complex scenes is an interesting but challenging task in computer vision. In this paper, we propose a generic 3D convolutional neural network, trained in a multi-task learning manner, for effective Deep Action Parsing (DAP3D-Net) in videos. In particular, during the training phase, action localization, classification, and attribute learning are jointly optimized on our appearance-motion data via DAP3D-Net. For an unseen test video, we can then describe each individual action simultaneously in terms of: where the action occurs, what the action is, and how the action is performed. To demonstrate the effectiveness of the proposed DAP3D-Net, we also contribute a new Numerous-category Aligned Synthetic Action dataset, i.e., NASA, which consists of 200,000 action clips spanning more than 300 categories, annotated with 33 pre-defined action attributes at two hierarchical levels (i.e., low-level attributes of basic body-part movements and high-level attributes related to action motion). We train DAP3D-Net on the NASA dataset and then evaluate it on our collected Human Action Understanding (HAU) dataset. Experimental results show that our approach can accurately localize, categorize, and describe multiple actions in realistic videos.
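The joint optimization of localization, classification, and attribute learning described above can be sketched as a weighted sum of three per-task losses. The concrete loss choices below (smooth-L1 for localization, softmax cross-entropy for the category, sigmoid binary cross-entropy for the 33 attributes) and the equal weighting are illustrative assumptions, not details taken from the paper:

```python
import numpy as np

def smooth_l1(pred, target):
    # Smooth-L1 (Huber) loss, a common choice for box regression.
    d = np.abs(pred - target)
    return np.where(d < 1.0, 0.5 * d ** 2, d - 0.5).sum()

def cross_entropy(logits, label):
    # Softmax cross-entropy for the single action category.
    z = logits - logits.max()
    log_probs = z - np.log(np.exp(z).sum())
    return -log_probs[label]

def binary_cross_entropy(logits, targets):
    # Independent sigmoid BCE, one term per attribute (33 here).
    p = 1.0 / (1.0 + np.exp(-logits))
    return -(targets * np.log(p) + (1 - targets) * np.log(1 - p)).sum()

def dap3d_multitask_loss(loc_pred, loc_gt, cls_logits, cls_label,
                         attr_logits, attr_gt, w=(1.0, 1.0, 1.0)):
    # Hypothetical joint objective: weighted sum of the three task losses,
    # so one backward pass updates the shared 3D-CNN trunk for all tasks.
    return (w[0] * smooth_l1(loc_pred, loc_gt)
            + w[1] * cross_entropy(cls_logits, cls_label)
            + w[2] * binary_cross_entropy(attr_logits, attr_gt))
```

With a shared backbone, minimizing this single scalar trains all three heads at once; the weights `w` would be tuned to balance the tasks.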


