Technical Report: Disentangled Action Parsing Networks for Accurate Part-level Action Parsing

11/05/2021
by   Xuanhan Wang, et al.
0

Part-level Action Parsing aims at part state parsing for boosting action recognition in videos. Despite of dramatic progresses in the area of video classification research, a severe problem faced by the community is that the detailed understanding of human actions is ignored. Our motivation is that parsing human actions needs to build models that focus on the specific problem. We present a simple yet effective approach, named disentangled action parsing (DAP). Specifically, we divided the part-level action parsing into three stages: 1) person detection, where a person detector is adopted to detect all persons from videos as well as performs instance-level action recognition; 2) Part parsing, where a part-parsing model is proposed to recognize human parts from detected person images; and 3) Action parsing, where a multi-modal action parsing network is used to parse action category conditioning on all detection results that are obtained from previous stages. With these three major models applied, our approach of DAP records a global mean of 0.605 score in 2021 Kinetics-TPS Challenge.

READ FULL TEXT
research
10/07/2021

A Baseline Framework for Part-level Action Parsing and Action Recognition

This technical report introduces our 2nd place solution to Kinetics-TPS ...
research
03/09/2022

Part-level Action Parsing via a Pose-guided Coarse-to-Fine Framework

Action recognition from videos, i.e., classifying a video into one of th...
research
02/10/2016

DAP3D-Net: Where, What and How Actions Occur in Videos?

Action parsing in videos with complex scenes is an interesting but chall...
research
02/02/2015

An Expressive Deep Model for Human Action Parsing from A Single Image

This paper aims at one newly raising task in vision and multimedia resea...
research
07/16/2023

Integrating Human Parsing and Pose Network for Human Action Recognition

Human skeletons and RGB sequences are both widely-adopted input modaliti...
research
08/27/2022

RepParser: End-to-End Multiple Human Parsing with Representative Parts

Existing methods of multiple human parsing usually adopt a two-stage str...
research
05/20/2021

Egocentric Activity Recognition and Localization on a 3D Map

Given a video captured from a first person perspective and recorded in a...

Please sign up or login with your details

Forgot password? Click here to reset