PERF-Net: Pose Empowered RGB-Flow Net

09/28/2020
by Yinxiao Li, et al.

In recent years, many works in the video action recognition literature have shown that two-stream models (combining spatial and temporal input streams) are necessary for achieving state-of-the-art performance. In this paper, we show the benefits of including yet another stream based on human pose estimated from each frame, specifically by rendering pose on the input RGB frames. At first blush, this additional stream may seem redundant, given that human pose is fully determined by RGB pixel values; however, we show (perhaps surprisingly) that this simple and flexible addition can provide complementary gains. Using this insight, we then propose a new model, which we dub PERF-Net (short for Pose Empowered RGB-Flow Net), that combines this new pose stream with the standard RGB and flow-based input streams via distillation techniques. We show that our model outperforms the state of the art by a large margin on a number of human action recognition datasets while not requiring flow or pose to be explicitly computed at inference time.
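The abstract does not say which distillation objective is used, so the following is only a generic illustration of how several teacher streams can be distilled into one student that needs a single input at inference time. It is a minimal sketch of classic soft-label distillation, not the authors' method. The function names, the averaging of teacher predictions, and the `alpha` and `temperature` parameters are all assumptions introduced here.

```python
import numpy as np


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over the last axis."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def cross_entropy(logits, labels):
    """Mean cross-entropy between logits and integer class labels."""
    probs = softmax(logits)
    picked = probs[np.arange(len(labels)), labels]
    return float(-np.mean(np.log(picked + 1e-12)))


def distillation_loss(student_logits, teacher_logits_list, labels,
                      alpha=0.5, temperature=2.0):
    """Hypothetical multi-teacher distillation loss (an assumption, not PERF-Net's).

    student_logits: (batch, classes) output of the student, e.g. an RGB-only net.
    teacher_logits_list: list of (batch, classes) outputs, e.g. from RGB,
        flow, and pose teacher streams (assumed to be precomputed).
    """
    # Average the softened teacher predictions into one target distribution.
    teacher_probs = np.mean(
        [softmax(t, temperature) for t in teacher_logits_list], axis=0
    )
    student_log_probs = np.log(softmax(student_logits, temperature) + 1e-12)
    # KL(teacher || student), averaged over the batch.
    kl = np.sum(
        teacher_probs * (np.log(teacher_probs + 1e-12) - student_log_probs),
        axis=-1,
    )
    soft_term = float(np.mean(kl))
    hard_term = cross_entropy(student_logits, labels)
    return alpha * hard_term + (1.0 - alpha) * soft_term


# Toy usage: random logits stand in for real network outputs.
rng = np.random.default_rng(0)
labels = np.array([0, 2, 1, 2])
student = rng.normal(size=(4, 3))
teachers = [rng.normal(size=(4, 3)) for _ in range(3)]  # RGB, flow, pose
loss = distillation_loss(student, teachers, labels)
```

In this kind of setup only the student is evaluated at test time, which is one way a model can avoid computing flow or pose at inference; whether PERF-Net matches logits, intermediate features, or something else is not stated in the abstract.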


research
12/19/2018

D3D: Distilled 3D Networks for Video Action Recognition

State-of-the-art methods for video action recognition commonly use an en...
research
10/16/2020

Pose And Joint-Aware Action Recognition

Most human action recognition systems typically consider static appearan...
research
05/22/2018

Pose-Based Two-Stream Relational Networks for Action Recognition in Videos

Recently, pose-based action recognition has gained more and more attenti...
research
07/13/2020

IntegralAction: Pose-driven Feature Integration for Robust Human Action Recognition in Videos

Most current action recognition methods heavily rely on appearance infor...
research
10/23/2019

Streaming Networks: Enable A Robust Classification of Noise-Corrupted Images

The convolution neural nets (conv nets) have achieved a state-of-the-art...
research
12/25/2018

Coupled Recurrent Network (CRN)

Many semantic video analysis tasks can benefit from multiple, heterogeno...
research
10/17/2021

TEAM-Net: Multi-modal Learning for Video Action Recognition with Partial Decoding

Most of existing video action recognition models ingest raw RGB frames. ...
