Efficient Human Pose Estimation by Learning Deeply Aggregated Representations

12/13/2020
by Zhengxiong Luo, et al.

In this paper, we propose DANet, an efficient human pose estimation network that learns deeply aggregated representations. Most existing models explore multi-scale information mainly from features with different spatial sizes. Powerful multi-scale representations usually rely on a cascaded pyramid framework, which substantially boosts performance but also makes networks very deep and complex. Instead, we focus on exploiting multi-scale information from layers with different receptive-field sizes and on making full use of this information through an improved fusion method. Specifically, we propose an orthogonal attention block (OAB) and a second-order fusion unit (SFU). The OAB learns multi-scale information from different layers and enhances it by encouraging the layers' features to be diverse. The SFU adaptively selects and fuses this diverse multi-scale information while suppressing redundant information, which helps maximize the effective information in the final fused representations. With the OAB and SFU, our single-pyramid network may be able to generate deeply aggregated representations that contain richer multi-scale information and have a larger representational capacity than those of cascaded networks. As a result, our networks can achieve comparable or even better accuracy with much lower model complexity. For example, our network achieves 70.5 AP on the COCO test-dev set with only 1.0G FLOPs, and on a CPU platform it runs at 58 persons per second (PPS).
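The abstract describes the OAB and SFU only at a high level, so the following is a hypothetical, pure-Python sketch of the underlying ideas and not the authors' implementation. `receptive_field` uses the standard recurrence for stacked convolutions to show why features taken from different depths cover different spatial scales. `diversity_penalty` assumes that "encouraging diversity" can be approximated by an orthogonality penalty on per-branch descriptor vectors. `second_order_fusion` assumes that fusion weights come from second-order (Gram-matrix) similarity between branches, so that a branch overlapping heavily with the others is down-weighted. All function names, the penalty, and the weighting rule are illustrative assumptions.

```python
import math
from typing import List, Tuple

Vector = List[float]


def receptive_field(layers: List[Tuple[int, int]]) -> int:
    """Receptive field (in input pixels) of a stack of conv layers.

    `layers` lists (kernel_size, stride) pairs from input to output. Deeper
    layers see larger regions, so features tapped at different depths carry
    different spatial scales.
    """
    rf, jump = 1, 1  # jump = input-pixel distance between adjacent outputs
    for kernel, stride in layers:
        rf += (kernel - 1) * jump
        jump *= stride
    return rf


def normalize(v: Vector) -> Vector:
    """Scale a vector to unit L2 norm (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def gram_matrix(branches: List[Vector]) -> List[List[float]]:
    """Cosine similarity between every pair of branch descriptors.

    These pairwise products are second-order statistics of the branches.
    """
    unit = [normalize(b) for b in branches]
    return [[sum(a * b for a, b in zip(u, v)) for v in unit] for u in unit]


def diversity_penalty(branches: List[Vector]) -> float:
    """Hypothetical OAB-style regularizer: sum of squared off-diagonal cosine
    similarities. It is 0 when all branch descriptors are mutually orthogonal
    and grows as branches become redundant."""
    g = gram_matrix(branches)
    n = len(g)
    return sum(g[i][j] ** 2 for i in range(n) for j in range(n) if i != j)


def second_order_fusion(branches: List[Vector]) -> Vector:
    """Hypothetical SFU-style fusion: a weighted sum of branches in which a
    branch that overlaps strongly with the others is down-weighted."""
    g = gram_matrix(branches)
    n = len(g)
    # Redundancy of branch i = total absolute similarity to all other branches.
    redundancy = [sum(abs(g[i][j]) for j in range(n) if j != i) for i in range(n)]
    # Softmax over negative redundancy: less redundant -> larger weight.
    exps = [math.exp(-r) for r in redundancy]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(branches[0])
    return [sum(w * b[k] for w, b in zip(weights, branches)) for k in range(dim)]


if __name__ == "__main__":
    print(receptive_field([(3, 1)] * 3))  # three 3x3 stride-1 convs -> 7
    branches = [[1.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
    print(diversity_penalty(branches))  # two identical branches -> 2.0
    print(second_order_fusion(branches))
```

With two identical branches and one orthogonal branch, the penalty is 2.0 and the orthogonal branch receives the largest single fusion weight (about 0.58 versus about 0.21 for each duplicate). A real network would learn such weights from feature maps rather than use a fixed rule like this one.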

research
07/29/2021

Efficient Human Pose Estimation by Maximizing Fusion and High-Level Spatial Attention

In this paper, we propose an efficient human pose estimation network – S...
research
03/18/2021

OmniPose: A Multi-Scale Framework for Multi-Person Pose Estimation

We propose OmniPose, a single-pass, end-to-end trainable framework, that...
research
06/01/2021

Full-Resolution Encoder-Decoder Networks with Multi-Scale Feature Fusion for Human Pose Estimation

To achieve more accurate 2D human pose estimation, we extend the success...
research
07/22/2021

Adaptive Dilated Convolution For Human Pose Estimation

Most existing human pose estimation (HPE) methods exploit multi-scale in...
research
12/20/2021

BAPose: Bottom-Up Pose Estimation with Disentangled Waterfall Representations

We propose BAPose, a novel bottom-up approach that achieves state-of-the...
research
11/30/2020

ScaleNAS: One-Shot Learning of Scale-Aware Representations for Visual Recognition

Scale variance among different sizes of body parts and objects is a chal...
research
12/01/2020

Dynamic Feature Pyramid Networks for Object Detection

This paper studies feature pyramid network (FPN), which is a widely used...
