Mutual Information-Based Temporal Difference Learning for Human Pose Estimation in Video

03/15/2023
by Runyang Feng, et al.

Temporal modeling is crucial for multi-frame human pose estimation. Most existing methods directly employ optical flow or deformable convolution to predict full-spectrum motion fields, which may introduce numerous irrelevant cues, such as a nearby person or the background. Without further effort to extract meaningful motion priors, their results are suboptimal, especially under complicated spatiotemporal interactions. The temporal difference, by contrast, can encode representative motion information that is potentially valuable for pose estimation, but it has not been fully exploited. In this paper, we present a novel multi-frame human pose estimation framework that employs temporal differences across frames to model dynamic contexts and uses a mutual information objective to facilitate the disentanglement of useful motion information. Specifically, we design a multi-stage Temporal Difference Encoder that performs incremental cascaded learning conditioned on multi-stage feature difference sequences to derive an informative motion representation. We further propose a Representation Disentanglement module from the mutual information perspective, which captures discriminative task-relevant motion signals by explicitly defining the useful and noisy constituents of the raw motion features and minimizing their mutual information. With these designs, our method ranks No.1 in the Crowd Pose Estimation in Complex Events Challenge on the HiEve benchmark dataset and achieves state-of-the-art performance on three benchmarks: PoseTrack2017, PoseTrack2018, and PoseTrack21.
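
The two quantities the abstract relies on, temporal differences between frame features and the mutual information between two signals, can be illustrated with a toy sketch. This is not the authors' implementation: the function names `temporal_differences` and `mutual_information` are hypothetical, frames are assumed to be plain feature vectors, differences are assumed to be taken between consecutive frames, and the mutual information is a simple plug-in estimate over discrete values rather than whatever estimator the paper uses on learned features.

```python
import math
from collections import Counter


def temporal_differences(frames):
    """Element-wise difference between consecutive frame feature vectors.

    Assumption: each frame is a flat list of numbers of equal length.
    """
    return [
        [b - a for a, b in zip(prev, nxt)]
        for prev, nxt in zip(frames, frames[1:])
    ]


def mutual_information(xs, ys):
    """Plug-in estimate of I(X; Y) in nats for two discrete sequences.

    Uses empirical frequencies: sum over (x, y) of
    p(x, y) * log(p(x, y) / (p(x) * p(y))).
    """
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    return sum(
        (c / n) * math.log(c * n / (px[x] * py[y]))
        for (x, y), c in joint.items()
    )


# Three toy "frames" with two features each.
diffs = temporal_differences([[1, 2], [3, 5], [4, 4]])

# Independent signals share no information; identical signals share log(2) nats here.
mi_independent = mutual_information([0, 0, 1, 1], [0, 1, 0, 1])
mi_identical = mutual_information([0, 1, 0, 1], [0, 1, 0, 1])
```

In this toy setting, driving the mutual information between a "useful" and a "noisy" signal toward zero corresponds to making them statistically independent, which is the intuition behind the disentanglement objective described above.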

research
03/29/2022

Temporal Feature Alignment and Mutual Information Maximization for Video-Based Human Pose Estimation

Multi-frame human pose estimation has long been a compelling and fundame...
research
03/12/2021

Deep Dual Consecutive Network for Human Pose Estimation

Multi-frame human pose estimation in complicated situations is challengi...
research
11/17/2018

Explicit Pose Deformation Learning for Tracking Human Poses

We present a method for human pose tracking that learns explicitly about...
research
07/09/2022

Snipper: A Spatiotemporal Transformer for Simultaneous Multi-Person 3D Pose Estimation Tracking and Forecasting on a Video Snippet

Multi-person pose understanding from RGB videos includes three complex t...
research
09/18/2023

Sparse and Privacy-enhanced Representation for Human Pose Estimation

We propose a sparse and privacy-enhanced representation for Human Pose E...
research
08/18/2023

ResQ: Residual Quantization for Video Perception

This paper accelerates video perception, such as semantic segmentation a...
research
07/22/2022

Learning Human Kinematics by Modeling Temporal Correlations between Joints for Video-based Human Pose Estimation

Estimating human poses from videos is critical in human-computer interac...
