GAST-Net: Graph Attention Spatio-temporal Convolutional Networks for 3D Human Pose Estimation in Video

03/11/2020
by Junfa Liu, et al.

3D pose estimation in video can benefit greatly from both temporal and spatial information, yet occlusions and depth ambiguities remain outstanding problems. In this work, we study how to learn the kinematic constraints of the human skeleton by modeling additional spatial information through attention and interleaving it with temporal models. We contribute a graph attention spatio-temporal convolutional network (GAST-Net) that makes full use of spatio-temporal information and mitigates the problems of occlusion and depth ambiguity. We also contribute attention mechanisms that learn inter-joint relations and are easy to visualize. GAST-Net comprises interleaved temporal convolutional and graph attention blocks. We use dilated temporal convolutional networks (TCNs) to model long-term patterns. More critically, the graph attention blocks encode local and global representations through novel convolutional kernels that express the symmetrical structure of the human skeleton and adaptively extract global semantics over time. GAST-Net outperforms the state of the art by approximately 10% in mean per-joint position error (MPJPE) with ground-truth labels on Human3.6M and achieves competitive results on HumanEva-I.
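The headline result above is reported in mean per-joint position error. As a minimal sketch of that metric under its usual definition (the average Euclidean distance between predicted and ground-truth 3D joint positions), the snippet below may help make the number concrete. The function name `mpjpe`, the nested-list layout `[frames][joints][3]`, and the toy coordinates are assumptions made for illustration; they are not taken from the paper, and any alignment step an evaluation protocol might apply beforehand (for example root-joint alignment) is omitted.

```python
import math


def mpjpe(predicted, target):
    """Mean per-joint position error over all frames and joints.

    Both arguments are nested sequences shaped [frames][joints][3],
    with coordinates in the same unit (for example millimetres).
    This layout is an assumption of the sketch, not the paper's API.
    """
    if len(predicted) != len(target):
        raise ValueError("frame counts differ")
    total, count = 0.0, 0
    for pred_frame, gt_frame in zip(predicted, target):
        if len(pred_frame) != len(gt_frame):
            raise ValueError("joint counts differ")
        for pred_joint, gt_joint in zip(pred_frame, gt_frame):
            # Euclidean distance between the two 3D joint positions.
            total += math.dist(pred_joint, gt_joint)
            count += 1
    if count == 0:
        raise ValueError("no joints to evaluate")
    return total / count


# Toy data (hypothetical): one frame, two joints.
gt = [[(0.0, 0.0, 0.0), (0.0, 100.0, 0.0)]]
pred = [[(3.0, 4.0, 0.0), (0.0, 100.0, 12.0)]]
error = mpjpe(pred, gt)  # per-joint distances are 5 and 12, so the mean is 8.5
```

Under this definition, a relative improvement of roughly 10% means the averaged distance shrinks by about a tenth, whatever unit the coordinates use.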
