AMPose: Alternatively Mixed Global-Local Attention Model for 3D Human Pose Estimation
The graph convolutional network (GCN) has been applied to 3D human pose estimation (HPE). In addition, pure transformer models have recently shown promising results in video-based methods. However, single-frame methods still need to model the physically connected relations among joints, because a feature representation transformed only by global attention lacks the relationships of the human skeleton. To address this problem, we propose a novel architecture, AMPose, that combines the physically connected and global relations among joints in the human skeleton for human pose estimation. The effectiveness of the proposed method is demonstrated through evaluation on the Human3.6M dataset. Our model also shows better generalization ability in a cross-dataset comparison on MPI-INF-3DHP.
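The idea of mixing global and physically connected relations can be illustrated with a minimal NumPy sketch. It is not the AMPose implementation: the abstract does not specify layer details, so the 5-joint skeleton, the single-head attention, the GCN-style local step, and all names (`EDGES`, `global_attention`, `local_graph_step`, `alternating_blocks`) are assumptions chosen only to show one way the two kinds of relation could alternate.

```python
import numpy as np

# Hypothetical toy skeleton: 5 joints, edges are physical (bone) connections.
EDGES = [(0, 1), (1, 2), (2, 3), (2, 4)]
NUM_JOINTS = 5


def normalized_adjacency(num_joints, edges):
    """Symmetric normalized adjacency with self-loops: D^-1/2 (A + I) D^-1/2."""
    a = np.eye(num_joints)
    for i, j in edges:
        a[i, j] = a[j, i] = 1.0
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))
    return a * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def global_attention(x, wq, wk, wv):
    """Single-head self-attention: every joint attends to every other joint."""
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return x + weights @ v  # residual connection


def local_graph_step(x, a_hat, w):
    """GCN-style step: each joint aggregates only physically connected joints."""
    return x + np.maximum(a_hat @ x @ w, 0.0)  # ReLU plus residual


def alternating_blocks(x, a_hat, params):
    """Alternate a global step and a local step for each parameter set."""
    for wq, wk, wv, wg in params:
        x = global_attention(x, wq, wk, wv)
        x = local_graph_step(x, a_hat, wg)
    return x


rng = np.random.default_rng(0)
dim = 8  # assumed per-joint feature size
a_hat = normalized_adjacency(NUM_JOINTS, EDGES)
params = [
    tuple(rng.normal(scale=0.1, size=(dim, dim)) for _ in range(4))
    for _ in range(2)  # two alternating blocks
]
features = rng.normal(size=(NUM_JOINTS, dim))  # stand-in for embedded 2D joints
out = alternating_blocks(features, a_hat, params)
```

In this sketch the attention weights are dense over all joints, while `a_hat` is zero for joint pairs that share no bone, so the local step can only pass information along the skeleton.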