Snipper: A Spatiotemporal Transformer for Simultaneous Multi-Person 3D Pose Estimation Tracking and Forecasting on a Video Snippet

07/09/2022
by   Shihao Zou, et al.

Multi-person pose understanding from RGB videos involves three complex tasks: pose estimation, tracking, and motion forecasting. Among these three tasks, pose estimation and tracking are correlated, and tracking is crucial to motion forecasting. Most existing works either focus on a single task or employ cascaded methods that solve each task separately. In this paper, we propose Snipper, a framework that performs multi-person 3D pose estimation, tracking, and motion forecasting simultaneously in a single inference. Specifically, we first propose a deformable attention mechanism to aggregate spatiotemporal information from video snippets. Building upon this deformable attention, a visual transformer is learned to encode the spatiotemporal features from multi-frame images and to decode informative pose features that update multi-person pose queries. Finally, these queries are regressed to predict multi-person pose trajectories and future motions in one forward pass. In the experiments, we show the effectiveness of Snipper on three challenging public datasets, where a single generic model rivals specialized state-of-the-art baselines for pose estimation, tracking, and forecasting. Code is available at https://github.com/JimmyZou/Snipper
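
To make the idea of aggregating spatiotemporal information with deformable attention more concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes a generic deformable-attention formulation in which a single query predicts a few sampling offsets and attention weights per frame, samples the per-frame feature maps bilinearly around a reference point, and takes the weighted sum. All names (`bilinear_sample`, `deformable_attention`, `W_off`, `W_attn`, `K`) and all shapes are hypothetical choices for illustration; the actual design is in the repository linked above.

```python
import numpy as np

def bilinear_sample(feat, x, y):
    """Sample feat (H, W, C) at continuous pixel coords (x, y), clamped to the border."""
    H, W, _ = feat.shape
    x = float(np.clip(x, 0.0, W - 1.0))
    y = float(np.clip(y, 0.0, H - 1.0))
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    x1, y1 = min(x0 + 1, W - 1), min(y0 + 1, H - 1)
    wx, wy = x - x0, y - y0
    top = (1 - wx) * feat[y0, x0] + wx * feat[y0, x1]
    bot = (1 - wx) * feat[y1, x0] + wx * feat[y1, x1]
    return (1 - wy) * top + wy * bot

def deformable_attention(query, ref_xy, feats, W_off, W_attn, K=4):
    """Aggregate features from a snippet for one query (illustrative only).

    query:  (C,)          query embedding
    ref_xy: (2,)          reference point in pixel coordinates (x, y)
    feats:  (T, H, W, C)  per-frame feature maps of the snippet
    W_off:  (C, T*K*2)    hypothetical linear map predicting sampling offsets
    W_attn: (C, T*K)      hypothetical linear map predicting attention logits
    """
    T = feats.shape[0]
    offsets = (query @ W_off).reshape(T, K, 2)
    logits = query @ W_attn
    # Softmax jointly over all T*K sampling points, so weights sum to 1.
    weights = np.exp(logits - logits.max())
    weights = (weights / weights.sum()).reshape(T, K)
    out = np.zeros(feats.shape[-1])
    for t in range(T):          # frames in the snippet
        for k in range(K):      # sampling points per frame
            x, y = ref_xy + offsets[t, k]
            out += weights[t, k] * bilinear_sample(feats[t], x, y)
    return out
```

Because each query reads only `T * K` sampled locations rather than every pixel of every frame, the cost per query in this sketch does not grow with the spatial resolution of the feature maps.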


