DeciWatch: A Simple Baseline for 10x Efficient 2D and 3D Pose Estimation

03/16/2022
by Ailing Zeng, et al.

This paper proposes DeciWatch, a simple baseline framework for video-based 2D/3D human pose estimation that achieves a 10x efficiency improvement over existing works without any performance degradation. Unlike current solutions that estimate each frame in a video, DeciWatch introduces a simple yet effective sample-denoise-recover framework that watches only sparsely sampled frames, taking advantage of the continuity of human motion and the lightweight pose representation. Specifically, DeciWatch uniformly samples less than 10% of video frames for detailed estimation, denoises the estimated 2D/3D poses with an efficient Transformer architecture, and then accurately recovers the rest of the frames using another Transformer-based network. Comprehensive experimental results on three video-based human pose estimation and body mesh recovery tasks with four datasets validate the efficiency and effectiveness of DeciWatch.
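
The sample-then-recover idea can be illustrated with a toy sketch. This is not the authors' method: plain linear interpolation stands in for the Transformer-based recovery network, the denoising step is omitted entirely, and the function names `sample_indices` and `recover_linear` are hypothetical. It only shows why smooth motion lets most frames be skipped, here by keeping every 10th frame in line with the 10x efficiency goal.

```python
# Illustrative sketch only: linear interpolation replaces DeciWatch's
# learned recovery network, and no denoising is performed.

def sample_indices(num_frames, interval):
    """Uniformly pick every `interval`-th frame index, starting at 0."""
    return list(range(0, num_frames, interval))


def recover_linear(sampled_idx, sampled_poses, num_frames):
    """Fill in every frame by interpolating between neighbouring sampled poses.

    Each pose is a flat list of joint coordinates. Frames at or after the
    last sampled index reuse the last sampled pose.
    """
    recovered = []
    seg = 0
    for t in range(num_frames):
        # Move to the segment [sampled_idx[seg], sampled_idx[seg + 1]) holding t.
        while seg + 1 < len(sampled_idx) and t >= sampled_idx[seg + 1]:
            seg += 1
        if seg + 1 == len(sampled_idx):
            recovered.append(list(sampled_poses[seg]))
            continue
        t0, t1 = sampled_idx[seg], sampled_idx[seg + 1]
        w = (t - t0) / (t1 - t0)  # interpolation weight in [0, 1)
        p0, p1 = sampled_poses[seg], sampled_poses[seg + 1]
        recovered.append([(1 - w) * a + w * b for a, b in zip(p0, p1)])
    return recovered


# Hypothetical data: one 2D joint moving at constant velocity over 21 frames.
num_frames = 21
truth = [[0.5 * t, 2.0] for t in range(num_frames)]

idx = sample_indices(num_frames, 10)      # only 3 of 21 frames are "watched"
sampled = [truth[i] for i in idx]         # stand-in for a pose estimator's output
recovered = recover_linear(idx, sampled, num_frames)
```

For perfectly linear motion this interpolation reproduces every skipped frame; real human motion is not linear and estimated poses are noisy, which is the gap the paper's denoising and recovery networks are meant to address.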

research
11/04/2020

Leveraging Temporal Joint Depths for Improving 3D Human Pose Estimation in Video

The effectiveness of the approaches to predict 3D poses from 2D poses es...
research
12/27/2021

SmoothNet: A Plug-and-Play Network for Refining Human Poses in Videos

When analyzing human motion videos, the output jitters from existing pos...
research
07/12/2022

Learning to Estimate External Forces of Human Motion in Video

Analyzing sports performance or preventing injuries requires capturing g...
research
08/06/2022

IVT: An End-to-End Instance-guided Video Transformer for 3D Pose Estimation

Video 3D human pose estimation aims to localize the 3D coordinates of hu...
research
09/15/2022

A Temporal Densely Connected Recurrent Network for Event-based Human Pose Estimation

Event camera is an emerging bio-inspired vision sensors that report per-...
research
08/24/2019

Dynamic Kernel Distillation for Efficient Pose Estimation in Videos

Existing video-based human pose estimation methods extensively apply lar...
research
08/18/2023

ResQ: Residual Quantization for Video Perception

This paper accelerates video perception, such as semantic segmentation a...
