TORE: Token Reduction for Efficient Human Mesh Recovery with Transformer

11/19/2022
by Zhiyang Dou, et al.

In this paper, we introduce a set of effective TOken REduction (TORE) strategies for Transformer-based Human Mesh Recovery from monocular images. Transformer-based architectures currently achieve state-of-the-art performance, but they suffer from high model complexity and computation cost caused by redundant tokens. We propose token reduction strategies based on two aspects: the 3D geometry structure and the 2D image features. On the geometry side, we recover the mesh hierarchically, using priors from the body structure. On the image side, we cluster tokens so that fewer but more discriminative image feature tokens are passed to the Transformer. As a result, our method substantially reduces the number of tokens involved in high-complexity interactions in the Transformer, achieving competitive shape recovery accuracy at a significantly reduced computational cost. We conduct extensive experiments across a wide range of benchmarks to validate the proposed method, and we further demonstrate that it generalizes to hand mesh recovery. Our code will be publicly available once the paper is published.
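The abstract does not say which clustering algorithm is used, so the following is only a minimal sketch of the general idea, assuming plain k-means as a stand-in. The function names (`cluster_tokens`, `attention_pairs`), the 49-token input, and the 7-cluster target are hypothetical choices for illustration, not details from the paper. It shows why merging image feature tokens helps: self-attention compares every token with every other token, so the number of pairwise interactions grows quadratically with the token count.

```python
import random


def cluster_tokens(tokens, num_clusters, iters=10, seed=0):
    """Merge N feature tokens into num_clusters centroid tokens.

    Plain k-means, used here as an assumed stand-in for the paper's
    (unspecified) token clustering step.
    """
    rng = random.Random(seed)
    # Initialise centroids from randomly chosen input tokens.
    centroids = [list(t) for t in rng.sample(tokens, num_clusters)]
    for _ in range(iters):
        groups = [[] for _ in range(num_clusters)]
        for t in tokens:
            # Assign each token to its nearest centroid (squared Euclidean distance).
            dists = [sum((a - b) ** 2 for a, b in zip(t, c)) for c in centroids]
            groups[dists.index(min(dists))].append(t)
        for k, group in enumerate(groups):
            if group:  # an empty cluster keeps its previous centroid
                centroids[k] = [sum(col) / len(group) for col in zip(*group)]
    return centroids


def attention_pairs(n):
    """Pairwise token interactions in one self-attention layer over n tokens."""
    return n * n


if __name__ == "__main__":
    rng = random.Random(1)
    # Hypothetical input: 49 image tokens (e.g. a 7x7 feature grid), 8 channels each.
    tokens = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(49)]
    reduced = cluster_tokens(tokens, num_clusters=7)
    print(len(tokens), "->", len(reduced), "tokens")
    print(attention_pairs(len(tokens)), "->", attention_pairs(len(reduced)),
          "pairwise interactions")
```

Under these assumed sizes, cutting the token count by a factor of 7 cuts the pairwise interactions by a factor of 49. A learned or attention-guided clustering could replace k-means without changing this arithmetic.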


