HumanBench: Towards General Human-centric Perception with Projector Assisted Pretraining

03/10/2023
by Shixiang Tang, et al.

Human-centric perception covers a variety of vision tasks with widespread industrial applications, including surveillance, autonomous driving, and the metaverse. A general pretrained model that serves versatile human-centric downstream tasks is therefore desirable. This paper advances toward that goal on two fronts: benchmarking and pretraining methods. Specifically, we propose HumanBench, a benchmark built on existing datasets that evaluates, on common ground, the generalization abilities of different pretraining methods on 19 datasets from 6 diverse downstream tasks: person ReID, pose estimation, human parsing, pedestrian attribute recognition, pedestrian detection, and crowd counting. To learn both coarse-grained and fine-grained knowledge of human bodies, we further propose a Projector AssisTed Hierarchical pretraining method (PATH) that learns diverse knowledge at different granularity levels. Comprehensive evaluations on HumanBench show that PATH achieves new state-of-the-art results on 17 downstream datasets and on-par results on the other 2. The code will be publicly available at https://github.com/OpenGVLab/HumanBench.
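To make the idea of hierarchical sharing at different granularities concrete, here is a toy sketch under our own assumptions: one backbone shared by all datasets, one projector per task, and one head per dataset. This layout, the `HierarchicalModel` class, the `linear` helper, and the dataset names (`reid_a`, `reid_b`, `pose_a`) are all hypothetical illustrations. This is not the PATH implementation, and the actual granularity of sharing in PATH may differ.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Vector = List[float]
Layer = Callable[[Vector], Vector]


def linear(weights: List[List[float]]) -> Layer:
    """Hypothetical toy layer: returns the map x -> W x (no bias)."""
    def apply(x: Vector) -> Vector:
        return [sum(w * xi for w, xi in zip(row, x)) for row in weights]
    return apply


@dataclass
class HierarchicalModel:
    backbone: Layer                # assumption: shared by every dataset
    projectors: Dict[str, Layer]   # assumption: one projector per task
    heads: Dict[str, Layer]        # assumption: one head per dataset
    dataset_task: Dict[str, str]   # maps each dataset name to its task name

    def forward(self, x: Vector, dataset: str) -> Vector:
        # Route the input through the shared backbone, then the projector of
        # the dataset's task, then the dataset's own head.
        task = self.dataset_task[dataset]
        return self.heads[dataset](self.projectors[task](self.backbone(x)))


# Usage with made-up datasets: the two ReID datasets share one projector.
model = HierarchicalModel(
    backbone=linear([[1.0, 0.0], [0.0, 1.0]]),
    projectors={
        "reid": linear([[2.0, 0.0], [0.0, 2.0]]),
        "pose": linear([[0.0, 1.0], [1.0, 0.0]]),
    },
    heads={
        "reid_a": linear([[1.0, 1.0]]),
        "reid_b": linear([[1.0, -1.0]]),
        "pose_a": linear([[1.0, 0.0]]),
    },
    dataset_task={"reid_a": "reid", "reid_b": "reid", "pose_a": "pose"},
)

x = [1.0, 2.0]
for name in ("reid_a", "reid_b", "pose_a"):
    print(name, model.forward(x, name))
```

Under this assumed layout, training signal from every dataset reaches the backbone, datasets of the same task share a projector, and each head sees only its own dataset. That is one plausible reading of "different granularity levels"; the paper itself is the authority on how PATH actually shares weights.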


research
03/06/2023

UniHCP: A Unified Model for Human-Centric Perceptions

Human-centric perceptions (e.g., pose estimation, human parsing, pedestr...
research
07/13/2023

Leveraging Vision-Language Foundation Models for Fine-Grained Downstream Tasks

Vision-language foundation models such as CLIP have shown impressive zer...
research
12/06/2022

InternVideo: General Video Foundation Models via Generative and Discriminative Learning

The foundation models have recently shown excellent performance on a var...
research
12/08/2020

LAMP: Label Augmented Multimodal Pretraining

Multi-modal representation learning by pretraining has become an increas...
research
02/25/2022

OCR-IDL: OCR Annotations for Industry Document Library Dataset

Pretraining has proven successful in Document Intelligence tasks where d...
research
07/29/2023

Effective Whole-body Pose Estimation with Two-stages Distillation

Whole-body pose estimation localizes the human body, hand, face, and foo...
research
03/14/2023

Diversity-Aware Meta Visual Prompting

We present Diversity-Aware Meta Visual Prompting (DAM-VP), an efficient ...
