Versatile Multi-Modal Pre-Training for Human-Centric Perception

03/25/2022
by Fangzhou Hong, et al.

Human-centric perception plays a vital role in vision and graphics, but its data annotations are prohibitively expensive. It is therefore desirable to have a versatile pre-trained model that serves as a foundation for data-efficient transfer to downstream tasks. To this end, we propose HCMoCo, a Human-Centric Multi-Modal Contrastive learning framework that leverages the multi-modal nature of human data (e.g., RGB, depth, 2D keypoints) for effective representation learning. This objective comes with two main challenges: dense pre-training on multi-modal data and efficient use of sparse human priors. To tackle them, we design two novel learning targets, Dense Intra-sample Contrastive Learning and Sparse Structure-aware Contrastive Learning, which hierarchically learn a modal-invariant latent space featuring a continuous and ordinal feature distribution and structure-aware semantic consistency. HCMoCo enables pre-training for different modalities by combining heterogeneous datasets, which allows efficient use of existing task-specific human data. Extensive experiments on four downstream tasks of different modalities demonstrate the effectiveness of HCMoCo, especially under data-efficient settings (7.16% and 12% improvement on DensePose estimation and human parsing, respectively). Moreover, we demonstrate the versatility of HCMoCo by exploring cross-modality supervision and missing-modality inference, validating its strong cross-modal association and reasoning abilities.
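To make the dense pre-training objective concrete, below is a minimal PyTorch sketch of a per-location contrastive (InfoNCE) loss between dense feature maps of two modalities of the same samples. The function name, tensor shapes, and temperature are illustrative assumptions, not the released HCMoCo implementation; the paper's actual Dense Intra-sample target additionally shapes the latent space with continuous and ordinal feature distributions, so this is a simplified baseline of the idea.

import torch
import torch.nn.functional as F

def dense_intra_sample_nce(feat_a: torch.Tensor,
                           feat_b: torch.Tensor,
                           temperature: float = 0.07) -> torch.Tensor:
    """Hypothetical per-location InfoNCE between two modalities.

    feat_a, feat_b: (N, C, H, W) dense feature maps of the *same* N samples
    from two modality-specific encoders (e.g. RGB and depth). For each
    sample, the feature at a spatial location in one modality treats the
    feature at the same location in the other modality as its positive,
    and all other locations of that sample as negatives.
    """
    n, c, h, w = feat_a.shape
    # Flatten spatial dims to (N, H*W, C) and L2-normalize each feature.
    a = F.normalize(feat_a.flatten(2).transpose(1, 2), dim=-1)
    b = F.normalize(feat_b.flatten(2).transpose(1, 2), dim=-1)
    # Cosine-similarity logits between all location pairs: (N, H*W, H*W).
    logits = torch.bmm(a, b.transpose(1, 2)) / temperature
    # Positives sit on the diagonal (matching spatial locations).
    targets = torch.arange(h * w, device=feat_a.device).expand(n, -1)
    return F.cross_entropy(logits.reshape(n * h * w, h * w),
                           targets.reshape(-1))

# Example usage with random tensors standing in for encoder outputs.
rgb_feat = torch.randn(4, 128, 14, 14)
depth_feat = torch.randn(4, 128, 14, 14)
loss = dense_intra_sample_nce(rgb_feat, depth_feat)

In practice the two feature maps would come from modality-specific encoders projected into a shared latent space; the hard one-hot targets above are the simplest choice, and the softer, structure-aware targets described in the abstract are what distinguish HCMoCo's objectives from plain InfoNCE.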


