Learning from 2D: Pixel-to-Point Knowledge Transfer for 3D Pretraining

04/10/2021 ∙ by Yueh-Cheng Liu, et al. ∙ 0

Most 3D networks are trained from scratch owing to the lack of large-scale labeled datasets. In this paper, we present a novel 3D pretraining method that leverages 2D networks learned from rich 2D datasets. We propose pixel-to-point knowledge transfer to effectively utilize the 2D information by mapping pixel-level and point-level features into the same embedding space. Due to the heterogeneous nature of 2D and 3D networks, we introduce a back-projection function to align the features between 2D and 3D to make the transfer possible. Additionally, we devise an upsampling feature projection layer to increase the spatial resolution of high-level 2D feature maps, which helps learning fine-grained 3D representations. With a pretrained 2D network, the proposed pretraining process requires no additional 2D or 3D labeled data, further alleviating the expensive 3D data annotation cost. To the best of our knowledge, we are the first to exploit existing 2D trained weights to pretrain 3D deep neural networks. Our extensive experiments show that 3D models pretrained with 2D knowledge boost performance across various real-world 3D downstream tasks.






1 Introduction

3D deep learning has gained a large amount of attention recently due to its wide applications, including robotics, autonomous driving, and AR/VR. Numerous state-of-the-art 3D network architectures have been proposed, showing remarkable performance improvements, including point-based methods

[50, 51, 61], efficient sparse 3D CNNs [21, 11], and hybrid point-voxel methods [38, 58]

. However, unlike 2D ImageNet supervised pretraining, most 3D neural networks are trained from scratch. Although many efforts have been made to collect 3D datasets [57, 12, 18, 6], the expensive labeling cost and diverse 3D sensing devices make it challenging to build a large-scale dataset comparable to ImageNet [14], making supervised pretraining in 3D difficult.

Figure 1: Learning from 2D as 3D pretraining. Due to the limited size of labeled data in 3D, pretraining with large unlabeled data is important. (a) Previous works [64] apply self-supervised learning on pure 3D data as pretraining. (b) We propose pretraining 3D networks by leveraging existing 2D network weights via pixel-to-point knowledge transfer. In this way, we provide the knowledge learned from rich 2D image datasets to the 3D network.

Recently, self-supervised pretraining has been proved successful in NLP [15, 52] and 2D vision [25, 4, 22, 28, 30, 40, 43, 62]. In 3D vision, PointContrast [64] first shows the opportunity of self-supervised pretraining by leveraging contrastive learning for real-world point cloud data. By learning point correspondence across two adjacent point clouds, it can generate good initial weights for other 3D downstream tasks.

In this work, we study pretraining 3D neural networks for point cloud data from a new perspective: learning from 2D pretrained networks (see Figure 1 for a conceptual comparison). PointContrast [64] successfully brings the 2D self-supervised learning paradigm into 3D but neglects 2D semantic information, for example, the semantic cues extracted by 2D CNNs. 2D labeled datasets are much larger and more diverse than 3D datasets, and 2D network architectures have been widely studied and well developed over the past few years. Therefore, we believe that the knowledge of well-trained 2D networks is valuable and informative, providing the 3D network signals to learn good initial weights without additional labeled data. Our idea is also related to previous 2D-3D fusion research [13, 10, 36], which suggests that 2D features extracted by 2D networks can complement 3D. This motivates our view that learning from 2D as pretraining brings additional information that cannot easily be learned directly from 3D data. Hence, we raise a core question:

how can we transfer a pretrained 2D network to 3D in a self-supervised fashion, given the differences in their data properties and their heterogeneous network structures?

In the cross-modal knowledge distillation framework [23], we can view the 2D neural network as the teacher and the 3D network as the student. However, unlike 2D images and depth maps [23], 2D images and 3D point clouds are not naturally aligned, and it is challenging to align intermediate network representations between 2D and 3D due to the heterogeneous network structures. Besides, we found that naively minimizing the distance between global representations across the two modalities does not work well empirically in our 2D-3D setting. Moreover, compared to previous cross-modal knowledge distillation works, we consider more challenging real-world 3D downstream tasks to demonstrate the practical advantage of pretraining.

We propose the novel pixel-to-point knowledge transfer (PPKT) for 3D pretraining. The key idea is to learn the point-level features of the 3D network from the corresponding pixel representations extracted from pretrained 2D networks. To enable the transfer, we construct pixel-point mappings and align pixel and point features with a differentiable back-projection function. To overcome the lack of pixel-level outputs in common 2D networks such as ResNet [27], we propose the learnable upsampling feature projection layer (UPL), which restores the feature map spatial resolution to the original size (see Figure 2). Our method can pretrain the 3D network with the knowledge of 2D networks without strict assumptions on the 2D or 3D network architecture, output channel size, or output spatial resolution.

Following the pretrain-finetune protocol in [64], we show that PPKT consistently boosts downstream performance across multiple real-world 3D tasks and datasets. Specifically, we adopt the state-of-the-art 3D network, SR-UNet [11], as the target network for pretraining and fine-tune it on object detection and semantic segmentation tasks on ScanNet [12], S3DIS [2], and SUN RGB-D [57, 56, 33, 63]. We achieve a +3.17 mAP@0.25 improvement on ScanNet object detection and +3.12 mIoU on S3DIS semantic segmentation compared to training from scratch. We also provide an extensive ablation study to further verify the effectiveness of our proposed method. Additionally, we study the generalization ability of PPKT by transferring self-supervised pretrained 2D network knowledge as 3D pretraining.

Our contributions can be summarized as follows:

  • We are the first to explore 3D pretraining by leveraging existing 2D pretrained knowledge for high-level, real-world 3D recognition tasks.

  • We propose the novel pixel-to-point knowledge transfer (PPKT) and the upsampling feature projection layer (UPL) which enable transferring the 2D knowledge into 3D as pretraining.

  • We show the effectiveness of our pretraining method by fine-tuning on various real-world 3D scene understanding tasks, successfully boosting the overall performance.

Figure 2: Pixel-to-point knowledge transfer (PPKT). PPKT transfers the 2D pretrained network knowledge into 3D from pixels to points. A back-projection is used to align corresponding pixel-level features and point-level features. To restore the granularity of low-resolution 2D feature maps, we propose the learnable upsampling feature projection layer (UPL). The details are described in Section 3.2.

2 Related Work

2.1 Cross-modal 2D-3D Learning

In the field of 3D computer vision, many works leverage 2D information fused with 3D data for 3D semantic segmentation

[13, 10], 3D object detection [9, 35, 48], and 3D instance segmentation [31]. The idea of fusing multi-modality data as input for neural networks comes from the observation that networks for different modalities capture complementary information [13, 10]. This inspires us that the knowledge of existing well-trained 2D networks is helpful for 3D networks. However, instead of fusing 2D and 3D deep features, we learn the 3D representation under the guidance of 2D knowledge as pretraining. Our method needs no labeled data to jointly train the 2D and 3D networks, only uses the 2D data during the pretraining stage, and introduces no training or inference overhead for downstream tasks.

2.2 Self-supervised Representation Learning

The recent success of BERT [15] and pretraining in NLP has sparked interest in self-supervised representation learning in 2D vision. Previous methods [67, 68, 46, 16, 42, 19] train 2D neural networks with handcrafted self-supervised pretext tasks to learn informative representations. Recently, many works [25, 4, 7, 22, 28, 30, 40, 43, 62] introduce contrastive learning and its variants, which largely advance the field of self-supervised representation learning; models learned by these methods serve as good pretraining weights. Some works also extend self-supervised representation learning to 3D [55, 53, 24], yet they take single, clean 3D objects as input, which is not applicable to complex real-world 3D data. PointContrast [64] introduces a contrastive loss suitable for point clouds for 3D pretraining, especially for 3D scene understanding tasks. Given an existing 2D pretrained network, our method can be viewed as a self-supervised pretext task since the knowledge transfer process requires no annotated data. In this work, we follow the pretrain-finetune protocol of [64]. However, our idea differs in that we leverage rich 2D pretrained network knowledge to guide 3D rather than self-supervised learning on pure 3D data.

2.3 Knowledge Distillation

Knowledge distillation (KD) [5, 37, 29] is proposed to compress a large model (teacher) into a smaller one (student) by learning the class-level soft outputs of the teacher. Many works further extend the KD idea by mimicking the teacher's intermediate representations [54, 66], using advanced objective functions [32, 1, 44, 47, 60], or adding learnable proxy layers [34]. Beyond model compression, some recent works also show that the student can surpass the teacher, achieving better performance through KD [65, 69, 17].

Knowledge distillation across different modalities takes advantage of a modality with rich labeled data to alleviate the data shortage in the target modality. Gupta et al. [23] first propose the idea of transferring the supervision of a CNN from RGB images to depth maps with unlabeled paired data. [3, 20] consider transferring from RGB to sound and video using CNNs pretrained with object and scene data. The distillation process for the target modality (student) involves no labeled data; therefore, unlike general knowledge distillation, the cross-entropy loss against true labels cannot be used.

The cross-modal setup is the most similar to ours if we consider the 2D network as the teacher and the 3D network as the student. However, the key difference is that [23] considers 2D CNNs for both RGB images and depth maps, which are well aligned, while 2D images and 3D point clouds are not, and the networks for 2D and 3D are heterogeneous. Moreover, we consider state-of-the-art 3D models [21, 11] and complex real-world 3D downstream tasks to show the actual effectiveness of pretraining for 3D.

3 Learning from 2D

3.1 2D Pretraining on 3D Tasks

First, we conduct a simple pilot study showing the opportunity of pretraining deep neural networks for 3D tasks. We train 2D semantic segmentation networks to perform 3D semantic segmentation on ScanNet [12] by aggregating multi-view predictions [10]. The 2D network with ImageNet pretraining achieves a +4.81% 3D mIoU improvement compared to training from scratch. The result indicates (1) a pretraining effect: with proper pretraining, 3D neural networks may perform better; and (2) useful 2D knowledge: 2D pretraining knowledge (ImageNet) may help 3D scene understanding tasks such as ScanNet semantic segmentation. Based on these observations, instead of developing a pure 3D self-supervised pretraining method, we aim to design a novel approach that transfers 2D pretrained network knowledge into 3D as 3D pretraining.

3.2 Pixel-to-point Knowledge Transfer (PPKT)

To exploit the 2D knowledge for 3D pretraining, we propose the novel pixel-to-point knowledge transfer (PPKT) from 2D to 3D. We divide our method into the following parts: (1) the pixel-to-point design, (2) the upsampling feature projection layer, and (3) point-pixel NCE loss. We illustrate the overview of our proposed pipeline in Figure 2.

# f_2D, f_3D: 2D and 3D backbones (f_2D is frozen)
# g_2D: upsampling feature projection layer
# g_3D: feature projection layer
# bp: back-projection function
# n_sub: sub-sample size, nce_t: temperature
for (x, d) in loader:  # RGB image x, depth map d
    (c, f) = bp(x, d)  # single-view point cloud
    z_2D = g_2D(f_2D(x).detach())
    # back-project pixel-level features into 3D
    _, z_2D = bp(z_2D, d)
    z_3D = g_3D(f_3D(c, f))
    # sub-sample matched point-pixel pairs
    idxs = random.choice(len(z_3D), n_sub)
    z_2D, z_3D = z_2D[idxs], z_3D[idxs]
    logits = mm(z_3D, z_2D.T)
    labels = arange(n_sub)
    loss = CrossEntropy(logits / nce_t, labels)
    loss.backward()
    update(f_3D.param, g_2D.param, g_3D.param)
Algorithm 1

Pseudocode of pixel-to-point knowledge transfer in PyTorch-like style

We leverage a large unlabeled RGB-D dataset during PPKT pretraining. RGB-D frames are easy to collect by scanning scenes with inexpensive RGB-D cameras such as RealSense and Kinect. Let the RGB-D dataset be {(x_i, d_i)}, where x_i and d_i are aligned RGB images and depth maps. Given the camera intrinsic parameters, we define the back-projection function bp(x, d) = (c, f), where c is the coordinates of the points and f is the point features. Here x is a 3-dimensional tensor of shape H x W x C, which can be an RGB image (C = 3) or a 2D feature map (C is the feature dimension).

bp generates a single-view point cloud from a pair of a depth map and an RGB image or 2D feature map, carrying the RGB values or feature vector at each pixel as the point feature.
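Concretely, the back-projection can be sketched with a standard pinhole camera model in a few lines of NumPy; the function name and the intrinsics arguments (fx, fy, cx, cy) are illustrative assumptions, not taken from the paper's implementation.

```python
import numpy as np

def back_project(x, d, fx, fy, cx, cy):
    """Sketch of bp(x, d): lift per-pixel features x (H, W, C) and a depth
    map d (H, W) into a single-view point cloud.

    Returns (coords, feats): coords is (M, 3) with one 3D point per
    valid-depth pixel; feats is (M, C), the pixel's feature vector
    (RGB values or a slice of a 2D feature map) carried over as the
    point feature.
    """
    h, w = d.shape
    v, u = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    valid = d > 0                      # drop pixels with no depth reading
    z = d[valid]
    px = (u[valid] - cx) * z / fx      # pinhole model: x = (u - cx) * z / fx
    py = (v[valid] - cy) * z / fy
    coords = np.stack([px, py, z], axis=1)
    feats = x[valid]                   # the pixel feature travels with the point
    return coords, feats
```

Applying such a function to an RGB image yields the input point cloud (c, f), while applying it to a projected 2D feature map yields the lifted 2D features used later in the loss.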

For the networks, let f_2D be a 2D CNN which takes an image as input and outputs a feature map (we ignore the final global pooling and classification layer without loss of generality). We define a 3D neural network f_3D which outputs a feature for each point. This formulation covers most commonly used 3D backbones, e.g., SR-UNet [11] or PointNet++ [51]. Assuming the 2D CNN f_2D has been pretrained on a large-scale 2D image dataset (ImageNet classification in our default setting), we aim to learn initial weights for the 3D network f_3D from the knowledge of f_2D. Then, f_3D can be used as the backbone network and fine-tuned for downstream tasks such as 3D semantic segmentation or 3D object detection. On the other hand, f_2D is fixed during the PPKT pretraining stage.

Figure 3: t-SNE plot of global features versus pixel-level features. For a single ScanNet scene, the global features of images extracted from ImageNet pretrained ResNet are less discriminative in feature space than pixel-level features. Therefore, we propose transferring 2D knowledge from pixels to points rather than via global features, preserving fine-grained pixel-level information.

Transfer knowledge from pixels to points.

A naive knowledge distillation approach for the cross-modal setup is to minimize the distance between the global features [23] or classification logits [29] of the two modality models. In practice, we found that this does not work well between 2D images and 3D point clouds. PointContrast [64] argues that 3D datasets have a large number of points but a small number of instances, so learning global representations for 3D suffers from the limited number of instances. In addition, we suppose there are a few other reasons: (1) spatial information is lost in global pooling operations; (2) for common 3D encoder-decoder backbones, only the encoder would be pretrained, and the decoder is ignored; (3) unlike ordinary 2D datasets for self-supervised learning, RGB-D frames in indoor environments may share similar global contexts, which are not discriminative in feature space. For example, most frames contain the floor, walls, or tables, sharing similar global semantic meaning. Therefore, we propose transferring knowledge between the pixel level of images and the point level of point clouds. Figure 3 shows that the global features of indoor scene images extracted by ImageNet pretrained ResNet are less diverse than pixel-level features.

Given a pair of point cloud (c, f), RGB image x, and depth map d, we obtain the 3D feature representation z_3D and the pixel-to-point 2D representation (lifted 2D features) z_2D by

z_3D = g_3D(f_3D(c, f)),    (c, z_2D) = bp(g_2D(f_2D(x)), d),

where g_2D and g_3D are learnable feature projection layers which map the 2D and 3D features into the same embedding space with identical dimension and scale. In practice, g_2D could be a convolution layer, and g_3D is a shared linear perceptron. Both g_2D and g_3D are followed by L2 normalization. z_2D is the 2D representation back-projected into 3D space by bp. In this way, z_2D and z_3D are well aligned such that the 2D feature at the i-th pixel and the i-th point feature come from the same 3D coordinates.
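As a concrete sketch (in NumPy rather than PyTorch, with illustrative names and weight shapes), the projection layers are per-element linear maps into a shared d-dimensional embedding, each followed by L2 normalization:

```python
import numpy as np

def l2_normalize(z, eps=1e-8):
    # normalize each feature vector to unit length along the last axis
    return z / (np.linalg.norm(z, axis=-1, keepdims=True) + eps)

def project_2d(feat_map, W):
    # g_2D: a 1x1 convolution is a per-pixel linear map (H, W, C) -> (H, W, d)
    h, w, c = feat_map.shape
    return l2_normalize((feat_map.reshape(-1, c) @ W).reshape(h, w, -1))

def project_3d(point_feats, W):
    # g_3D: a shared linear layer applied to every point, (N, C) -> (N, d)
    return l2_normalize(point_feats @ W)
```

With both outputs unit-normalized, the dot products used in the contrastive loss become cosine similarities on a common scale.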

Upsampling feature projection layer (UPL).

ImageNet supervised classification pretrained weights are commonly used in 2D computer vision since the dataset is diverse and large in scale, providing good generalizability and transferability. A classification network such as ResNet usually decreases the spatial resolution of feature maps and enlarges the feature dimension to better extract high-level semantics. The low spatial resolution of the final feature map (1/32 of the input height and width for ResNet) makes our pixel-to-point knowledge transfer difficult. Therefore, we propose the upsampling feature projection layer to tackle this issue. Specifically, given the feature map from the last layer of the 2D CNN, we apply a 1x1 convolution and bilinearly upsample to the original input image resolution. This also maps the 2D features into the same dimension as the 3D features. The method is effective and works well in practice. More importantly, it provides the flexibility to handle differences in spatial resolution and channel count across 2D network architectures.
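A minimal NumPy sketch of such an upsampling projection layer, assuming a per-pixel linear map standing in for the 1x1 convolution plus a hand-rolled bilinear upsample; function names and shapes are ours, not from the paper's code:

```python
import numpy as np

def bilinear_upsample(f, out_h, out_w):
    """Bilinearly resize a feature map f of shape (h, w, c) to (out_h, out_w, c)."""
    h, w, _ = f.shape
    ys = np.linspace(0, h - 1, out_h)
    xs = np.linspace(0, w - 1, out_w)
    y0 = np.floor(ys).astype(int); y1 = np.minimum(y0 + 1, h - 1)
    x0 = np.floor(xs).astype(int); x1 = np.minimum(x0 + 1, w - 1)
    wy = (ys - y0)[:, None, None]
    wx = (xs - x0)[None, :, None]
    top = f[y0][:, x0] * (1 - wx) + f[y0][:, x1] * wx
    bot = f[y1][:, x0] * (1 - wx) + f[y1][:, x1] * wx
    return top * (1 - wy) + bot * wy

def upl(feat_map, W, out_h, out_w):
    """Upsampling feature projection layer: project channels with a per-pixel
    linear map (the 1x1 conv), then restore the input image resolution."""
    h, w, c = feat_map.shape
    projected = (feat_map.reshape(-1, c) @ W).reshape(h, w, -1)
    return bilinear_upsample(projected, out_h, out_w)
```

Because the projection happens before upsampling, the expensive channel mixing runs on the small feature map, and only the cheap interpolation runs at full resolution.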

Point-pixel NCE loss.

We minimize the relative distance between corresponding pixel and point representations through the point-pixel NCE loss (PPNCE), a modified version of the InfoNCE loss [43] for contrastive learning, defined as

L_PPNCE = -sum_{i=1}^{N} log( exp(z_3D^i . z_2D^i / tau) / sum_{j=1}^{N} exp(z_3D^i . z_2D^j / tau) ),

where tau is the temperature hyper-parameter and N is the total number of points. The physical meaning of the loss is to form a feature space by attracting the features of a 3D point and its corresponding 2D pixel while separating the 3D feature from other 2D pixel features. In other words, if a pixel and a point share the same coordinate in the 3D world, they are a positive pair; otherwise, they are a negative pair. For memory reasons, we sub-sample a fixed number of points and pixels before the loss calculation. Algorithm 1 provides simplified pseudo-code for our pixel-to-point knowledge transfer.
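The PPNCE loss can be sketched in NumPy as a cross entropy over the point-to-pixel similarity matrix, where the i-th row's positive is the i-th lifted pixel (the default temperature 0.04 follows the training details in Section 4.1; the function name is ours):

```python
import numpy as np

def ppnce_loss(z_3d, z_2d, tau=0.04):
    """Point-pixel NCE loss sketch.

    z_3d, z_2d: (n, d) L2-normalized embeddings; row i of each forms the
    positive pair, every other row of z_2d is a negative for point i.
    """
    logits = z_3d @ z_2d.T / tau                       # (n, n) similarities
    logits -= logits.max(axis=1, keepdims=True)        # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    n = z_3d.shape[0]
    # cross entropy with "labels = arange(n)": the diagonal holds positives
    return -log_prob[np.arange(n), np.arange(n)].mean()
```

A perfectly matched set of embeddings yields a near-zero loss, while mismatched pairs are penalized, which is exactly the attract-positive / repel-negative behavior described above.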

Note that most of the existing contrastive learning methods apply linear or non-linear feature projection layers on the global features by multi-layer perceptrons [25, 7, 60]. Our 3D feature projection layer is applied on 3D point-level feature maps, and the 2D upsampling projection layer is applied on 2D high-level but low-resolution feature maps. For simplicity, we do not include memory bank [7] due to the large number of pixels and points.


If we consider the back-projected pixel-level features as pseudo 3D points, our loss function is similar to PointInfoNCE [64]. Both apply a contrastive loss at the point (or pixel) level instead of the global instance level commonly seen in 2D contrastive learning. However, our PPNCE is applied across 2D and 3D features encoded by two different networks, whereas PointInfoNCE is applied across features of two different point cloud samples extracted by the same 3D network. Additionally, our motivation differs: we use the contrastive loss to minimize the relative distance between point and pixel representations to transfer the 2D knowledge.

He et al. [26] show that training from scratch can match or even surpass ImageNet supervised pretraining performance in 2D. However, [64] shows that the small size of labeled 3D datasets makes training from scratch fail to surpass pretraining even with longer training time. In this work, we transfer the knowledge of an ImageNet supervised pretrained 2D network into 3D. The 2D knowledge may provide information complementary to 3D supervised learning and bring performance improvements. Moreover, we show that our method is not restricted to 2D networks pretrained by supervised learning in Section 4.5.

4 Experiment

In this section, we study the effectiveness of our proposed 3D pretraining method. The ability to transfer the pretrained weight to various downstream tasks is essential. Therefore, we fine-tune the model trained by our pixel-to-point knowledge transfer on 3D semantic segmentation and 3D object detection tasks. We will describe the details and results in the following sections.

4.1 Experimental Setup

Network architectures.

We choose the widely used ResNet [27] as our 2D backbone, pretrained on ImageNet classification. For the 3D network, following [64], we adopt the Sparse Residual 3D U-Net 34 (SR-UNet34) [21, 11] as our backbone since it achieves state-of-the-art performance on multiple 3D tasks. It contains 34 3D sparse convolution layers in an encoder-decoder design with skip connections. It is naturally suitable for 3D semantic segmentation since it generates per-point outputs. Following [64], with little modification, it can also perform 3D object detection by attaching VoteNet [49] modules.


Transfer dataset.

We use the raw RGB-D images in the ScanNet dataset [12], collected by hand-held lightweight depth sensors, as our transfer dataset since it is currently the largest available real-world dataset of its kind. ScanNet comprises 1513 indoor scans of 707 distinct spaces. We sub-sample every 25 frames from the sequential RGB-D data, resulting in about 100k frames.


Baselines.

We compare our method with PointContrast [64] using the officially released pretrained weights. Additionally, to verify the effectiveness of our design, we build other naive 2D-3D knowledge transfer methods for comparison. Global knowledge distillation (Global KD) denotes the method we modified from [29], treating the SR-UNet encoder as the student and the 2D CNN as the teacher. Specifically, we extract the encoder from SR-UNet and attach a classifier whose number of output classes matches the ImageNet supervised pretrained network. We compute the KL-divergence between the 2D and 3D logits as the loss. Even though the transfer dataset is not from ImageNet, we let the 2D classifier output its pre-softmax "dark knowledge" class information. Since the transfer dataset has no labels, the classification loss between the student and the true labels is omitted. Global CRD [59] follows similar settings but with a contrastive-based loss function. We also compare with [23] with slight modifications to fit our 2D-3D setting; it can be seen as minimizing the L2 distance between 2D and 3D global features. Note that for the Global KD, CRD, and [23] baselines, only the 3D encoder is pretrained.

Training details.

We use the upsampling feature projection layer to project the 2D ResNet50 layer4 feature. As for 3D, a feature projection layer is used to project the last-layer feature of SR-UNet. The projected feature dimension is 128. The temperature of the NCE loss is 0.04. The voxel size is set to 2.5cm, and the input image size is 480x640. We use momentum SGD with learning rate 0.5 and weight decay 1e-4, with an exponential learning rate scheduler. We apply 2D augmentations, including horizontal flip and random resize, and 3D augmentations, including scaling, rotation, and elastic distortion. We pretrain the 3D network for 60k iterations on one V100 GPU with batch size 24. We implement our method in PyTorch [45] and use the 2D ResNet ImageNet pretrained weights from the torchvision library. We use the same PPKT pretrained 3D network weights as the initial weights for all downstream tasks.

4.2 S3DIS Semantic Segmentation

Method mean Acc mean IoU
From scratch 73.24 65.16
Global KD 72.65 66.56
Gupta [23] 72.11 64.39
CRD [60] 72.70 65.65
PointContrast [64] 73.97 66.86
Pixel-to-point (ours) 75.19 68.28
Table 1: S3DIS semantic segmentation result. Our pixel-to-point knowledge transfer largely boosts the performance by +3.12% mIoU compared to training from scratch. In contrast, the global pretraining methods (Global KD, Gupta et al., and CRD) do not bring performance improvement in the downstream task.


The Stanford Large-Scale 3D Indoor Spaces dataset (S3DIS) [2] contains 6 large buildings, nearly 250 rooms, and semantic labels of 13 categories. For evaluation, we use the commonly used Area 5 as the validation split.


The result of S3DIS semantic segmentation is shown in Table 1. Compared to training from scratch, our proposed pixel-to-point knowledge transfer pretraining brings significant improvements (+3.12% mIoU). In contrast, the performance of the global pretraining baselines (Global KD, CRD, Gupta et al.) is similar to training from scratch. This indicates that our pixel-to-point design is important for transferring 2D knowledge.

4.3 SUN RGB-D Object Detection

Method mAP@0.25 mAP@0.5
From scratch 55.21 32.81
Global KD 56.10 33.58
Gupta [23] 54.42 31.12
CRD [60] 55.86 34.30
PointContrast [64] 56.14 32.70
Pixel-to-point (ours) 57.26 33.92
Table 2: SUN RGB-D 3D object detection result. Compared to training from scratch, our pixel-to-point knowledge transfer pretraining significantly improves the performance by +2.05% mAP@0.25.
Semantic segmentation Object detection
Method mean Acc mean IoU mAP@0.25 mAP@0.5
From scratch 88.57 69.49 56.50 34.54
Global KD 88.63 70.35 57.50 34.81
Gupta [23] 88.40 68.71 56.74 36.27
CRD [60] 88.08 68.53 56.82 36.75
PointContrast [64] 88.63 69.22 58.30 36.26
Pixel-to-point (ours) 88.53 69.56 59.67 38.90
Table 3: ScanNet semantic segmentation and object detection results. For segmentation, pretraining does not bring significant improvement since the datasets used in pretraining and fine-tuning are the same. On the other hand, the model pretrained by our method shows a large improvement in object detection.


SUN RGB-D dataset [57] comprises about 10,000 RGB-D images and is annotated with 60k 3D oriented bounding boxes of 37 object classes. The RGB-D images are back-projected into 3D point clouds. We use the official train/val split and train with the ten most common object classes.


3D object detection requires high-level semantic understanding for object classification and localization. The results of our pretrained model against training from scratch are summarized in Table 2. Our proposed pixel-to-point knowledge transfer achieves a significant improvement over training from scratch (+2.05% mAP@0.25).

4.4 ScanNet Semantic Segmentation and Object Detection


The ScanNet [12] dataset contains 1201 scans in the training set and 312 in the validation set. For semantic segmentation, we use the v2 label set, which has 20 semantic classes, and evaluate performance on vertex points. We use a 2cm voxel size. During testing, each point is assigned the predicted class of the nearest voxel center.
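The test-time label transfer described above can be sketched as follows; the brute-force pairwise distance is for clarity only (a real implementation would use a spatial index), and all names are illustrative:

```python
import numpy as np

def assign_nearest_voxel(points, voxel_centers, voxel_preds):
    """Assign each evaluation point the predicted class of its nearest
    voxel center.

    points: (N, 3) vertex coordinates; voxel_centers: (M, 3) voxel centers;
    voxel_preds: (M,) predicted class id per voxel.
    """
    # pairwise squared distances, shape (N, M)
    d2 = ((points[:, None, :] - voxel_centers[None, :, :]) ** 2).sum(-1)
    return voxel_preds[d2.argmin(axis=1)]
```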

For object detection, ScanNet contains 18 object classes for instance segmentation labeled on mesh data. We follow the preprocessing of [49, 64], generating 3D axis-aligned bounding boxes. Compared to SUN RGB-D, ScanNet consists of reconstructed 3D scans, which are larger and more complete. We set the voxel size to 2.5cm.


The semantic segmentation and object detection results are shown in Table 3. For semantic segmentation, pretraining brings little improvement over training from scratch. Since the supervised fine-tuning dataset is the same as our pretraining dataset, it is possible that self-supervised pretraining only accelerates training but hardly brings performance improvements. For verification, we experiment with gradually decreasing the amount of labeled ScanNet data for fine-tuning. We randomly sample 50%, 30%, and 15% of the labeled scenes, and the result is presented in Figure 4. Although there is no pretraining gain with 100% labeled data, the performance gap between pretraining and training from scratch becomes larger as the data size shrinks (2.23% mIoU difference with 15% labeled data). Our limited-labeled-data fine-tuning result also matches the findings of previous works [41, 8], which suggest that self-supervised pretraining benefits more when the downstream supervised task has less labeled data.

For object detection, unlike the semantic segmentation result, our pixel-to-point knowledge transfer improves performance by a large margin (+3.17% mAP@0.25 and +4.36% mAP@0.5) even though the pretraining and fine-tuning datasets are the same. The improvements are consistent with the results on SUN RGB-D object detection. This may suggest that 3D object detection benefits more from the high-level semantics learned from the 2D pretrained network.

Figure 4: Limited labeled data fine-tuning on ScanNet semantic segmentation. We subsample the labeled scenes in ScanNet to 50%, 30%, and 15%. With less labeled data available, the gap between training from scratch and our pretraining becomes larger.

4.5 Ablation Study

Backbone network size.

Previous works [8, 41] have shown that larger or deeper 2D networks benefit more from self-supervised pretraining. In Table 4, we provide an ablation study on S3DIS with different 3D network sizes pretrained by our method. SR-UNet18 has about half the parameters of SR-UNet34. We have similar findings: our PPKT pretraining on larger models yields more performance improvement. It is also worth noting that SR-UNet18 pretrained with PPKT matches the performance of SR-UNet34 trained from scratch; in other words, good pretraining can make up for a smaller backbone.

Method mean Acc mean IoU
SR-UNet18 71.67 64.67
SR-UNet18 + PPKT 73.67 (+2.0) 66.40 (+1.73)
SR-UNet34 73.24 65.16
SR-UNet34 + PPKT 75.19 (+1.95) 68.28 (+3.12)
Table 4: 3D backbone ablation study on S3DIS. The result shows that larger 3D networks perform better with our pretraining. The models with larger capacity benefit more from pretraining with rich unlabeled data.

Point-pixel loss functions and datasets.

Loss Dataset Network mAP@0.25
From scratch - - 55.21
PPKD loss ADE20k ResNet-FCN 56.07
PPNCE loss ADE20k ResNet-FCN 58.03
PPNCE loss ImageNet ResNet 57.26
Table 5: Point-pixel loss ablation study on SUN RGB-D object detection. We replace our loss with the ordinary knowledge distillation loss [29] applied across points and pixels (PPKD loss). To do so, we first pretrain a 2D FCN on ADE20k semantic segmentation [70]. The result shows that the PPNCE loss transfers the 2D knowledge better.

We study the loss function in our pixel-to-point knowledge transfer by replacing the PPNCE loss with the ordinary knowledge distillation loss [29] applied on points and pixels (PPKD). However, ImageNet pretrained ResNet does not output pixel-wise class-level predictions. To obtain them, we train a 2D ResNet-FCN [39] on the ADE20k semantic segmentation dataset [70]. For PPNCE on ADE20k, the upsampling feature projection layer (UPL) is replaced with a normal feature projection layer (PL). We evaluate the fine-tuning performance on SUN RGB-D object detection in Table 5. The result shows both loss functions outperform training from scratch, and PPNCE is better than PPKD (+1.96% mAP@0.25 under the same ADE20k teacher). In contrast to the traditional KD loss, which treats each output independently, the contrastive loss transfers the knowledge more structurally due to the large number of negative examples [60].
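For reference, the PPKD baseline applies the ordinary temperature-scaled KD loss [29] independently at each matched point/pixel pair, i.e. the KL divergence between the teacher's and the student's per-point class distributions. This NumPy sketch uses an illustrative temperature; the scaling by T^2 follows the standard KD recipe:

```python
import numpy as np

def softmax(x, T=1.0):
    # temperature-scaled, numerically stable softmax over the last axis
    x = x / T
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

def ppkd_loss(student_logits, teacher_logits, T=4.0):
    """Per-point KD: mean KL(teacher || student) over matched pairs.

    Both inputs are (n, num_classes), one logit vector per matched
    point/pixel; each pair is treated independently.
    """
    p_t = softmax(teacher_logits, T)
    log_p_s = np.log(softmax(student_logits, T) + 1e-12)
    log_p_t = np.log(p_t + 1e-12)
    return (p_t * (log_p_t - log_p_s)).sum(axis=-1).mean() * T * T
```

Unlike PPNCE, no term here couples different points, which illustrates why this loss lacks the structural signal that contrastive negatives provide.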

On the other hand, PPNCE with ADE20K performs better than PPNCE with ImageNet. Nevertheless, we argue that a decoder-free architecture combined with the UPL is the better choice, since an FCN-like structure requires a semantic segmentation dataset for pretraining. In contrast, our default setup is more flexible and makes weaker assumptions about the 2D network architecture.
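A minimal sketch of the difference between the two projection heads discussed above, assuming a ResNet-style 2D backbone whose final feature map is heavily downsampled; the channel sizes and upsampling factor here are assumptions, not the paper's exact configuration.

```python
import torch.nn as nn

class ProjectionLayer(nn.Module):
    """PL: 1x1 projection into the shared embedding space, no upsampling."""
    def __init__(self, in_channels=2048, embed_dim=128):
        super().__init__()
        self.proj = nn.Conv2d(in_channels, embed_dim, kernel_size=1)

    def forward(self, feat):          # feat: (B, C, H, W)
        return self.proj(feat)        # (B, embed_dim, H, W)

class UpsamplingProjectionLayer(nn.Module):
    """UPL: restore spatial resolution before projecting, so fine-grained
    per-pixel features remain available for point-pixel matching."""
    def __init__(self, in_channels=2048, embed_dim=128, scale=4):
        super().__init__()
        self.up = nn.Upsample(scale_factor=scale, mode="bilinear",
                              align_corners=False)
        self.proj = nn.Conv2d(in_channels, embed_dim, kernel_size=1)

    def forward(self, feat):             # feat: (B, C, H, W)
        return self.proj(self.up(feat))  # (B, embed_dim, H*scale, W*scale)
```

Because the UPL recovers resolution inside the projection head, it works with any backbone, whereas an FCN bakes the upsampling into a decoder that must itself be pretrained on dense labels.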

Learning from 2D self-supervised pretrained network.

We further show that learning from 2D as pretraining is not restricted to 2D networks pretrained with supervised learning; it can also take advantage of 2D self-supervised pretraining. Here, we use a 2D ResNet-50 pretrained with MoCo [25] on ImageNet without labels and compare it to our default setting, which uses supervised ImageNet pretraining.

We show the result in Table 6. In our experiment, 3D models learned from the MoCo-pretrained 2D network achieve fine-tuning performance comparable to those learned from the 2D network trained with supervised ImageNet classification. This demonstrates that our method generalizes to self-supervised 2D knowledge. We believe the performance could be further improved if the 2D self-supervised pretraining used an unlabeled dataset larger than ImageNet, which is the strength of self-supervised learning.

Method mAP@0.25 (ScanNet) mAP@0.25 (SUN RGB-D)
From scratch 56.50 55.21
PPKT (ImageNet super.) 59.67 57.26
PPKT (ImageNet MoCo) 59.69 57.17

Table 6: PPKT with a self-supervised pretrained 2D network on object detection. In addition to a supervised ImageNet-pretrained 2D network, our method can also adopt a self-supervised pretrained one, MoCo [25]. It shows comparable performance with our default setting (supervised ImageNet pretraining).

5 Conclusion

In this work, we explore a new 3D pretraining approach that learns from 2D pretrained networks. We present pixel-to-point knowledge transfer to effectively utilize the 2D knowledge. Our comprehensive experiments demonstrate that the proposed pretraining method brings significant improvements to various real-world 3D downstream tasks. In principle, our method is complementary to PointContrast [64]. We expect that our idea of learning from 2D and our empirical findings will inspire future works to consider 2D network knowledge when developing self-supervised 3D algorithms.


  • [1] S. Ahn, S. X. Hu, A. Damianou, N. D. Lawrence, and Z. Dai (2019) Variational information distillation for knowledge transfer. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 9163–9171. Cited by: §2.3.
  • [2] I. Armeni, O. Sener, A. R. Zamir, H. Jiang, I. Brilakis, M. Fischer, and S. Savarese (2016) 3d semantic parsing of large-scale indoor spaces. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1534–1543. Cited by: §1, §4.2.
  • [3] Y. Aytar, C. Vondrick, and A. Torralba (2016) Soundnet: learning sound representations from unlabeled video. arXiv preprint arXiv:1610.09001. Cited by: §2.3.
  • [4] P. Bachman, R. D. Hjelm, and W. Buchwalter (2019) Learning representations by maximizing mutual information across views. In Advances in Neural Information Processing Systems, pp. 15535–15545. Cited by: §1, §2.2.
  • [5] C. Buciluǎ, R. Caruana, and A. Niculescu-Mizil (2006) Model compression. In Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 535–541. Cited by: §2.3.
  • [6] H. Caesar, V. Bankiti, A. H. Lang, S. Vora, V. E. Liong, Q. Xu, A. Krishnan, Y. Pan, G. Baldan, and O. Beijbom (2020) Nuscenes: a multimodal dataset for autonomous driving. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 11621–11631. Cited by: §1.
  • [7] T. Chen, S. Kornblith, M. Norouzi, and G. Hinton (2020) A simple framework for contrastive learning of visual representations. arXiv preprint arXiv:2002.05709. Cited by: §2.2, §3.2.
  • [8] T. Chen, S. Kornblith, K. Swersky, M. Norouzi, and G. Hinton (2020) Big self-supervised models are strong semi-supervised learners. arXiv preprint arXiv:2006.10029. Cited by: §4.4, §4.5.
  • [9] X. Chen, H. Ma, J. Wan, B. Li, and T. Xia (2017) Multi-view 3d object detection network for autonomous driving. In Proceedings of the IEEE conference on Computer Vision and Pattern Recognition, pp. 1907–1915. Cited by: §2.1.
  • [10] H. Chiang, Y. Lin, Y. Liu, and W. H. Hsu (2019) A unified point-based framework for 3d segmentation. In 2019 International Conference on 3D Vision (3DV), pp. 155–163. Cited by: §1, §2.1, §3.1.
  • [11] C. Choy, J. Gwak, and S. Savarese (2019) 4d spatio-temporal convnets: minkowski convolutional neural networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 3075–3084. Cited by: §1, §1, §2.3, §3.2, §4.1.
  • [12] A. Dai, A. X. Chang, M. Savva, M. Halber, T. Funkhouser, and M. Nießner (2017) Scannet: richly-annotated 3d reconstructions of indoor scenes. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5828–5839. Cited by: §1, §1, §3.1, §4.1, §4.4.
  • [13] A. Dai and M. Nießner (2018) 3dmv: joint 3d-multi-view prediction for 3d semantic scene segmentation. In Proceedings of the European Conference on Computer Vision (ECCV), pp. 452–468. Cited by: §1, §2.1.
  • [14] J. Deng, W. Dong, R. Socher, L. Li, K. Li, and L. Fei-Fei (2009) Imagenet: a large-scale hierarchical image database. In 2009 IEEE conference on computer vision and pattern recognition, pp. 248–255. Cited by: §1.
  • [15] J. Devlin, M. Chang, K. Lee, and K. Toutanova (2018) Bert: pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805. Cited by: §1, §2.2.
  • [16] C. Doersch, A. Gupta, and A. A. Efros (2015) Unsupervised visual representation learning by context prediction. In Proceedings of the IEEE international conference on computer vision, pp. 1422–1430. Cited by: §2.2.
  • [17] T. Furlanello, Z. Lipton, M. Tschannen, L. Itti, and A. Anandkumar (2018) Born again neural networks. In International Conference on Machine Learning, pp. 1607–1616. Cited by: §2.3.
  • [18] A. Geiger, P. Lenz, C. Stiller, and R. Urtasun (2013) Vision meets robotics: the kitti dataset. The International Journal of Robotics Research 32 (11), pp. 1231–1237. Cited by: §1.
  • [19] S. Gidaris, P. Singh, and N. Komodakis (2018) Unsupervised representation learning by predicting image rotations. arXiv preprint arXiv:1803.07728. Cited by: §2.2.
  • [20] R. Girdhar, D. Tran, L. Torresani, and D. Ramanan (2019) Distinit: learning video representations without a single labeled video. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 852–861. Cited by: §2.3.
  • [21] B. Graham, M. Engelcke, and L. Van Der Maaten (2018) 3d semantic segmentation with submanifold sparse convolutional networks. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 9224–9232. Cited by: §1, §2.3, §4.1.
  • [22] J. Grill, F. Strub, F. Altché, C. Tallec, P. H. Richemond, E. Buchatskaya, C. Doersch, B. A. Pires, Z. D. Guo, M. G. Azar, et al. (2020) Bootstrap your own latent: a new approach to self-supervised learning. arXiv preprint arXiv:2006.07733. Cited by: §1, §2.2.
  • [23] S. Gupta, J. Hoffman, and J. Malik (2016) Cross modal distillation for supervision transfer. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 2827–2836. Cited by: §1, §2.3, §2.3, §3.2, §4.1, Table 1, Table 2, Table 3.
  • [24] K. Hassani and M. Haley (2019) Unsupervised multi-task feature learning on point clouds. In Proceedings of the IEEE International Conference on Computer Vision, pp. 8160–8171. Cited by: §2.2.
  • [25] K. He, H. Fan, Y. Wu, S. Xie, and R. Girshick (2020) Momentum contrast for unsupervised visual representation learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 9729–9738. Cited by: §1, §2.2, §3.2, §4.5, Table 6.
  • [26] K. He, R. Girshick, and P. Dollár (2019) Rethinking imagenet pre-training. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 4918–4927. Cited by: §3.2.
  • [27] K. He, X. Zhang, S. Ren, and J. Sun (2016) Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 770–778. Cited by: §1, §4.1.
  • [28] O. Henaff (2020) Data-efficient image recognition with contrastive predictive coding. In International Conference on Machine Learning, pp. 4182–4192. Cited by: §1, §2.2.
  • [29] G. Hinton, O. Vinyals, and J. Dean (2015) Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531. Cited by: §2.3, §3.2, §4.1, §4.5, Table 5.
  • [30] R. D. Hjelm, A. Fedorov, S. Lavoie-Marchildon, K. Grewal, P. Bachman, A. Trischler, and Y. Bengio (2018) Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670. Cited by: §1, §2.2.
  • [31] J. Hou, A. Dai, and M. Nießner (2019) 3d-sis: 3d semantic instance segmentation of rgb-d scans. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4421–4430. Cited by: §2.1.
  • [32] Z. Huang and N. Wang (2017) Like what you like: knowledge distill via neuron selectivity transfer. arXiv preprint arXiv:1707.01219. Cited by: §2.3.
  • [33] A. Janoch, S. Karayev, Y. Jia, J. T. Barron, M. Fritz, K. Saenko, and T. Darrell (2013) A category-level 3d object dataset: putting the kinect to work. In Consumer depth cameras for computer vision, pp. 141–165. Cited by: §1.
  • [34] J. Kim, S. Park, and N. Kwak (2018) Paraphrasing complex network: network compression via factor transfer. arXiv preprint arXiv:1802.04977. Cited by: §2.3.
  • [35] J. Ku, M. Mozifian, J. Lee, A. Harakeh, and S. L. Waslander (2018) Joint 3d proposal generation and object detection from view aggregation. In 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 1–8. Cited by: §2.1.
  • [36] A. Kundu, X. Yin, A. Fathi, D. Ross, B. Brewington, T. Funkhouser, and C. Pantofaru (2020) Virtual multi-view fusion for 3d semantic segmentation. In European Conference on Computer Vision, pp. 518–535. Cited by: §1.
  • [37] J. Li, R. Zhao, J. Huang, and Y. Gong (2014) Learning small-size dnn with output-distribution-based criteria. In Fifteenth annual conference of the international speech communication association, Cited by: §2.3.
  • [38] Z. Liu, H. Tang, Y. Lin, and S. Han (2019) Point-voxel cnn for efficient 3d deep learning. arXiv preprint arXiv:1907.03739. Cited by: §1.
  • [39] J. Long, E. Shelhamer, and T. Darrell (2015) Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 3431–3440. Cited by: §4.5.
  • [40] I. Misra and L. v. d. Maaten (2020) Self-supervised learning of pretext-invariant representations. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6707–6717. Cited by: §1, §2.2.
  • [41] A. Newell and J. Deng (2020) How useful is self-supervised pretraining for visual tasks?. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 7345–7354. Cited by: §4.4, §4.5.
  • [42] M. Noroozi and P. Favaro (2016) Unsupervised learning of visual representations by solving jigsaw puzzles. In European Conference on Computer Vision, pp. 69–84. Cited by: §2.2.
  • [43] A. v. d. Oord, Y. Li, and O. Vinyals (2018) Representation learning with contrastive predictive coding. arXiv preprint arXiv:1807.03748. Cited by: §1, §2.2, §3.2.
  • [44] W. Park, D. Kim, Y. Lu, and M. Cho (2019) Relational knowledge distillation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 3967–3976. Cited by: §2.3.
  • [45] A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang, Z. DeVito, Z. Lin, A. Desmaison, L. Antiga, and A. Lerer (2017) Automatic differentiation in pytorch. Cited by: §4.1.
  • [46] D. Pathak, P. Krahenbuhl, J. Donahue, T. Darrell, and A. A. Efros (2016) Context encoders: feature learning by inpainting. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 2536–2544. Cited by: §2.2.
  • [47] B. Peng, X. Jin, J. Liu, D. Li, Y. Wu, Y. Liu, S. Zhou, and Z. Zhang (2019) Correlation congruence for knowledge distillation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 5007–5016. Cited by: §2.3.
  • [48] C. R. Qi, X. Chen, O. Litany, and L. J. Guibas (2020) Imvotenet: boosting 3d object detection in point clouds with image votes. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 4404–4413. Cited by: §2.1.
  • [49] C. R. Qi, O. Litany, K. He, and L. J. Guibas (2019) Deep hough voting for 3d object detection in point clouds. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 9277–9286. Cited by: §4.1, §4.4.
  • [50] C. R. Qi, H. Su, K. Mo, and L. J. Guibas (2017) Pointnet: deep learning on point sets for 3d classification and segmentation. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 652–660. Cited by: §1.
  • [51] C. R. Qi, L. Yi, H. Su, and L. J. Guibas (2017) Pointnet++: deep hierarchical feature learning on point sets in a metric space. arXiv preprint arXiv:1706.02413. Cited by: §1, §3.2.
  • [52] A. Radford, J. Wu, R. Child, D. Luan, D. Amodei, and I. Sutskever (2019) Language models are unsupervised multitask learners. OpenAI blog 1 (8), pp. 9. Cited by: §1.
  • [53] Y. Rao, J. Lu, and J. Zhou (2020) Global-local bidirectional reasoning for unsupervised representation learning of 3d point clouds. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 5376–5385. Cited by: §2.2.
  • [54] A. Romero, N. Ballas, S. E. Kahou, A. Chassang, C. Gatta, and Y. Bengio (2014) Fitnets: hints for thin deep nets. arXiv preprint arXiv:1412.6550. Cited by: §2.3.
  • [55] J. Sauder and B. Sievers (2019) Self-supervised deep learning on point clouds by reconstructing space. In Advances in Neural Information Processing Systems, pp. 12962–12972. Cited by: §2.2.
  • [56] N. Silberman, D. Hoiem, P. Kohli, and R. Fergus (2012) Indoor segmentation and support inference from rgbd images. In European conference on computer vision, pp. 746–760. Cited by: §1.
  • [57] S. Song, S. P. Lichtenberg, and J. Xiao (2015) Sun rgb-d: a rgb-d scene understanding benchmark suite. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 567–576. Cited by: §1, §1, §4.3.
  • [58] H. Tang, Z. Liu, S. Zhao, Y. Lin, J. Lin, H. Wang, and S. Han (2020) Searching efficient 3d architectures with sparse point-voxel convolution. In European Conference on Computer Vision, pp. 685–702. Cited by: §1.
  • [59] Y. Tian, D. Krishnan, and P. Isola (2019) Contrastive multiview coding. arXiv preprint arXiv:1906.05849. Cited by: §4.1.
  • [60] Y. Tian, D. Krishnan, and P. Isola (2019) Contrastive representation distillation. arXiv preprint arXiv:1910.10699. Cited by: §2.3, §3.2, §4.5, Table 1, Table 2, Table 3.
  • [61] Y. Wang, Y. Sun, Z. Liu, S. E. Sarma, M. M. Bronstein, and J. M. Solomon (2019) Dynamic graph cnn for learning on point clouds. Acm Transactions On Graphics (tog) 38 (5), pp. 1–12. Cited by: §1.
  • [62] Z. Wu, Y. Xiong, S. X. Yu, and D. Lin (2018) Unsupervised feature learning via non-parametric instance discrimination. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3733–3742. Cited by: §1, §2.2.
  • [63] J. Xiao, A. Owens, and A. Torralba (2013) Sun3d: a database of big spaces reconstructed using sfm and object labels. In Proceedings of the IEEE international conference on computer vision, pp. 1625–1632. Cited by: §1.
  • [64] S. Xie, J. Gu, D. Guo, C. R. Qi, L. Guibas, and O. Litany (2020) PointContrast: unsupervised pre-training for 3d point cloud understanding. In European Conference on Computer Vision, pp. 574–591. Cited by: Figure 1, §1, §1, §1, §2.2, §3.2, §3.2, §3.2, §4.1, §4.1, §4.4, Table 1, Table 2, Table 3, §5.
  • [65] J. Yim, D. Joo, J. Bae, and J. Kim (2017) A gift from knowledge distillation: fast optimization, network minimization and transfer learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4133–4141. Cited by: §2.3.
  • [66] S. Zagoruyko and N. Komodakis (2016) Paying more attention to attention: improving the performance of convolutional neural networks via attention transfer. arXiv preprint arXiv:1612.03928. Cited by: §2.3.
  • [67] R. Zhang, P. Isola, and A. A. Efros (2016) Colorful image colorization. In European conference on computer vision, pp. 649–666. Cited by: §2.2.
  • [68] R. Zhang, P. Isola, and A. A. Efros (2017) Split-brain autoencoders: unsupervised learning by cross-channel prediction. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1058–1067. Cited by: §2.2.
  • [69] Y. Zhang, T. Xiang, T. M. Hospedales, and H. Lu (2018) Deep mutual learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4320–4328. Cited by: §2.3.
  • [70] B. Zhou, H. Zhao, X. Puig, S. Fidler, A. Barriuso, and A. Torralba (2017) Scene parsing through ade20k dataset. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 633–641. Cited by: §4.5, Table 5.