P4Contrast: Contrastive Learning with Pairs of Point-Pixel Pairs for RGB-D Scene Understanding

12/24/2020
by Yunze Liu, et al.

Self-supervised representation learning is a critical problem in computer vision, as it provides a way to pretrain feature extractors on large unlabeled datasets that can be used as an initialization for more efficient and effective training on downstream tasks. A promising approach is to use contrastive learning to learn a latent space where features are close for similar data samples and far apart for dissimilar ones. This approach has demonstrated tremendous success for pretraining both image and point cloud feature extractors, but it has been barely investigated for multi-modal RGB-D scans, especially with the goal of facilitating high-level scene understanding. To solve this problem, we propose contrasting "pairs of point-pixel pairs", where positives include pairs of RGB-D points in correspondence, and negatives include pairs where one of the two modalities has been disturbed and/or the two RGB-D points are not in correspondence. This provides extra flexibility in making hard negatives and helps networks to learn features from both modalities, not just the more discriminating one of the two. Experiments show that this proposed approach yields better performance on three large-scale RGB-D scene understanding benchmarks (ScanNet, SUN RGB-D, and 3RScan) than previous pretraining approaches.
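The objective described above can be illustrated with a minimal sketch. Assuming precomputed pixel and point features for the same set of RGB-D points from two augmented views, the code below fuses each pixel-point feature pair into a single embedding, treats corresponding pairs as positives, and builds hard negatives by pairing pixel features with non-corresponding point features (one modality "disturbed"). All names (p4contrast_loss, pix_a, pts_a, etc.) and the InfoNCE formulation are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn.functional as F

def p4contrast_loss(pix_a, pts_a, pix_b, pts_b, temperature=0.07):
    """Illustrative InfoNCE-style loss over pairs of point-pixel pairs.

    pix_a/pts_a and pix_b/pts_b are (N, D) pixel and point features of the
    same N RGB-D points, taken from two augmented views. Row i of every
    tensor corresponds to the same 3D point, so the fused pair
    (pix_b[i], pts_b[i]) is the positive for the anchor (pix_a[i], pts_a[i]).
    """
    n = pix_a.size(0)
    fuse = lambda p, q: F.normalize(torch.cat([p, q], dim=1), dim=1)

    anchors = fuse(pix_a, pts_a)      # (N, 2D) fused point-pixel anchors
    positives = fuse(pix_b, pts_b)    # (N, 2D) corresponding pairs

    # Hard negatives: keep the pixel features but shift the point features,
    # so the two modalities of each candidate are no longer in correspondence.
    disturbed = fuse(pix_b, pts_b.roll(shifts=1, dims=0))

    candidates = torch.cat([positives, disturbed], dim=0)  # (2N, 2D)
    logits = anchors @ candidates.t() / temperature        # (N, 2N) similarities
    targets = torch.arange(n, device=logits.device)        # positive at column i

    return F.cross_entropy(logits, targets)

# Usage with random features standing in for backbone outputs:
# loss = p4contrast_loss(*[torch.randn(8, 64) for _ in range(4)])
```

In this sketch each hard negative differs from the anchor in only one modality, which is what pushes the network to use both the pixel and the point signal rather than relying on whichever modality is more discriminative on its own.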



Related research

CLIP^2: Contrastive Language-Image-Point Pretraining from Real-World Point Cloud Data (03/22/2023)
Contrastive Language-Image Pre-training, benefiting from large-scale unl...

CoMAE: Single Model Hybrid Pre-training on Small-Scale RGB-D Datasets (02/13/2023)
Current RGB-D scene recognition approaches often train two standalone ba...

CrossPoint: Self-Supervised Cross-Modal Contrastive Learning for 3D Point Cloud Understanding (03/01/2022)
Manual annotation of large-scale point cloud dataset for varying tasks s...

PointCMC: Cross-Modal Multi-Scale Correspondences Learning for Point Cloud Understanding (11/22/2022)
Some self-supervised cross-modal learning approaches have recently demon...

PiMAE: Point Cloud and Image Interactive Masked Autoencoders for 3D Object Detection (03/14/2023)
Masked Autoencoders learn strong visual representations and achieve stat...

SGAligner: 3D Scene Alignment with Scene Graphs (04/28/2023)
Building 3D scene graphs has recently emerged as a topic in scene repres...

Learning from 2D: Pixel-to-Point Knowledge Transfer for 3D Pretraining (04/10/2021)
Most of the 3D networks are trained from scratch owing to the lack of l...
