Improved Cross-view Completion Pre-training for Stereo Matching

11/18/2022
by Philippe Weinzaepfel, et al.

Despite impressive performance on high-level downstream tasks, self-supervised pre-training methods have not yet fully delivered on dense geometric vision tasks such as stereo matching. Applying self-supervised learning concepts, such as instance discrimination or masked image modeling, to geometric tasks is an active area of research. In this work we build on the recent cross-view completion framework: this variant of masked image modeling leverages a second view of the same scene, which makes it well suited for binocular downstream tasks. However, the applicability of this concept has so far been limited in at least two ways: (a) by the difficulty of collecting real-world image pairs (in practice, only synthetic data had been used) and (b) by the lack of generalization of vanilla transformers to dense downstream tasks, for which relative position is more meaningful than absolute position. We explore three avenues of improvement. First, we introduce a method to collect suitable real-world image pairs at large scale. Second, we experiment with relative positional embeddings and demonstrate that they enable vision transformers to perform substantially better. Third, we scale up vision-transformer-based cross-view completion architectures, which is made possible by the use of large amounts of data. With these improvements, we show for the first time that state-of-the-art results on deep stereo matching can be reached without any standard task-specific techniques such as correlation volumes, iterative estimation, or multi-scale reasoning.
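To make the cross-view completion objective concrete, here is a minimal sketch of how the pre-training inputs can be constructed: patches of a first view are heavily masked and become the reconstruction targets, while the second view of the same scene is kept intact as conditioning. The patch size, masking ratio, and function names below are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def patchify(img, patch=4):
    """Split an HxWxC image into non-overlapping, flattened patches."""
    H, W, C = img.shape
    p = img.reshape(H // patch, patch, W // patch, patch, C)
    p = p.transpose(0, 2, 1, 3, 4)
    return p.reshape(-1, patch * patch * C)

def cross_view_completion_inputs(view1, view2, mask_ratio=0.9, patch=4, seed=0):
    """Build cross-view completion pre-training inputs (hypothetical sketch):
    - visible1: the few unmasked patches of view 1, fed to the encoder
    - p2:       all patches of the intact second view (conditioning)
    - targets:  the masked patches of view 1 the decoder must reconstruct
    - mask:     boolean mask over view-1 patches (True = masked)
    """
    rng = np.random.default_rng(seed)
    p1, p2 = patchify(view1, patch), patchify(view2, patch)
    n = p1.shape[0]
    mask = np.zeros(n, dtype=bool)
    mask[rng.choice(n, size=int(mask_ratio * n), replace=False)] = True
    return p1[~mask], p2, p1[mask], mask
```

Because the second view is fully visible, reconstructing the masked patches encourages the model to relate pixels across viewpoints, which is exactly the kind of correspondence reasoning that stereo matching requires.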


