Self-supervised Learning by View Synthesis

04/22/2023
by Shaoteng Liu, et al.

We present view-synthesis autoencoders (VSA), a self-supervised learning framework designed for vision transformers. Unlike traditional 2D pretraining methods, VSA is pre-trained with multi-view data. In each iteration, the input to VSA is one view (or multiple views) of a 3D object, and the output is a synthesized image in another target pose. The decoder of VSA consists of several cross-attention blocks that use the source view as value, the source pose as key, and the target pose as query to synthesize the target view. This simple approach realizes large-angle view synthesis and learns a spatially invariant representation, and the latter provides a decent initialization for transformers on downstream tasks such as 3D classification on ModelNet40, ShapeNet Core55, and ScanObjectNN. VSA significantly outperforms existing methods in linear probing and is competitive in fine-tuning. The code will be made publicly available.
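
In the spirit of the decoder described in the abstract, below is a minimal PyTorch sketch of one such cross-attention block, assuming per-token pose embeddings for the source view and target-pose query tokens of the same length; the class name, dimensions, and layer layout are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn


class PoseCrossAttentionBlock(nn.Module):
    """One decoder block: target-pose queries attend over source-pose keys
    and source-view values (all names and sizes are illustrative guesses)."""

    def __init__(self, dim=768, num_heads=12, mlp_ratio=4.0):
        super().__init__()
        self.norm_q = nn.LayerNorm(dim)
        self.norm_k = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm_mlp = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, int(dim * mlp_ratio)),
            nn.GELU(),
            nn.Linear(int(dim * mlp_ratio), dim),
        )

    def forward(self, tgt_pose_tokens, src_pose_tokens, src_view_tokens):
        # Cross-attention: query = target pose, key = source pose, value = source view.
        # Keys and values are aligned per source-view token, so their lengths match.
        attn_out, _ = self.attn(
            query=self.norm_q(tgt_pose_tokens),
            key=self.norm_k(src_pose_tokens),
            value=src_view_tokens,
            need_weights=False,
        )
        x = tgt_pose_tokens + attn_out          # residual connection
        x = x + self.mlp(self.norm_mlp(x))      # feed-forward with residual
        return x


# Toy usage: 2 objects, 196 tokens per view, embedding dimension 768.
block = PoseCrossAttentionBlock()
tgt_q = torch.randn(2, 196, 768)   # target-pose query tokens
src_k = torch.randn(2, 196, 768)   # per-token source-pose embeddings
src_v = torch.randn(2, 196, 768)   # source-view patch tokens
out = block(tgt_q, src_k, src_v)   # -> shape (2, 196, 768)
```

The point the abstract emphasizes is the assignment of query, key, and value: the target pose drives the queries, so the synthesized tokens are conditioned on the pose from which the new view should be rendered.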

Related research:

02/02/2023 · Hand Pose Estimation via Multiview Collaborative Self-Supervised Learning
3D hand pose estimation has made significant progress in recent years. H...

07/30/2022 · Improving Fine-tuning of Self-supervised Models with Contrastive Initialization
Self-supervised learning (SSL) has achieved remarkable performance in pr...

11/28/2022 · A Light Touch Approach to Teaching Transformers Multi-view Geometry
Transformers are powerful visual learners, in large part due to their co...

11/26/2018 · IGNOR: Image-guided Neural Object Rendering
We propose a new learning-based novel view synthesis approach for scanne...

01/19/2023 · Self-Supervised Learning from Images with a Joint-Embedding Predictive Architecture
This paper demonstrates an approach for learning highly semantic image r...

07/29/2022 · End-to-end View Synthesis via NeRF Attention
In this paper, we present a simple seq2seq formulation for view synthesi...

09/08/2023 · Representation Synthesis by Probabilistic Many-Valued Logic Operation in Self-Supervised Learning
Self-supervised learning (SSL) using mixed images has been studied to le...
