MVT: Multi-view Vision Transformer for 3D Object Recognition

10/25/2021
by Shuo Chen, et al.

Inspired by the great success of CNNs in image recognition, view-based methods apply CNNs to the projected views of a 3D object and achieve excellent performance in 3D object understanding. However, multi-view CNN models cannot model communication between patches from different views, which limits their effectiveness in 3D object recognition. Inspired by the recent success of vision Transformers in image recognition, we propose a Multi-view Vision Transformer (MVT) for 3D object recognition. Since each patch feature in a Transformer block has a global receptive field, the model naturally achieves communication between patches from different views. It also relies on much less inductive bias than its CNN counterparts. Considering both effectiveness and efficiency, we develop a global-local structure for our MVT. Our experiments on two public benchmarks, ModelNet40 and ModelNet10, demonstrate the competitive performance of our MVT.
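To make the cross-view communication idea concrete, here is a minimal pure-Python sketch. It is not the authors' implementation, and the abstract does not describe how the global-local structure is arranged. The sketch assumes that "local" blocks let each patch attend only to patches from its own view, while "global" blocks let every patch attend to patches from all views. It also simplifies the Transformer block to single-head attention with identity query/key/value projections and a residual connection, with no MLP or normalization. The function names `attention_mask`, `self_attention`, and `mvt_sketch` are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def attention_mask(num_views: int, patches_per_view: int, local: bool) -> List[List[bool]]:
    """allowed[i][j] is True if token i may attend to token j.

    Assumption: a local block restricts attention to patches of the same view;
    a global block allows attention across all views.
    """
    n = num_views * patches_per_view
    view_of = [i // patches_per_view for i in range(n)]
    return [[(not local) or view_of[i] == view_of[j] for j in range(n)] for i in range(n)]


def self_attention(tokens: List[Vector], allowed: List[List[bool]]) -> List[Vector]:
    """Single-head scaled dot-product attention.

    Simplification: identity Q/K/V projections, so each token is its own query, key and value.
    """
    d = len(tokens[0])
    out = []
    for i, q in enumerate(tokens):
        # Masked-out pairs get a score of -inf, i.e. zero attention weight after softmax.
        scores = [
            sum(a * b for a, b in zip(q, k)) / math.sqrt(d) if allowed[i][j] else -math.inf
            for j, k in enumerate(tokens)
        ]
        m = max(scores)  # finite, because every token may attend to itself
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        out.append([sum(w * v[c] for w, v in zip(weights, tokens)) / total for c in range(d)])
    return out


def mvt_sketch(views: List[List[Vector]], num_local: int, num_global: int) -> List[Vector]:
    """Concatenate all views' patch tokens, then apply local blocks followed by global blocks."""
    num_views, patches = len(views), len(views[0])
    tokens = [patch for view in views for patch in view]  # one sequence of num_views * patches tokens
    for layers, local in ((num_local, True), (num_global, False)):
        mask = attention_mask(num_views, patches, local)
        for _ in range(layers):
            attended = self_attention(tokens, mask)
            # Residual connection around the attention step.
            tokens = [[x + a for x, a in zip(t, at)] for t, at in zip(tokens, attended)]
    return tokens


if __name__ == "__main__":
    # Two views, each with two 2-dimensional patch features (toy values).
    views = [[[1.0, 0.0], [0.0, 1.0]], [[0.5, 0.5], [1.0, 1.0]]]
    changed = [views[0], [[-2.0, 3.0], [4.0, -1.0]]]  # same view 0, different view 1
    # With local blocks only, view-0 outputs ignore view 1; a global block lets them interact.
    print(mvt_sketch(views, 1, 0)[:2] == mvt_sketch(changed, 1, 0)[:2])
    print(mvt_sketch(views, 1, 1)[:2] == mvt_sketch(changed, 1, 1)[:2])
```

The demo perturbs only the second view. Under the local-only configuration, the first view's outputs are unchanged, so the patches cannot communicate across views. Once a global block is added, they change. This is the cross-view interaction that a per-view CNN lacks. Under this assumed arrangement, local blocks are cheaper because each token attends to `patches_per_view` tokens rather than all `num_views * patches_per_view`, which is one plausible reading of the effectiveness and efficiency trade-off the abstract mentions.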

