Recurrent 3D Attentional Networks for End-to-End Active Object Recognition in Cluttered Scenes

10/14/2016
by Min Liu, et al.

Active vision is inherently attention-driven: the agent selects the views from which to observe so as to best approach the vision task while improving its internal representation of the scene being observed. Inspired by the recent success of attention-based models in 2D vision tasks based on single RGB images, we propose to address multi-view, depth-based active object recognition with an attention mechanism, by developing an end-to-end recurrent 3D attentional network. The architecture comprises a recurrent neural network (RNN), which stores and updates an internal representation, and two levels of spatial transformer units, which guide two levels of attention. Our model, trained with a 3D shape database, is able to iteratively attend to the best views of an object of interest in order to recognize it, and to focus on the object in each view in order to remove background clutter. To realize 3D view selection, we derive a 3D spatial transformer network that is differentiable and can therefore be trained with back-propagation, achieving much faster convergence than the reinforcement learning employed by most existing attention-based models. Experiments show that our method outperforms state-of-the-art methods in cluttered scenes.
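To make the idea of a spatial transformer unit more concrete, the following is a minimal sketch of the generic 2D sampling step (affine grid plus bilinear interpolation) that such units are commonly built on. It is not the 3D view-selection transformer derived in the paper, and it is not the authors' code: the function names `affine_grid`, `bilinear_sample`, and `attend`, the 2x3 parameterization of `theta`, and the zero padding outside the image are all assumptions made for illustration. Pure Python is used so the sketch stays self-contained. A real model would use an autodiff framework so that gradients flow back to `theta`.

```python
import math


def affine_grid(theta, out_h, out_w):
    """Map each output pixel to normalized source coordinates in [-1, 1].

    theta is an assumed 2x3 affine matrix [[a, b, tx], [c, d, ty]].
    """
    grid = []
    for i in range(out_h):
        # Normalized target row coordinate; a single row maps to the center.
        y_t = -1.0 + 2.0 * i / (out_h - 1) if out_h > 1 else 0.0
        row = []
        for j in range(out_w):
            x_t = -1.0 + 2.0 * j / (out_w - 1) if out_w > 1 else 0.0
            x_s = theta[0][0] * x_t + theta[0][1] * y_t + theta[0][2]
            y_s = theta[1][0] * x_t + theta[1][1] * y_t + theta[1][2]
            row.append((x_s, y_s))
        grid.append(row)
    return grid


def bilinear_sample(image, grid):
    """Sample image (list of rows) at the grid points with bilinear weights.

    Source locations outside the image contribute zero (assumed padding).
    """
    h, w = len(image), len(image[0])
    out = []
    for row in grid:
        out_row = []
        for x_s, y_s in row:
            # Convert normalized coordinates to pixel coordinates.
            x = (x_s + 1.0) * (w - 1) / 2.0
            y = (y_s + 1.0) * (h - 1) / 2.0
            x0, y0 = math.floor(x), math.floor(y)
            value = 0.0
            # Blend the four neighboring pixels; the weights are piecewise
            # linear in (x, y), which is what makes the step differentiable.
            for yy in (y0, y0 + 1):
                for xx in (x0, x0 + 1):
                    if 0 <= xx < w and 0 <= yy < h:
                        weight = max(0.0, 1 - abs(x - xx)) * max(0.0, 1 - abs(y - yy))
                        value += weight * image[yy][xx]
            out_row.append(value)
        out.append(out_row)
    return out


def attend(image, theta, out_h, out_w):
    """Crop/zoom a depth image according to theta (hypothetical helper)."""
    return bilinear_sample(image, affine_grid(theta, out_h, out_w))
```

With an identity `theta` the call returns the input unchanged, while scaling the diagonal entries below 1 zooms in on the image center. In a network of the kind described above, a small localization sub-network would predict `theta` at each step; that part is omitted here.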


