Volumetric Transformer Networks

07/18/2020
by   Seungryong Kim, et al.
0

Existing techniques to encode spatial invariance within deep convolutional neural networks (CNNs) apply the same warping field to all the feature channels. This does not account for the fact that the individual feature channels can represent different semantic parts, which can undergo different spatial transformations w.r.t. a canonical configuration. To overcome this limitation, we introduce a learnable module, the volumetric transformer network (VTN), that predicts channel-wise warping fields so as to reconfigure intermediate CNN features spatially and channel-wisely. We design our VTN as an encoder-decoder network, with modules dedicated to letting the information flow across the feature channels, to account for the dependencies between the semantic parts. We further propose a loss function defined between the warped features of pairs of instances, which improves the localization ability of VTN. Our experiments show that VTN consistently boosts the features' representation power and consequently the networks' accuracy on fine-grained image recognition and instance-level image retrieval.

READ FULL TEXT

page 2

page 10

page 13

research
06/05/2015

Spatial Transformer Networks

Convolutional Neural Networks define an exceptionally powerful class of ...
research
03/12/2021

Sequential Random Network for Fine-grained Image Classification

Deep Convolutional Neural Network (DCNN) and Transformer have achieved r...
research
03/25/2020

Cylindrical Convolutional Networks for Joint Object Detection and Viewpoint Estimation

Existing techniques to encode spatial invariance within deep convolution...
research
04/13/2019

Transformable Bottleneck Networks

We propose a novel approach to performing fine-grained 3D manipulation o...
research
04/24/2020

Understanding when spatial transformer networks do not support invariance, and what to do about it

Spatial transformer networks (STNs) were designed to enable convolutiona...
research
05/23/2019

Spatial Group-wise Enhance: Improving Semantic Feature Learning in Convolutional Networks

The Convolutional Neural Networks (CNNs) generate the feature representa...
research
02/16/2023

Frequency-domain Learning for Volumetric-based 3D Data Perception

Frequency-domain learning draws attention due to its superior tradeoff b...

Please sign up or login with your details

Forgot password? Click here to reset