Weakly-Supervised Spatial Context Networks

04/10/2017
by Zuxuan Wu, et al.

We explore the power of spatial context as a self-supervisory signal for learning visual representations. In particular, we propose spatial context networks that learn to predict a representation of one image patch from another image patch within the same image, conditioned on their real-valued relative spatial offset. Unlike auto-encoders, which aim to encode and reconstruct the original image patches, our network aims to encode and reconstruct intermediate representations of the spatially offset patches. As such, the network learns a spatially conditioned contextual representation. By testing performance with various patch selection mechanisms, we show that focusing on object-centric patches is important, and that using object proposals as the patch selection mechanism leads to the highest improvement in performance. Further, unlike auto-encoders, context encoders [21], or other forms of unsupervised feature learning, we illustrate that contextual supervision (with pre-trained model initialization) can improve on the performance of existing pre-trained models. We build our spatial context networks on top of the standard VGG_19 and CNN_M architectures and, among other things, show that we can achieve improvements (with no additional explicit supervision) over the original ImageNet pre-trained VGG_19 and CNN_M models in object categorization and detection on VOC2007.
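The core idea, predicting the representation of one patch from another patch given their relative offset, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`relative_offset`, `predict`, `sgd_step`), the linear predictor, the squared-error loss, the normalization of the offset by image size, and the fixed feature vectors standing in for CNN activations are all assumptions made for illustration.

```python
import random

def relative_offset(box_a, box_b, img_w, img_h):
    """Real-valued offset from the center of box_a to the center of box_b.

    Boxes are (x, y, w, h) in pixels. Normalizing by the image size is an
    assumption of this sketch, not something the abstract specifies.
    """
    ax, ay = box_a[0] + box_a[2] / 2.0, box_a[1] + box_a[3] / 2.0
    bx, by = box_b[0] + box_b[2] / 2.0, box_b[1] + box_b[3] / 2.0
    return [(bx - ax) / img_w, (by - ay) / img_h]

def predict(weights, feat_a, offset):
    """Hypothetical linear context module: maps the concatenation of
    feat_a and the offset to a predicted feature for the second patch."""
    x = feat_a + offset
    return [sum(w * v for w, v in zip(row, x)) for row in weights]

def squared_error(pred, target):
    """Squared Euclidean distance between predicted and target features."""
    return sum((p - t) ** 2 for p, t in zip(pred, target))

def sgd_step(weights, feat_a, offset, feat_b, lr=0.1):
    """One gradient step on the squared error; updates weights in place
    and returns the loss measured before the update."""
    x = feat_a + offset
    pred = predict(weights, feat_a, offset)
    loss = squared_error(pred, feat_b)
    for i, row in enumerate(weights):
        grad_out = 2.0 * (pred[i] - feat_b[i])
        for j in range(len(row)):
            row[j] -= lr * grad_out * x[j]
    return loss

if __name__ == "__main__":
    rng = random.Random(0)
    # Toy stand-ins for intermediate representations of two patches.
    feat_a = [0.5, -0.2, 0.1]
    feat_b = [0.3, 0.4, -0.1]
    offset = relative_offset((10, 20, 32, 32), (60, 20, 32, 32), 200, 100)
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(5)] for _ in range(3)]
    losses = [sgd_step(weights, feat_a, offset, feat_b) for _ in range(50)]
    print("first loss:", losses[0], "last loss:", losses[-1])
```

In this toy setup the target is a feature vector rather than pixels, which mirrors the contrast the abstract draws with auto-encoders; a real system would obtain both features from a network such as VGG_19 and would sample patches, for example, from object proposals.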


research
04/25/2016

Context Encoders: Feature Learning by Inpainting

We present an unsupervised visual feature learning algorithm driven by c...
research
08/19/2022

Accelerating Vision Transformer Training via a Patch Sampling Schedule

We introduce the notion of a Patch Sampling Schedule (PSS), that varies ...
research
03/23/2023

Detecting Backdoors in Pre-trained Encoders

Self-supervised learning in computer vision trains on unlabeled data, su...
research
06/28/2018

Unsupervised Natural Image Patch Learning

Learning a metric of natural image patches is an important tool for anal...
research
12/13/2022

OAMixer: Object-aware Mixing Layer for Vision Transformers

Patch-based models, e.g., Vision Transformers (ViTs) and Mixers, have sh...
research
11/26/2018

MIST: Multiple Instance Spatial Transformer Network

We propose a deep network that can be trained to tackle image reconstruc...
research
12/04/2020

Is It a Plausible Colour? UCapsNet for Image Colourisation

Human beings can imagine the colours of a grayscale image with no partic...
