GMAIR: Unsupervised Object Detection Based on Spatial Attention and Gaussian Mixture

06/03/2021
by Weijin Zhu, et al.

Recent studies on unsupervised object detection based on spatial attention have achieved promising results. Models such as AIR and SPAIR output "what" and "where" latent variables that represent the attributes and locations of objects in a scene, respectively. Most previous studies concentrate on "where" localization performance; however, we argue that acquiring the "what" object attributes is also essential for representation learning. This paper presents GMAIR, a framework for unsupervised object detection that incorporates spatial attention and a Gaussian mixture in a unified deep generative model. GMAIR can locate objects in a scene and simultaneously cluster them without supervision. Furthermore, we analyze the "what" latent variables and the clustering process. Finally, we evaluate our model on the MultiMNIST and Fruit2D datasets and show that GMAIR achieves competitive results on localization and clustering compared to state-of-the-art methods.
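To make the clustering idea concrete, here is a minimal sketch of how a Gaussian mixture over "what" latents could assign detected objects to clusters. It is not the GMAIR implementation: the function name `cluster_responsibilities`, the shared isotropic standard deviation `sigma`, and the toy means and latents are all assumptions made for illustration, and the abstract does not specify the form of the mixture.

```python
import numpy as np

def cluster_responsibilities(z_what, means, log_weights, sigma=1.0):
    """Posterior probability of each mixture component for each latent vector.

    z_what:      (N, D) array of "what" latents, one row per detected object
    means:       (K, D) array of component means (hypothetical parameters)
    log_weights: (K,) array of log mixing weights
    sigma:       shared isotropic standard deviation (an assumption)
    """
    # Squared distance from every latent to every component mean: shape (N, K)
    diff = z_what[:, None, :] - means[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)
    # Unnormalized log posterior; constants shared by all components cancel
    logits = log_weights[None, :] - 0.5 * sq_dist / sigma ** 2
    # Numerically stable softmax over the component axis
    logits = logits - logits.max(axis=1, keepdims=True)
    probs = np.exp(logits)
    return probs / probs.sum(axis=1, keepdims=True)

# Toy data: two well-separated components and two object latents
means = np.array([[0.0, 0.0], [5.0, 5.0]])
log_weights = np.log(np.array([0.5, 0.5]))
z_what = np.array([[0.1, -0.2], [4.8, 5.1]])

resp = cluster_responsibilities(z_what, means, log_weights)
labels = resp.argmax(axis=1)  # hard cluster assignment per object
```

In a full model the latents would come from an encoder applied to attended image regions, and the mixture parameters would be learned jointly with it; here they are fixed by hand only to show the assignment step.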


