Compositional Mixture Representations for Vision and Text

06/13/2022
by Stephan Alaniz, et al.

Learning a common representation space between vision and language allows deep networks to relate objects in an image to their corresponding semantic meaning. We present a model that learns a shared Gaussian mixture representation, imposing the compositionality of the text onto the visual domain without explicit location supervision. By combining the spatial transformer with a representation learning approach, we learn to split images into separately encoded patches and associate visual and textual representations in an interpretable manner. On variations of MNIST and CIFAR10, our model performs weakly supervised object detection and demonstrates the ability to extrapolate to unseen combinations of objects.

research
02/15/2022

Compositional Scene Representation Learning via Reconstruction: A Survey

Visual scene representation learning is an important research problem in...
research
05/01/2020

Probing Text Models for Common Ground with Visual Representations

Vision, as a central component of human perception, plays a fundamental ...
research
10/09/2022

MAMO: Masked Multimodal Modeling for Fine-Grained Vision-Language Representation Learning

Multimodal representation learning has shown promising improvements on v...
research
07/31/2022

Cross-Modal Alignment Learning of Vision-Language Conceptual Systems

Human infants learn the names of objects and develop their own conceptua...
research
06/11/2019

Weakly-supervised Compositional Feature Aggregation for Few-shot Recognition

Learning from a few examples is a challenging task for machine learning....
research
06/03/2021

GMAIR: Unsupervised Object Detection Based on Spatial Attention and Gaussian Mixture

Recent studies on unsupervised object detection based on spatial attenti...
research
05/23/2023

Parts of Speech-Grounded Subspaces in Vision-Language Models

Latent image representations arising from vision-language models have pr...
