Cross-Modal Alignment Learning of Vision-Language Conceptual Systems

07/31/2022
by Taehyeong Kim, et al.

Human infants learn the names of objects and develop their own conceptual systems without explicit supervision. In this study, we propose methods for learning aligned vision-language conceptual systems, inspired by infants' word learning mechanisms. The proposed model learns associations between visual objects and words online and gradually constructs cross-modal relational graph networks. We also propose an aligned cross-modal representation learning method that uses these graph networks to learn semantic representations of visual objects and words in a self-supervised manner. This allows entities from different modalities that share the same conceptual meaning to have similar semantic representation vectors. We evaluate our method quantitatively and qualitatively on tasks including object-to-word mapping and zero-shot learning, showing that the proposed model significantly outperforms the baselines and that each conceptual system is topologically aligned.
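
To make the idea of learning object-word associations online more concrete, here is a minimal sketch of a cross-situational co-occurrence graph. It is not the authors' model: the class name `CrossModalGraph`, its methods, the symbolic object labels, and the scoring rule (squared co-occurrence count divided by the product of object and word frequencies) are all assumptions chosen for illustration, and the abstract does not specify any of them.

```python
from collections import defaultdict


class CrossModalGraph:
    """Toy bipartite graph linking visual objects to words (hypothetical sketch)."""

    def __init__(self):
        # edges[obj][word] = number of scenes in which obj and word co-occurred
        self.edges = defaultdict(lambda: defaultdict(int))
        self.object_totals = defaultdict(int)  # scenes containing each object
        self.word_totals = defaultdict(int)    # utterances containing each word

    def observe(self, objects, words):
        """Update the graph online from one scene paired with one utterance."""
        objects, words = set(objects), set(words)
        for word in words:
            self.word_totals[word] += 1
        for obj in objects:
            self.object_totals[obj] += 1
            for word in words:
                self.edges[obj][word] += 1

    def association(self, obj, word):
        """Assumed score: count^2 / (object frequency * word frequency)."""
        count = self.edges[obj][word]
        return count * count / (self.object_totals[obj] * self.word_totals[word])

    def best_word(self, obj):
        """Return the word most strongly associated with obj, or None if unseen."""
        candidates = self.edges.get(obj)
        if not candidates:
            return None
        return max(candidates, key=lambda word: self.association(obj, word))


# Three ambiguous scenes: no single scene reveals which word names which object.
graph = CrossModalGraph()
graph.observe(["ball", "dog"], ["the", "ball", "and", "the", "dog"])
graph.observe(["ball", "cup"], ["the", "ball", "cup"])
graph.observe(["dog", "cup"], ["the", "dog", "cup"])

for obj in ("ball", "dog", "cup"):
    print(obj, "->", graph.best_word(obj))
```

In this toy setting the objects are already symbolic labels; a real system would work from visual features, and the self-supervised representation learning described in the abstract is not covered by this sketch. Normalizing by word frequency is one simple way to stop words that appear in every utterance, such as "the", from dominating the mapping.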

research
12/24/2021

Learning Aligned Cross-Modal Representation for Generalized Zero-Shot Classification

Learning a common latent embedding by aligning the latent spaces of cros...
research
01/16/2023

Multimodality Helps Unimodality: Cross-Modal Few-Shot Learning with Multimodal Models

The ability to quickly learn a new task with minimal instruction - known...
research
06/10/2021

Cross-Modal Discrete Representation Learning

Recent advances in representation learning have demonstrated an ability ...
research
10/18/2022

Probing Cross-modal Semantics Alignment Capability from the Textual Perspective

In recent years, vision and language pre-training (VLP) models have adva...
research
01/21/2021

Learning rich touch representations through cross-modal self-supervision

The sense of touch is fundamental in several manipulation tasks, but rar...
research
05/12/2022

A Computational Acquisition Model for Multimodal Word Categorization

Recent advances in self-supervised modeling of text and images open new ...
research
06/13/2022

Compositional Mixture Representations for Vision and Text

Learning a common representation space between vision and language allow...
