A Computational Acquisition Model for Multimodal Word Categorization

05/12/2022
by Uri Berger et al.
Recent advances in self-supervised modeling of text and images open new opportunities for computational models of child language acquisition, which is believed to rely heavily on cross-modal signals. However, prior studies have been limited by their reliance on vision models trained on large image datasets annotated with a pre-defined set of depicted object categories. This reliance (a) is not faithful to the information children receive and (b) prevents such models from being evaluated on category learning tasks, because the category structure is imposed in advance. We address this gap and present a cognitively inspired multimodal acquisition model, trained on image-caption pairs from naturalistic data using cross-modal self-supervision. We show that the model learns word categories and object recognition abilities, and exhibits trends reminiscent of those reported in the developmental literature. We make our code and trained models public for future reference and use.

