Probing Multimodal Embeddings for Linguistic Properties: the Visual-Semantic Case

02/22/2021
by   Adam Dahlgren Lindström, et al.
1

Semantic embeddings have advanced the state of the art for countless natural language processing tasks, and various extensions to multimodal domains, such as visual-semantic embeddings, have been proposed. While the power of visual-semantic embeddings comes from the distillation and enrichment of information through machine learning, their inner workings are poorly understood and there is a shortage of analysis tools. To address this problem, we generalize the notion of probing tasks to the visual-semantic case. To this end, we (i) discuss the formalization of probing tasks for embeddings of image-caption pairs, (ii) define three concrete probing tasks within our general framework, (iii) train classifiers to probe for those properties, and (iv) compare various state-of-the-art embeddings under the lens of the proposed probing tasks. Our experiments reveal an up to 12 visual-semantic embeddings compared to the corresponding unimodal embeddings, which suggest that the text and image dimensions represented in the former do complement each other.

READ FULL TEXT

page 2

page 6

page 14

page 15

research
04/22/2022

MCSE: Multimodal Contrastive Learning of Sentence Embeddings

Learning semantically meaningful sentence embeddings is an open problem ...
research
06/04/2023

Leverage Points in Modality Shifts: Comparing Language-only and Multimodal Word Representations

Multimodal embeddings aim to enrich the semantic information in neural r...
research
03/25/2017

Learning to Predict: A Fast Re-constructive Method to Generate Multimodal Embeddings

Integrating visual and linguistic information into a single multimodal r...
research
08/15/2023

MultiSChuBERT: Effective Multimodal Fusion for Scholarly Document Quality Prediction

Automatic assessment of the quality of scholarly documents is a difficul...
research
06/03/2017

Order embeddings and character-level convolutions for multimodal alignment

With the novel and fast advances in the area of deep neural networks, se...
research
01/29/2023

Global Flood Prediction: a Multimodal Machine Learning Approach

Flooding is one of the most destructive and costly natural disasters, an...
research
05/23/2017

Better Text Understanding Through Image-To-Text Transfer

Generic text embeddings are successfully used in a variety of tasks. How...

Please sign up or login with your details

Forgot password? Click here to reset