Efficient Multi-Modal Embeddings from Structured Data

10/06/2021
by Anita L. Vero et al.

Multi-modal word semantics aims to enhance embeddings with perceptual input, on the assumption that human meaning representation is grounded in sensory experience. Most research focuses on evaluation involving direct visual input; however, visual grounding can contribute to linguistic applications as well. A further motivation for this paper is the growing need for more interpretable models and for evaluating model efficiency in terms of size and performance. This work explores the impact of visual information on semantics when the evaluation involves no direct visual input, specifically semantic similarity and relatedness. We investigate a new embedding type in between the linguistic and visual modalities, based on the structured annotations of Visual Genome. We compare uni- and multi-modal models, including structured, linguistic, and image-based representations. We measure the efficiency of each model with regard to data and model size, modality/data distribution, and information gain. The analysis includes an interpretation of embedding structures. We find that this new embedding conveys information complementary to text-based embeddings. It achieves comparable performance economically, using orders of magnitude fewer resources than visual models.
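To make the idea of an embedding "in between the linguistic and visual modalities" concrete, the sketch below builds word vectors from Visual Genome-style structured annotations rather than from pixels or running text: objects co-occur with attribute contexts, counts are reweighted with PPMI, and similarity is measured by cosine. This is a minimal illustration under assumptions, not the paper's implementation; the toy scene graph and all names here are hypothetical.

```python
import math
from collections import Counter

# Toy scene-graph annotations (hypothetical): each image lists
# (object, attribute) facts, in the style of Visual Genome.
annotations = [
    [("dog", "brown"), ("dog", "furry"), ("grass", "green")],
    [("cat", "furry"), ("cat", "small"), ("mat", "red")],
    [("dog", "small"), ("ball", "red"), ("grass", "green")],
]

# Count word-context co-occurrences across all annotated images.
cooc = Counter()
word_counts = Counter()
context_counts = Counter()
total = 0
for image in annotations:
    for word, context in image:
        cooc[(word, context)] += 1
        word_counts[word] += 1
        context_counts[context] += 1
        total += 1

def ppmi(word, context):
    """Positive pointwise mutual information between a word and a context."""
    joint = cooc[(word, context)]
    if joint == 0:
        return 0.0
    pmi = math.log2(joint * total / (word_counts[word] * context_counts[context]))
    return max(pmi, 0.0)

# Fixed context vocabulary defines the embedding dimensions.
contexts = sorted({c for _, c in cooc})

def embed(word):
    """Structured embedding of a word: PPMI scores over annotation contexts."""
    return [ppmi(word, c) for c in contexts]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

# "dog" and "cat" share the contexts "furry" and "small", so they come out
# more similar to each other than "dog" is to "ball".
print(cosine(embed("dog"), embed("cat")) > cosine(embed("dog"), embed("ball")))
# → True
```

In an evaluation like the paper's, such structured vectors would be scored against human similarity/relatedness judgments, alone or concatenated with text-based embeddings; the point of the sketch is only that useful vectors can come from annotation structure at a fraction of the cost of image models.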

Related research

Hateful Memes Detection via Complementary Visual and Linguistic Networks (12/09/2020)
Hateful memes are widespread in social media and convey negative informa...

Interpretation on Multi-modal Visual Fusion (08/19/2023)
In this paper, we present an analytical framework and a novel metric to ...

Modality-Aware Negative Sampling for Multi-modal Knowledge Graph Embedding (04/23/2023)
Negative sampling (NS) is widely used in knowledge graph embedding (KGE)...

Using Sparse Semantic Embeddings Learned from Multimodal Text and Image Data to Model Human Conceptual Knowledge (09/07/2018)
Distributional models provide a convenient way to model semantics using ...

Memotion Analysis through the Lens of Joint Embedding (11/13/2021)
Joint embedding (JE) is a way to encode multi-modal data into a vector s...

Is the Red Square Big? MALeViC: Modeling Adjectives Leveraging Visual Contexts (08/27/2019)
This work aims at modeling how the meaning of gradable adjectives of siz...

The Immersion of Directed Multi-graphs in Embedding Fields. Generalisations (04/28/2020)
The purpose of this paper is to outline a generalised model for represen...
