Learning semantic sentence representations from visually grounded language without lexical knowledge

03/27/2019
by Danny Merkx et al.

Current approaches to learning semantic representations of sentences often rely on prior word-level knowledge. This study instead leverages visual information to capture sentence-level semantics without the need for word embeddings. We use a multimodal sentence encoder trained on a corpus of images with matching text captions to produce visually grounded sentence embeddings. Deep neural networks are trained to map the two modalities into a common embedding space such that, for a given image, the corresponding caption can be retrieved and vice versa. We show that our model achieves results comparable to the current state of the art on two popular image-caption retrieval benchmarks: MSCOCO and Flickr8k. We evaluate the semantic content of the resulting sentence embeddings on the Semantic Textual Similarity (STS) benchmark and show that the multimodal embeddings correlate well with human semantic similarity judgements. The system achieves state-of-the-art results on several of these benchmarks, which shows that a system trained solely on multimodal data, without assuming any word representations, can capture sentence-level semantics. Importantly, this result shows that prior knowledge of lexical-level semantics is not needed to model sentence-level semantics. These findings demonstrate the importance of visual information in modelling semantics.
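The full paper spells out the exact architecture; the sketch below only illustrates the general recipe the abstract describes, assuming a character-level GRU caption encoder (so no pretrained word embeddings are needed), a linear projection of precomputed CNN image features, and a bidirectional triplet hinge loss over cosine similarities. All module names, dimensions, and the margin value here are illustrative choices, not the authors' exact setup.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CaptionEncoder(nn.Module):
    """Encodes captions from raw characters, avoiding any lexical prior."""
    def __init__(self, n_chars=128, char_dim=64, hidden=1024, embed_dim=2048):
        super().__init__()
        self.chars = nn.Embedding(n_chars, char_dim, padding_idx=0)
        self.rnn = nn.GRU(char_dim, hidden, batch_first=True, bidirectional=True)
        self.proj = nn.Linear(2 * hidden, embed_dim)

    def forward(self, char_ids):                      # (batch, time)
        states, _ = self.rnn(self.chars(char_ids))
        sentence = states.mean(dim=1)                 # mean-pool over time steps
        return F.normalize(self.proj(sentence), dim=-1)

class ImageEncoder(nn.Module):
    """Maps precomputed CNN image features into the shared embedding space."""
    def __init__(self, feat_dim=2048, embed_dim=2048):
        super().__init__()
        self.proj = nn.Linear(feat_dim, embed_dim)

    def forward(self, feats):                         # (batch, feat_dim)
        return F.normalize(self.proj(feats), dim=-1)

def retrieval_loss(img_emb, cap_emb, margin=0.2):
    """Hinge loss pushing matching image-caption pairs above mismatched ones
    in both retrieval directions (caption-from-image and image-from-caption)."""
    sim = img_emb @ cap_emb.t()                       # cosine similarity matrix
    pos = sim.diag().unsqueeze(1)                     # matching pairs on the diagonal
    cost_cap = (margin + sim - pos).clamp(min=0)      # image -> caption retrieval
    cost_img = (margin + sim - pos.t()).clamp(min=0)  # caption -> image retrieval
    mask = torch.eye(sim.size(0), dtype=torch.bool, device=sim.device)
    return cost_cap.masked_fill(mask, 0).sum() + cost_img.masked_fill(mask, 0).sum()
```

After training, the STS-style evaluation the abstract mentions needs nothing beyond the caption branch: embed both sentences and take the cosine (here just a dot product, since the embeddings are L2-normalised), then correlate the scores with the human judgements.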


Related research

03/14/2022
Contrastive Visual Semantic Pretraining Magnifies the Semantics of Natural Language Representations
We examine the effects of contrastive visual semantic pretraining by com...

06/16/2021
Semantic sentence similarity: size does not always matter
This study addresses the question whether visually grounded speech recog...

02/29/2020
Grounded and Controllable Image Completion by Incorporating Lexical Semantics
In this paper, we present an approach, namely Lexical Semantic Image Com...

02/07/2020
Incorporating Visual Semantics into Sentence Representations within a Grounded Space
Language grounding is an active field aiming at enriching textual repres...

05/20/2019
Word Usage Similarity Estimation with Sentence Representations and Automatic Substitutes
Usage similarity estimation addresses the semantic proximity of word ins...

09/07/2018
Using Sparse Semantic Embeddings Learned from Multimodal Text and Image Data to Model Human Conceptual Knowledge
Distributional models provide a convenient way to model semantics using ...

02/12/2018
Evaluating Compositionality in Sentence Embeddings
An important frontier in the quest for human-like AI is compositional se...
