Probing Representations Learned by Multimodal Recurrent and Transformer Models

Recent literature shows that large-scale language modeling provides excellent reusable sentence representations with both recurrent and self-attentive architectures. However, there has been less clarity on the commonalities and differences in the representational properties induced by the two architectures. It has also been shown that visual information serves as one of the means for grounding sentence representations. In this paper, we present a meta-study assessing the representational quality of models where the training signal is obtained from different modalities, in particular, language modeling, image features prediction, and both textual and multimodal machine translation. We evaluate textual and visual features of sentence representations obtained using predominant approaches on image retrieval and semantic textual similarity. Our experiments reveal that on moderate-sized datasets, a sentence counterpart in a target language or visual modality provides a much stronger training signal for sentence representation than language modeling. Importantly, we observe that while the Transformer models achieve superior machine translation quality, representations from the recurrent neural network based models perform significantly better on tasks focused on semantic relevance.





1 Introduction

Conditioning on multimodal information is one of the predominant methods of grounding representations learned by deep learning models (Chrupała et al., 2015; Lazaridou et al., 2015), i.e., relating word or sentence representations to non-linguistic real-world entities such as objects in photographs. In the context of multimodal machine translation (MT), models using a multimodal auxiliary loss have been shown to outperform their text-only counterparts (Elliott and Kádár, 2017; Helcl et al., 2018). Experiments with multimodal language models (LMs) also confirm that multimodality influences the semantic properties of the learned representations (Poerner et al., 2018).

On the other hand, recent experiments with large-scale language modeling suggest that these models provide sufficiently informative representations reusable in most natural language processing (NLP) tasks (Peters et al., 2018; Devlin et al., 2018). Current research also shows an increasing interest in the universality of learned representations, which are supposed to contain sufficient inductive biases for a variety of NLP tasks (Conneau et al., 2017; Howard and Ruder, 2018).

Research in evaluating representations has focused on measuring the correlation between the similarity of learned representations and the semantic similarity of words (Hill et al., 2015; Gerz et al., 2016) and sentences (Agirre et al., 2012, 2016). Work on probing representations includes relating learned representations to existing well-trained models by finding a mutual projection between the two and evaluating the performance of the projected representations within the trained model (Saphra and Lopez, 2018), and observing the effect of changes in the representation by backpropagating the changes to the input (Poerner et al., 2018).

Universal sentence representations are typically evaluated by their effects on downstream tasks. Conneau and Kiela (2018) and Wang et al. (2018) recently introduced comprehensive sets of such downstream tasks, providing a benchmark for sentence representation evaluation. The tasks include various sentence classification tasks, entailment, and coreference resolution. However, the drawback of these methods is that they require generating representations of millions of sentences, which are then used for rather time-consuming training of models for the downstream tasks.

In this paper, we investigate representations obtained specifically from grounded models using the two predominant sequence modeling architectures: a model based on recurrent neural networks (RNN; Mikolov et al., 2010; Bahdanau et al., 2014) and a model based on the self-attentive Transformer architecture (Vaswani et al., 2017). We study the learned representations with respect to grounding, semantics, and the degree to which some of these representations are correlated, irrespective of modeling choices. Our main observations are: a) models with access to explicit grounded information learn to ignore image information; b) grounding provides a stronger training signal and thus better semantic representations, which is especially pronounced when a model has access to fewer training samples; c) while Transformer based models may achieve better task performance, RNN based models capture better semantic information.

2 Assessing Contextual Representations

In this section, we briefly describe the methods used for extracting representations and for quantifying their quality: Canonical Correlation Analysis (CCA) for image retrieval evaluation, cosine distance for Semantic Textual Similarity (STS) evaluation, and Distance Correlation (DC) for representation similarity evaluation. Whereas the first two are used for evaluation on downstream tasks, the latter only quantifies mutual similarities of the representations.

Canonical Correlation Analysis.

We take as input two sets of aligned representations from two different subspaces, say $X = \{x_1, \ldots, x_n\}$ and $Y = \{y_1, \ldots, y_n\}$, where $x_i \in \mathbb{R}^{d_x}$ and $y_i \in \mathbb{R}^{d_y}$ are vector representations. CCA (Hotelling, 1936) finds pairs of directions $(w_x, w_y)$ such that the linear projections of $X$ and $Y$ onto these directions, i.e., the canonical representations $w_x^\top X$ and $w_y^\top Y$, are maximally correlated. For further details on CCA, we refer the reader to Hardoon et al. (2004).

The most significant property of CCA for our analysis is that it is a subspace-only method: we obtain naturally occurring correlations between two spaces. Importantly, we do not learn to align the spaces, but recover alignments that are potentially present between the two subspaces. Further, CCA is affine-invariant due to its reliance on correlation rather than on the orthogonality of direction vectors.

We apply CCA to mean-pooled sentence representations and image representations and obtain two highly correlated projections. CCA and its variants have been used in previous research to obtain cross-modal representations (Gong et al., 2014; Yan and Mikolajczyk, 2015). We evaluate the projected representations on the image retrieval task and report the recall at 10. Note that we do not backpropagate the correlation to the network and keep the representations fixed, because our goal is not to train an optimal cross-modal representation but only to assess the (already trained) sentence representations.

Cosine Distance.

For evaluation on the STS task, we use the cosine distance between vectors $u$ and $v$:

$$d_{\cos}(u, v) = 1 - \frac{u \cdot v}{\lVert u \rVert \, \lVert v \rVert}.$$

Following the SentEval benchmark (Conneau and Kiela, 2018), we report the Spearman correlation between the distance and human assessments.

The goal of the STS task is to assess how well the representations capture the semantic similarity of sentences as perceived by humans. As in the image retrieval task, we do not fine-tune the representations for the similarity task and report the Spearman correlation between the cosine distances of the representations and the ground-truth similarities.
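A minimal sketch of this evaluation, with toy vectors and invented gold scores in place of the SemEval data:

```python
# STS evaluation sketch: cosine similarity between fixed sentence vectors,
# rank-correlated with human judgements. All data here are random stand-ins.
import numpy as np
from scipy.spatial.distance import cosine
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
pairs_a = rng.normal(size=(50, 16))        # first sentence of each pair
pairs_b = rng.normal(size=(50, 16))        # second sentence of each pair
human_scores = rng.uniform(0, 5, size=50)  # stand-in for gold similarities

# scipy's `cosine` is a distance, 1 - cos(u, v); similarity = 1 - distance.
sims = [1.0 - cosine(a, b) for a, b in zip(pairs_a, pairs_b)]
rho, _ = spearmanr(sims, human_scores)
print("Spearman rho:", rho)
```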

Distance Correlation.

Distance correlation (DC) is a measure of dependence between any two paired vectors of arbitrary dimensions (Székely et al., 2007). Given two paired random vectors $X \in \mathbb{R}^p$ and $Y \in \mathbb{R}^q$, suppose that $f_X$ and $f_Y$ are the individual characteristic functions and $f_{X,Y}$ is the joint characteristic function of the two vectors. The distance covariance $\mathrm{dCov}(X, Y)$ between $X$ and $Y$ with finite first moments is the non-negative number given by:

$$\mathrm{dCov}^2(X, Y) = \frac{1}{c_p c_q} \int_{\mathbb{R}^{p+q}} \frac{\lvert f_{X,Y}(t, s) - f_X(t)\, f_Y(s) \rvert^2}{\lvert t \rvert_p^{1+p} \, \lvert s \rvert_q^{1+q}} \, dt \, ds,$$

where $c_d = \pi^{(1+d)/2} / \Gamma\!\left((1+d)/2\right)$; $p$ and $q$ are the dimensionalities of $X$ and $Y$, respectively. The distance correlation (DC) is then defined as:

$$\mathrm{dCor}(X, Y) = \frac{\mathrm{dCov}(X, Y)}{\sqrt{\mathrm{dVar}(X)\, \mathrm{dVar}(Y)}},$$

where $\mathrm{dVar}(X) = \mathrm{dCov}(X, X)$, and $\mathrm{dCor}(X, Y) = 0$ whenever the denominator is zero.

A detailed description of the DC is beyond the scope of this paper, but we refer the reader to Székely et al. (2007) for a thorough analysis.

Our use of DC is motivated by the fact that DC quantifies dependence: in particular, it equals zero exactly when the two vectors are mutually independent. Moreover, DC captures both linear and non-linear associations between two vectors. We use DC to measure the degree of correlation between different representations; we are especially interested in the degree to which two independently learned representations are correlated.
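In practice the sample distance correlation is computed from double-centered pairwise-distance matrices rather than characteristic functions. A self-contained sketch (this is the standard estimator of Székely et al. (2007), not the paper's own code):

```python
# Sample distance correlation via double-centered distance matrices.
import numpy as np

def distance_correlation(x, y):
    """Sample distance correlation (Székely et al., 2007).

    Rows of x and y are paired observations; columns may have
    different dimensionalities."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    if y.ndim == 1:
        y = y[:, None]
    assert x.shape[0] == y.shape[0], "x and y must pair the same samples"

    def centered_dists(z):
        # Pairwise Euclidean distances, double-centered (row, column, grand mean).
        d = np.linalg.norm(z[:, None, :] - z[None, :, :], axis=-1)
        return d - d.mean(axis=0) - d.mean(axis=1)[:, None] + d.mean()

    a, b = centered_dists(x), centered_dists(y)
    dcov2_xy = (a * b).mean()   # squared sample distance covariance
    dcov2_xx = (a * a).mean()   # squared sample distance variance of x
    dcov2_yy = (b * b).mean()
    if dcov2_xx * dcov2_yy == 0.0:
        return 0.0
    return float(np.sqrt(dcov2_xy / np.sqrt(dcov2_xx * dcov2_yy)))

rng = np.random.default_rng(0)
u = rng.normal(size=(100, 4))
# Non-linear but fully dependent pair: DC is clearly positive, whereas
# Pearson correlation of u and u**2 would be near zero.
print(distance_correlation(u, u ** 2))
```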

3 Experiments

We examine representations for four types of models: a) LMs; b) image representation prediction models (Imaginet); c) textual MT models; and d) multimodal MT models. For each task, we train models based on RNNs and on the Transformer architecture. In addition, we use training datasets of different sizes. All models were trained with Neural Monkey (Helcl and Libovický, 2017b).

3.1 Models

Language Models.

We trained an RNN LM with a single GRU layer (Cho et al., 2014) of 1,000 dimensions and embeddings of 500 dimensions. The Transformer LM (Vaswani et al., 2017) has model dimension 512, 6 layers, 8 attention heads, and a hidden layer size of 4,096.


Imaginet.

The Imaginet models (Chrupała et al., 2015) predict an image representation given a textual description of the image. The representation is trained only via its grounding in the image representation.

We use a bidirectional RNN encoder with the same hyperparameters as the aforementioned LM. The Transformer based Imaginet uses the same hyperparameters as the Transformer based LM. The states of the encoder are mean-pooled and projected with a hidden layer of 4,096 units and ReLU non-linearity to a 2,048-dimensional vector corresponding to the image representation from ResNet (He et al., 2016). For a fair comparison, we use the representation before the final non-linear projection.
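The prediction head described above can be sketched as follows. The weights here are random stand-ins (a real model learns them by minimizing the distance to the image vector); only the layer sizes follow the text:

```python
# Sketch of the Imaginet prediction head: mean-pooled encoder states are
# projected through a 4,096-unit ReLU layer to a 2,048-dimensional vector
# matching the ResNet image features. Weights are random placeholders.
import numpy as np

rng = np.random.default_rng(0)
enc_dim, hidden_dim, img_dim = 1000, 4096, 2048

W1 = rng.normal(scale=0.01, size=(enc_dim, hidden_dim))
b1 = np.zeros(hidden_dim)
W2 = rng.normal(scale=0.01, size=(hidden_dim, img_dim))
b2 = np.zeros(img_dim)

def predict_image_vector(encoder_states):
    """encoder_states: (seq_len, enc_dim) encoder outputs for one sentence."""
    pooled = encoder_states.mean(axis=0)        # mean-pooling over time
    hidden = np.maximum(0.0, pooled @ W1 + b1)  # 4,096-unit ReLU layer
    return hidden @ W2 + b2                     # 2,048-d image prediction

states = rng.normal(size=(12, enc_dim))         # a 12-token sentence
print(predict_image_vector(states).shape)       # (2048,)
```

Note that, as stated above, the probing experiments use the representation before this final projection, i.e., the pooled encoder states themselves.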

For completeness, we also compare the LMs with ELMo (Peters et al., 2018), a representation based on a deep RNN LM with character-based embeddings pre-trained on a large corpus of 30 million sentences, and with BERT (Devlin et al., 2018), a Transformer based sentence representation similar to our Transformer based LM. We note, however, that BERT is trained with a significantly different procedure than regular LMs.

Textual MT models.

We trained the attentive RNN based seq2seq model (Bahdanau et al., 2014) with the same hyperparameters as the RNN Imaginet model, and with the conditional GRU (Firat and Cho, 2016) as the decoder. With the Transformer architecture, we used the same hyperparameters as for the Imaginet models.

Besides the text-only models, we trained Imagination models (Elliott and Kádár, 2017) that combine translation with the Imaginet models in a multi-task setup. The model is trained to generate a sentence in the target language and to predict the image representation at the same time.

With multi-task learning, the model takes advantage of large parallel data without images and of monolingual image captioning data at the same time. Presumably, the model achieves superior translation quality by learning a better source sentence representation. At inference time, the model only requires the textual input.

Multimodal MT models.

For both the RNN and Transformer architectures, we used the same hyperparameters as for the textual models. As in the previous models, we use the last convolutional layer of ResNet as the image representation.

In the RNN setup, we experiment with decoder initialization with the image representation (Caglayan et al., 2017; Calixto and Liu, 2017) and with a doubly attentive decoder with three different attention combination strategies (Libovický and Helcl, 2017). First (attention concatenation), we concatenate context vectors computed independently over the image representation and the source sentence; second (flat attention combination), we compute a joint distribution over the image convolutional maps and the source encoder states; third (hierarchical attention combination), we compute the context vectors independently and combine them hierarchically using another attention mechanism.

In the Transformer setup, the multimodal models use doubly attentive decoders (Libovický et al., 2018). We experiment with four setups: serial, parallel, flat, and hierarchical input combination. The first two are direct extensions of the Transformer architecture, adding more sublayers to the decoder; the latter two are modifications of the attention strategies used in the RNN setup.

3.2 Datasets

Training data.

To evaluate how the representation quality depends on the amount of training data, we train our models on datasets of different sizes. The smallest dataset, used for all types of experiments, is Multi30k (Elliott et al., 2016), which consists of only 29k training images with English captions and their translations into German, French, and Czech.

For the monolingual experiments (LM and image representation prediction), we further use the English captions from the Flickr30k dataset (Plummer et al., 2015), which contains 5 captions for each image, 145k in total. The largest monolingual dataset we work with is a concatenation of Flickr30k and the COCO dataset (Lin et al., 2014), with 414k descriptions of 82k images.

For textual MT, where parallel data are needed, we also consider an unconstrained setup with additional data harvested from parallel and monolingual corpora (Helcl and Libovický, 2017a; Helcl et al., 2018) combined with the EU Bookshop corpus (Tiedemann, 2012), totaling 200M words.

Multimodal MT models are trained on the Multi30k data only.

Evaluation data.

We fit the CCA on the 29k image-sentence pairs of the training portion of the Multi30k and evaluate on the 1k pairs from the test set.

For STS, we evaluate the representations on the SemEval 2016 dataset (Agirre et al., 2016). The test set consists of 1,186 sentence pairs collected from datasets of newspaper headlines, machine translation post-editing, plagiarism detection, and question-to-question and answer-to-answer matching on Stack Exchange data. Each sentence pair is annotated with a similarity value.

Language models             ppl.   Img.    STS

RNN
  Multi30k                 12.10   16.6   .267
  Flickr30k                11.80   22.4   .340
  Flickr30k + COCO         11.80   23.0   .378
Transformer
  Multi30k                 12.42    8.9   .256
  Flickr30k                11.87   17.6   .283
  Flickr30k + COCO         11.69   21.0   .303
Pretrained
  ELMo                         —   28.4   .631
  BERT                         —   22.4   .624

Imaginet                    R@10   Img.    STS

RNN
  Multi30k                  29.5   24.4   .401
  Flickr30k                 37.8   26.3   .483
  Flickr30k + COCO          39.4   25.4   .501
Transformer
  Multi30k                  25.5   22.1   .338
  Flickr30k                 36.6   29.5   .436
  Flickr30k + COCO          38.4   28.0   .451

Textual MT                  BLEU   Img.    STS

RNN
  Textual                   36.7   22.5   .527
  Textual U                 38.7   21.8   .621
  Imagination               36.8   20.1   .550
  Imagination U             38.2   27.4   .622
Transformer
  Textual                   38.3   18.8   .374
  Textual U                 40.4   21.3   .509
  Imagination               39.2   26.5   .433
  Imagination U             42.6   31.9   .512

Multimodal MT               BLEU   Img.    STS

RNN
  Decoder init.             36.9   16.6   .536
  Att. concatenation        35.7   11.4   .429
  Flat att. comb.           34.6   14.6   .487
  Hierar. att. comb.        37.6   16.7   .553
Transformer
  Serial att. comb.         38.7   15.8   .383
  Parallel att. comb.       38.6   16.8   .398
  Flat att. comb.           37.1   16.6   .385
  Hierar. att. comb.        38.5   14.3   .346

Table 1: Recall at 10 for image retrieval ('Img.') and Spearman correlation on the sentence similarity task ('STS') for representations extracted from the models. 'U' denotes use of the unconstrained dataset. The first column contains a task-specific metric on the Multi30k test set: LM perplexity, image recall at 10, and BLEU score, respectively.

4 Results & Discussion

Figure 1: Distance correlation of representations from pairs of selected models.

We present the image retrieval and STS results along with the task-specific metrics in Table 1. We observe that on moderately sized datasets, models conditioned on the target language and the visual modality provide a stronger training signal for learning sentence representations than models trained with a simple language modeling objective.

The unconstrained variants of the RNN MT models obtain a similar performance on STS to the ELMo and BERT models, even though they were trained on orders of magnitude fewer samples.

We also observe that while the Transformer based models achieve superior translation quality on the MT tasks, the results on STS suggest that the RNN models obtain semantically richer representations. The textual RNN translation models perform better on image retrieval than the Transformer models; the opposite holds for the Imagination models, which are explicitly trained to predict the image representation and where the Transformer variants outperform their RNN counterparts. Given these consistent observations, we posit that the Transformer based models, while achieving good performance on the tasks they are trained for, seem to ignore image information.

The slight difference between the image retrieval performance of the Imaginet and Imagination models suggests that training the representation using the vision signal and the target language signal is complementary.

We also evaluated the STS performance of the representations after the CCA projections. The Spearman correlation is consistently worse.

The encoders of the multimodal MT models, which explicitly use the visual input in the decoder, achieve significantly lower image retrieval scores. This suggests that the textual encoder ignores information about the visual aspects of meaning, as the decoder has full access to this information through the explicit conditioning on the image representation. This observation is in line with the conclusions of adversarial evaluation (Elliott, 2018; Libovický et al., 2018).

Figure 2: Plot of dependence of the BLEU score on the Spearman correlation on the STS dataset.

Our experiments also indicate that the performance on STS is highly correlated with translation quality for both the RNN based and the Transformer based models (see Figure 2), which contrasts with the findings of Cífka and Bojar (2018), who measured the correlation of BLEU and STS under similar conditions. In addition, we observe that the Transformers perform significantly worse on STS than their RNN counterparts. The translation quality also appears to be highly correlated with the amount of available training data and with the image retrieval abilities of the representations (see Table 2).

Correlation of BLEU and … Trans. RNN
Image retrieval R@10 .825 .700
STS performance .852 .873
Training data size .867 .724
Table 2: Pearson correlation of MMT performance and representation properties.
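The coefficients in Table 2 are plain Pearson correlations between per-model BLEU scores and representation metrics. A minimal sketch using the RNN multimodal MT rows of Table 1 as example points (not necessarily the exact set of points used for Table 2):

```python
# Pearson correlation of BLEU against STS performance, illustrated with the
# RNN multimodal MT rows from Table 1.
from scipy.stats import pearsonr

bleu = [36.9, 35.7, 34.6, 37.6]   # decoder init., att. concat., flat, hierar.
sts = [.536, .429, .487, .553]    # matching STS Spearman correlations

r, p_value = pearsonr(bleu, sts)
print(f"Pearson r = {r:.3f}")
```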

The results of DC for selected models are shown in Figure 1. The DC between the image and the sentence representations is proportional to the image retrieval score; the image representations also have the lowest distance correlation overall, which results in poorer CCA based projections. Sentence representations seem to be more similar across tasks than across architectures. Most notable is the mutual similarity of the representations from all MT systems, regardless of the architecture and the modality setup.

5 Conclusions

We conducted a set of controlled and thorough experiments to assess the representational qualities of monomodal and multimodal sequential models with the predominant architectures. Our experiments show that grounding, either in the visual modality or in another language, and especially their combination in the Imagination models, results in better representations than LMs trained on datasets of similar sizes. We also showed that the translation quality of the MT models is highly correlated both with the ability of the models to retain image information and with the semantic properties of the representations.

The RNN models tend to perform better on both the semantic similarity and image retrieval tasks, although they do not reach the same translation quality. We hypothesize that this is because of differences in the architecture that allow the Transformer network to directly access information that the RNN needs to pass through its hidden states.


Acknowledgments

Jindřich received funding from the Czech Science Foundation, grant no. 18-02196S.