CUNI System for the WMT18 Multimodal Translation Task

11/12/2018
by   Jindřich Helcl, et al.

We present our submission to the WMT18 Multimodal Translation Task. The main feature of our submission is the use of a self-attentive network in place of a recurrent neural network. We evaluate two methods of incorporating the visual features into the model: first, we include the image representation as another input to the network; second, we train the model to predict the visual features and use this prediction as an auxiliary objective. For our submission, we also acquired additional textual and multimodal data. Both of the proposed methods yield significant improvements over the recurrent and self-attentive textual baselines.
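To make the two integration strategies concrete, the sketch below (not the authors' code) illustrates them in PyTorch: a projected global image vector is prepended to the encoder output so the decoder attention can reach it, and a regression head predicts the image vector from the source encoding as an auxiliary objective. The module names, feature dimensions, and the mean-squared-error loss are illustrative assumptions; the actual architecture and loss used in the submission may differ.

```python
# Illustrative sketch of the two multimodal integration strategies
# described above. Assumes PyTorch; all names and sizes are hypothetical.
import torch
import torch.nn as nn


class ImageAsExtraInput(nn.Module):
    """Strategy 1: include the image representation as another input.

    The global image feature vector is projected to the model dimension
    and prepended to the encoded source sequence, so the decoder's
    attention can attend to it like an extra source position.
    """

    def __init__(self, d_model=512, d_image=2048):
        super().__init__()
        self.img_proj = nn.Linear(d_image, d_model)

    def forward(self, encoded_src, image_feat):
        # encoded_src: (batch, src_len, d_model); image_feat: (batch, d_image)
        img = self.img_proj(image_feat).unsqueeze(1)   # (batch, 1, d_model)
        return torch.cat([img, encoded_src], dim=1)    # extended attention memory


class VisualPredictionAuxLoss(nn.Module):
    """Strategy 2: train the model to predict the visual features and
    use this prediction as an auxiliary objective (MSE here is a stand-in)."""

    def __init__(self, d_model=512, d_image=2048):
        super().__init__()
        self.predict_img = nn.Linear(d_model, d_image)

    def forward(self, encoded_src, image_feat):
        pooled = encoded_src.mean(dim=1)               # (batch, d_model)
        pred = self.predict_img(pooled)                # (batch, d_image)
        return nn.functional.mse_loss(pred, image_feat)
```

In training, such an auxiliary loss would typically be added to the translation cross-entropy with a small weight, so the shared encoder is encouraged to retain visually grounded information.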


Related research

08/29/2019 · Probing Representations Learned by Multimodal Recurrent and Transformer Models
Recent literature shows that large-scale language modeling provides exce...

09/27/2016 · House price estimation from visual and textual features
Most existing automatic house price estimation systems rely only on some...

07/14/2017 · LIUM-CVC Submissions for WMT17 Multimodal Translation Task
This paper describes the monomodal and multimodal Neural Machine Transla...

03/23/2022 · Affective Feedback Synthesis Towards Multimodal Text and Image Data
In this paper, we have defined a novel task of affective feedback synthe...

02/15/2019 · Deeply Supervised Multimodal Attentional Translation Embeddings for Visual Relationship Detection
Detecting visual relationships, i.e. <Subject, Predicate, Object> triple...

07/26/2017 · Video Highlight Prediction Using Audience Chat Reactions
Sports channel video portals offer an exciting domain for research on mu...

07/14/2017 · CUNI System for the WMT17 Multimodal Translation Task
In this paper, we describe our submissions to the WMT17 Multimodal Trans...
