Predicting Actions to Help Predict Translations

08/05/2019
by Zixiu Wu, et al.

We address the task of text translation on the How2 dataset using a state-of-the-art transformer-based multimodal approach. The question we ask is whether visual features can support the translation process. In particular, given that this dataset is extracted from videos, we focus on the translation of actions, which we believe are poorly captured in the static image-text datasets currently used for multimodal translation. For that purpose, we extract different types of action features from the videos and carefully investigate how helpful this visual information is by testing whether it can increase translation quality when used in conjunction with (i) the original text and (ii) the original text where action-related words (or all verbs) are masked out. The latter is a simulation that helps us assess the utility of the image in cases where the text does not provide enough context about the action, or in the presence of noise in the input text.
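
The masking setup in (ii) can be illustrated with a minimal sketch. The abstract does not say how masking is implemented, so everything here is an assumption for illustration only: the function name `mask_words`, the `[MASK]` placeholder, the use of Universal POS tags supplied by some upstream tagger, and the example sentence.

```python
from typing import List, Optional, Sequence, Set

MASK_TOKEN = "[MASK]"  # hypothetical placeholder; the paper's actual token is not given


def mask_words(
    tokens: Sequence[str],
    pos_tags: Sequence[str],
    action_words: Optional[Set[str]] = None,
    mask_token: str = MASK_TOKEN,
) -> List[str]:
    """Replace selected source words with a mask token.

    If `action_words` is None, every token tagged "VERB" is masked
    (the "all verbs" condition). Otherwise only tokens whose lowercase
    form is in `action_words` are masked (the "action-related words"
    condition).
    """
    if len(tokens) != len(pos_tags):
        raise ValueError("tokens and pos_tags must have the same length")
    masked = []
    for token, tag in zip(tokens, pos_tags):
        if action_words is None:
            hit = tag == "VERB"
        else:
            hit = token.lower() in action_words
        masked.append(mask_token if hit else token)
    return masked


# Illustrative input; the POS tags are assumed to come from an external tagger.
tokens = ["Slowly", "stir", "the", "sauce", "and", "pour", "it", "over", "the", "pasta"]
pos_tags = ["ADV", "VERB", "DET", "NOUN", "CCONJ", "VERB", "PRON", "ADP", "DET", "NOUN"]

all_verbs_masked = mask_words(tokens, pos_tags)
action_only_masked = mask_words(tokens, pos_tags, action_words={"stir"})
print(" ".join(all_verbs_masked))
print(" ".join(action_only_masked))
```

With the masked text as source input, the translation model can recover the hidden action only from the remaining context or from the visual features, which is what makes the comparison against the unmasked condition informative.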


