Adversarial reconstruction for Multi-modal Machine Translation

10/07/2019
by   Jean-Benoit Delbrouck, et al.

Despite growing interest in problems at the intersection of Computer Vision and Natural Language, grounding (i.e. identifying) the components of a structured description in an image remains a challenging task. This contribution proposes a model that learns grounding by reconstructing the visual features for the Multi-modal translation task. Previous works have partially investigated standard approaches, such as regression methods, to approximate the reconstruction of a visual input. In this paper, we propose a different and novel approach that learns grounding through adversarial feedback. To do so, we modulate our network following recent promising adversarial architectures and evaluate how the adversarial response from a visual reconstruction, used as an auxiliary task, helps the model learn. We report the highest scores in terms of BLEU and METEOR metrics on the different datasets.

research
04/09/2022

On the Importance of Karaka Framework in Multi-modal Grounding

Computational Paninian Grammar model helps in decoding a natural languag...
research
04/05/2022

Multi-View Transformer for 3D Visual Grounding

The 3D visual grounding task aims to ground a natural language descripti...
research
02/04/2017

Doubly-Attentive Decoder for Multi-modal Neural Machine Translation

We introduce a Multi-modal Neural Machine Translation model in which a d...
research
08/11/2021

A Better Loss for Visual-Textual Grounding

Given a textual phrase and an image, the visual grounding problem is def...
research
10/22/2022

HAM: Hierarchical Attention Model with High Performance for 3D Visual Grounding

This paper tackles an emerging and challenging vision-language task, 3D ...
research
05/25/2023

Language-Guided 3D Object Detection in Point Cloud for Autonomous Driving

This paper addresses the problem of 3D referring expression comprehensio...
research
06/01/2017

Grounding Symbols in Multi-Modal Instructions

As robots begin to cohabit with humans in semi-structured environments, ...
