Dynamic Context-guided Capsule Network for Multimodal Machine Translation

09/04/2020
by Huan Lin, et al.

Multimodal machine translation (MMT), which mainly focuses on enhancing text-only translation with visual features, has attracted considerable attention from both the computer vision and natural language processing communities. Most current MMT models resort to attention mechanisms, global context modeling, or multimodal joint representation learning to exploit visual features. However, the attention mechanism lacks sufficient semantic interaction between modalities, while the other two approaches provide a fixed visual context, which is unsuitable for modeling the variability observed when generating translations. To address these issues, in this paper we propose a novel Dynamic Context-guided Capsule Network (DCCN) for MMT. Specifically, at each decoding timestep we first employ conventional source-target attention to produce a timestep-specific source-side context vector. Next, DCCN takes this vector as input and uses it to guide the iterative extraction of related visual features via a context-guided dynamic routing mechanism. In particular, since we represent the input image with both global and regional visual features, we introduce two parallel DCCNs to model multimodal context vectors with visual features at different granularities. Finally, we obtain two multimodal context vectors, which are fused and incorporated into the decoder for the prediction of the target word. Experimental results on the Multi30K dataset for English-to-German and English-to-French translation demonstrate the superiority of DCCN. Our code is available at https://github.com/DeepLearnXMU/MM-DCCN.
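To make the context-guided dynamic routing idea concrete, the PyTorch sketch below shows one plausible form of such a layer: visual features act as input capsules, and the timestep-specific source-side context vector biases the routing logits so that the output multimodal capsules are re-estimated at every decoding step. The class name ContextGuidedRouting, all dimensions, and the exact logit-update rule (standard vote-output agreement plus a context-guidance term) are illustrative assumptions, not the released MM-DCCN implementation; see the repository linked above for the authors' code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def squash(s, dim=-1, eps=1e-8):
    """Capsule squashing non-linearity (Sabour et al., 2017)."""
    norm_sq = (s * s).sum(dim=dim, keepdim=True)
    return (norm_sq / (1.0 + norm_sq)) * s / torch.sqrt(norm_sq + eps)


class ContextGuidedRouting(nn.Module):
    """Toy context-guided dynamic routing layer (illustrative, not the paper's code).

    Input capsules are visual features (global or regional); the decoder's
    timestep-specific source-side context vector steers the routing so that
    the extracted visual context changes from timestep to timestep.
    """

    def __init__(self, in_dim, ctx_dim, out_dim, num_out_caps, num_iters=3):
        super().__init__()
        self.num_out_caps = num_out_caps
        self.out_dim = out_dim
        self.num_iters = num_iters
        # Votes from each input capsule to every output capsule.
        self.vote_proj = nn.Linear(in_dim, num_out_caps * out_dim)
        # Projects the source-side context vector into capsule space.
        self.ctx_proj = nn.Linear(ctx_dim, out_dim)

    def forward(self, in_caps, context):
        # in_caps:  (batch, num_in_caps, in_dim)  visual feature capsules
        # context:  (batch, ctx_dim)              timestep-specific context vector
        b, n, _ = in_caps.shape
        votes = self.vote_proj(in_caps).view(b, n, self.num_out_caps, self.out_dim)
        ctx = self.ctx_proj(context).view(b, 1, 1, self.out_dim)

        # Routing logits restart from zero at every decoding timestep, so the
        # extracted visual context is dynamic rather than fixed.
        logits = torch.zeros(b, n, self.num_out_caps, device=in_caps.device)
        out = None
        for _ in range(self.num_iters):
            coupling = F.softmax(logits, dim=2)
            out = squash((coupling.unsqueeze(-1) * votes).sum(dim=1))  # (b, K, out_dim)
            # Agreement between votes and current outputs, plus a context-guidance
            # term that pulls routing toward features relevant to the decoder now.
            agreement = (votes * out.unsqueeze(1)).sum(dim=-1)  # (b, n, K)
            guidance = (votes * ctx).sum(dim=-1)                # (b, n, K)
            logits = logits + agreement + guidance
        return out  # multimodal context capsules for this timestep


# Toy usage: 49 regional features of dim 2048 routed into 8 capsules of dim 256,
# guided by a 512-dim decoder context vector (all sizes are illustrative).
router = ContextGuidedRouting(in_dim=2048, ctx_dim=512, out_dim=256, num_out_caps=8)
caps = router(torch.randn(4, 49, 2048), torch.randn(4, 512))
print(caps.shape)  # torch.Size([4, 8, 256])
```

In the paper's setting, two such modules would run in parallel over global and regional visual features, and their outputs would be fused before entering the decoder.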
