Zero-resource Machine Translation by Multimodal Encoder-decoder Network with Multimedia Pivot

11/14/2016
by Hideki Nakayama, et al.

We propose an approach to building a neural machine translation system with no supervised resources (i.e., no parallel corpora), using a multimodal embedded representation over texts and images. Based on the assumption that text documents are often accompanied by other multimedia information (e.g., images) related to their content, we indirectly estimate the relevance between the two languages. Using multimedia as the "pivot", we project all modalities into a common hidden space in which samples belonging to similar semantic concepts lie close to each other, regardless of the observed space of each sample. This modality-agnostic representation is the key to bridging the gap between different modalities. By putting a decoder on top of it, our network can flexibly produce outputs from any input modality. Notably, at test time only source-language text is required as input for translation. In experiments on two benchmarks, our method achieves reasonable translation performance. We compared and investigated several possible implementations and found that an end-to-end model that simultaneously optimizes both the ranking loss in the multimodal encoders and the cross-entropy loss in the decoder performs best.
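To make the training objective concrete, here is a minimal PyTorch sketch of the kind of pivot-based setup the abstract describes: text encoders for each language and an image encoder project into a shared space trained with a margin-based ranking loss, while a decoder trained with cross-entropy generates target-language text from that space, and both losses are optimized jointly. The GRU encoders, VSE-style ranking loss, module names, and dimensions are illustrative assumptions, not the authors' exact architecture.

    # Sketch only: module sizes, GRU encoders, and the VSE-style ranking loss
    # are assumptions for illustration, not the paper's exact architecture.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class TextEncoder(nn.Module):
        """Encodes a batch of token-id sequences into the shared pivot space."""
        def __init__(self, vocab_size, embed_dim=256, hidden_dim=512, pivot_dim=512):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
            self.rnn = nn.GRU(embed_dim, hidden_dim, batch_first=True)
            self.proj = nn.Linear(hidden_dim, pivot_dim)

        def forward(self, tokens):                    # tokens: (B, T)
            _, h = self.rnn(self.embed(tokens))       # h: (1, B, hidden_dim)
            return F.normalize(self.proj(h.squeeze(0)), dim=-1)

    class ImageEncoder(nn.Module):
        """Maps precomputed CNN image features into the shared pivot space."""
        def __init__(self, feat_dim=2048, pivot_dim=512):
            super().__init__()
            self.proj = nn.Linear(feat_dim, pivot_dim)

        def forward(self, feats):                     # feats: (B, feat_dim)
            return F.normalize(self.proj(feats), dim=-1)

    class Decoder(nn.Module):
        """Generates target-language tokens conditioned on a pivot-space vector."""
        def __init__(self, vocab_size, pivot_dim=512, embed_dim=256, hidden_dim=512):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
            self.init_h = nn.Linear(pivot_dim, hidden_dim)
            self.rnn = nn.GRU(embed_dim, hidden_dim, batch_first=True)
            self.out = nn.Linear(hidden_dim, vocab_size)

        def forward(self, pivot_vec, tgt_in):         # tgt_in: (B, T) shifted-right tokens
            h0 = torch.tanh(self.init_h(pivot_vec)).unsqueeze(0)
            out, _ = self.rnn(self.embed(tgt_in), h0)
            return self.out(out)                      # (B, T, vocab_size)

    def ranking_loss(img_emb, txt_emb, margin=0.2):
        """Max-margin loss pulling matched image/text pairs together in pivot space."""
        scores = img_emb @ txt_emb.t()                # (B, B) cosine similarities
        pos = scores.diag().unsqueeze(1)
        cost_img = (margin + scores - pos).clamp(min=0)      # image-to-text direction
        cost_txt = (margin + scores - pos.t()).clamp(min=0)  # text-to-image direction
        mask = torch.eye(scores.size(0), dtype=torch.bool, device=scores.device)
        return cost_img.masked_fill(mask, 0).mean() + cost_txt.masked_fill(mask, 0).mean()

    def train_step(src_enc, tgt_enc, img_enc, decoder, optimizer,
                   src_tokens, src_img_feats, tgt_tokens, tgt_img_feats, pad_id=0):
        """One joint step on two image-caption corpora with no parallel text:
        images act as the pivot linking the two languages."""
        optimizer.zero_grad()
        # Ranking loss aligns each language with images in the shared space.
        loss_rank = (ranking_loss(img_enc(src_img_feats), src_enc(src_tokens)) +
                     ranking_loss(img_enc(tgt_img_feats), tgt_enc(tgt_tokens)))
        # Cross-entropy loss teaches the decoder to emit target text from the pivot.
        logits = decoder(img_enc(tgt_img_feats), tgt_tokens[:, :-1])
        loss_ce = F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                                  tgt_tokens[:, 1:].reshape(-1), ignore_index=pad_id)
        loss = loss_rank + loss_ce
        loss.backward()
        optimizer.step()
        return loss.item()

Under this sketch, translation at test time would feed the source-text embedding (src_enc(src_tokens)) to the decoder in place of an image embedding, mirroring the abstract's point that only source-language text is needed as input once training is done.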


Related research

09/21/2020: Generative Imagination Elevates Machine Translation
There are thousands of languages on earth, but visual perception is shar...

02/09/2018: Zero-Resource Neural Machine Translation with Multi-Agent Communication Game
While end-to-end neural machine translation (NMT) has achieved notable s...

11/28/2019: Multimodal Machine Translation through Visuals and Speech
Multimodal machine translation involves drawing information from more th...

08/22/2023: SONAR: Sentence-Level Multimodal and Language-Agnostic Representations
We introduce SONAR, a new multilingual and multimodal fixed-size sentenc...

03/23/2017: Multimodal Compact Bilinear Pooling for Multimodal Neural Machine Translation
In state-of-the-art Neural Machine Translation, an attention mechanism i...

05/07/2021: Learning Shared Semantic Space for Speech-to-Text Translation
Having numerous potential applications and great impact, end-to-end spee...

03/25/2020: End-to-End Entity Classification on Multimodal Knowledge Graphs
End-to-end multimodal learning on knowledge graphs has been left largely...
