Unpaired Cross-lingual Image Caption Generation with Self-Supervised Rewards

08/15/2019
by Yuqing Song, et al.
Generating image descriptions in different languages is essential to serve users worldwide. However, it is prohibitively expensive to collect a large-scale paired image-caption dataset for every target language, even though such data is critical for training decent image captioning models. Previous works tackle the unpaired cross-lingual image captioning problem through a pivot language, relying on paired image-caption data in the pivot language and on pivot-to-target machine translation models. However, such language-pivoted approaches suffer from inaccuracies introduced by the pivot-to-target translation, including disfluency and visual irrelevancy errors. In this paper, we propose to generate cross-lingual image captions with self-supervised rewards in a reinforcement learning framework to alleviate these two types of errors. We employ self-supervision from a monolingual corpus in the target language to provide a fluency reward, and propose a multi-level visual semantic matching model to provide both sentence-level and concept-level visual relevancy rewards. We conduct extensive experiments on unpaired cross-lingual image captioning in both English and Chinese on two widely used image caption corpora. The proposed approach achieves significant performance improvements over state-of-the-art methods.
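
The reward design described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not say how the three rewards are combined or which baseline is used, so the weighted sum, the default weights, the mean-reward baseline, and the names `combined_reward` and `reinforce_loss` are all assumptions made for illustration. The reward values themselves would come from the fluency model and the visual semantic matching model, which are not shown here.

```python
from typing import List, Sequence


def combined_reward(fluency: float, sentence_rel: float, concept_rel: float,
                    weights: Sequence[float] = (1.0, 1.0, 1.0)) -> float:
    """Combine the fluency reward with the two visual relevancy rewards.

    Assumption: a plain weighted sum with hypothetical weights.
    """
    w_f, w_s, w_c = weights
    return w_f * fluency + w_s * sentence_rel + w_c * concept_rel


def reinforce_loss(log_probs: List[float], rewards: List[float]) -> float:
    """REINFORCE loss over a batch of sampled captions.

    log_probs[i] is the summed token log-probability of sampled caption i,
    rewards[i] is its combined reward. The batch-mean reward is used as a
    baseline to reduce variance (an assumption, not taken from the paper).
    """
    baseline = sum(rewards) / len(rewards)
    weighted = [(r - baseline) * lp for r, lp in zip(rewards, log_probs)]
    return -sum(weighted) / len(rewards)
```

Minimizing this loss raises the probability of sampled captions whose combined reward is above the batch average and lowers it for those below, which is how fluency and relevancy signals could steer the captioner without paired target-language data.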
