Unsupervised Cross-lingual Image Captioning

by Jiahui Gao, et al.

Most recent image captioning work is conducted in English, as the majority of image-caption datasets are in English. However, there is a large number of non-native English speakers worldwide, so generating image captions in other languages is worth exploring. In this paper, we present a novel unsupervised method to generate image captions without using any caption corpus. Our method relies on 1) cross-lingual auto-encoding, which learns the scene graph mapping function along with the scene graph encoders and sentence decoders on machine translation parallel corpora, and 2) unsupervised feature mapping, which seeks to map the encoded scene graph features from the image modality to the sentence modality. By leveraging cross-lingual auto-encoding, cross-modal feature mapping, and adversarial learning, our method can learn an image captioner that generates captions in different languages. We verify the effectiveness of the proposed method on Chinese image caption generation. Comparisons against several baseline methods demonstrate the effectiveness of our approach.
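To make the second component more concrete, the following is a minimal sketch of a generic GAN-style objective for mapping features from one modality into another. It is not the authors' implementation: the abstract does not specify the mapper, the discriminator, or the loss, so the linear mapper, the logistic discriminator, and every name below (`map_features`, `discriminate`, `adversarial_losses`) are assumptions made purely for illustration.

```python
import math


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def map_features(x, W, b):
    """Hypothetical linear mapper: image-modality feature -> sentence-modality space."""
    return [sum(W[i][j] * x[j] for j in range(len(x))) + b[i] for i in range(len(b))]


def discriminate(z, w, c):
    """Hypothetical logistic discriminator: P(z is a genuine sentence-modality feature)."""
    return sigmoid(sum(wi * zi for wi, zi in zip(w, z)) + c)


def adversarial_losses(image_feats, sentence_feats, W, b, w, c, eps=1e-12):
    """Return (discriminator_loss, mapper_loss) for one batch.

    The discriminator is rewarded for telling real sentence features from
    mapped image features; the mapper is rewarded for fooling it.
    """
    mapped = [map_features(x, W, b) for x in image_feats]
    p_real = [discriminate(y, w, c) for y in sentence_feats]
    p_fake = [discriminate(z, w, c) for z in mapped]
    d_loss = -(
        sum(math.log(p + eps) for p in p_real) / len(p_real)
        + sum(math.log(1.0 - p + eps) for p in p_fake) / len(p_fake)
    )
    g_loss = -sum(math.log(p + eps) for p in p_fake) / len(p_fake)
    return d_loss, g_loss


# Toy 2-D features (made-up numbers, not data from the paper).
image_feats = [[0.0, 0.0], [0.1, -0.1]]
sentence_feats = [[3.0, -2.0], [3.1, -2.1]]
identity = [[1.0, 0.0], [0.0, 1.0]]
disc_w, disc_c = [1.0, 0.0], -1.5

# Mapper that leaves image features where they are.
d_unaligned, g_unaligned = adversarial_losses(
    image_feats, sentence_feats, identity, [0.0, 0.0], disc_w, disc_c
)
# Mapper that shifts image features onto the sentence features.
d_aligned, g_aligned = adversarial_losses(
    image_feats, sentence_feats, identity, [3.0, -2.0], disc_w, disc_c
)
```

In this toy setup the mapper loss is lower when the mapped image features land on the sentence features, while the fixed discriminator's loss rises because it can no longer separate the two sets. In actual adversarial training the two losses would be minimized alternately with respect to the discriminator and mapper parameters.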


Unpaired Cross-lingual Image Caption Generation with Self-Supervised Rewards

Generating image descriptions in different languages is essential to sat...

Fluency-Guided Cross-Lingual Image Captioning

Image captioning has so far been explored mostly in English, as most ava...

"Wikily" Neural Machine Translation Tailored to Cross-Lingual Tasks

We present a simple but effective approach for leveraging Wikipedia for ...

Unpaired Image Captioning via Scene Graph Alignments

Deep neural networks have achieved great success on the image captioning...

Unpaired Image Captioning by Language Pivoting

Image captioning is a multimodal task involving computer vision and natu...

Cross-modal Language Generation using Pivot Stabilization for Web-scale Language Coverage

Cross-modal language generation tasks such as image captioning are direc...

Cross-lingual Inference with A Chinese Entailment Graph

Predicate entailment detection is a crucial task for question-answering ...