Unsupervised Cross-lingual Image Captioning

10/03/2020
by Jiahui Gao, et al.

Most recent image captioning work is conducted in English, because the majority of image-caption datasets are in English. However, a large number of people worldwide are not native English speakers, so generating image captions in other languages is worth exploring. In this paper, we present a novel unsupervised method that generates image captions without using any caption corpus. Our method relies on 1) cross-lingual auto-encoding, which learns the scene graph mapping function along with the scene graph encoders and sentence decoders on machine translation parallel corpora, and 2) unsupervised feature mapping, which seeks to map the encoded scene graph features from the image modality to the sentence modality. By leveraging cross-lingual auto-encoding, cross-modal feature mapping, and adversarial learning, our method can learn an image captioner that generates captions in different languages. We evaluate the proposed method on Chinese image caption generation; comparisons against several baseline methods demonstrate its effectiveness.
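To make the idea of adversarial feature mapping between modalities more concrete, here is a minimal NumPy sketch. It is not the paper's implementation: the abstract does not specify the architecture, so the linear mapper `M`, the logistic-regression discriminator `(w, b)`, the toy Gaussian features, and all function names are assumptions chosen only for illustration. The discriminator learns to tell sentence-modality features from mapped image-modality features, and the mapper is updated to make its outputs harder to tell apart.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def discriminator_loss(M, w, b, img_feats, sent_feats):
    """Cross-entropy: sentence features are labeled 1, mapped image features 0."""
    z_sent = sent_feats @ w + b
    z_img = img_feats @ M @ w + b
    # -log(sigmoid(z)) == logaddexp(0, -z); -log(1 - sigmoid(z)) == logaddexp(0, z)
    return np.logaddexp(0.0, -z_sent).mean() + np.logaddexp(0.0, z_img).mean()

def mapper_loss(M, w, b, img_feats):
    """Mapper objective: mapped image features should be classified as sentence features."""
    z_img = img_feats @ M @ w + b
    return np.logaddexp(0.0, -z_img).mean()

def discriminator_step(M, w, b, img_feats, sent_feats, lr):
    """One gradient-descent step on the discriminator parameters (w, b)."""
    mapped = img_feats @ M
    p_sent = sigmoid(sent_feats @ w + b)
    p_img = sigmoid(mapped @ w + b)
    grad_w = (-((1.0 - p_sent)[:, None] * sent_feats).mean(axis=0)
              + (p_img[:, None] * mapped).mean(axis=0))
    grad_b = -(1.0 - p_sent).mean() + p_img.mean()
    return w - lr * grad_w, b - lr * grad_b

def mapper_step(M, w, b, img_feats, lr):
    """One gradient-descent step on the (hypothetical) linear mapper M."""
    p_img = sigmoid(img_feats @ M @ w + b)
    grad_M = -np.outer(((1.0 - p_img)[:, None] * img_feats).mean(axis=0), w)
    return M - lr * grad_M

# Toy stand-ins for encoded scene graph features from the two modalities.
rng = np.random.default_rng(0)
dim = 4
img_feats = rng.normal(size=(64, dim)) + 1.0    # "image modality" features
sent_feats = rng.normal(size=(64, dim)) - 1.0   # "sentence modality" features

M = np.eye(dim)              # start from the identity mapping
w, b = np.zeros(dim), 0.0    # discriminator parameters

# Alternate discriminator and mapper updates, as in standard adversarial training.
for _ in range(100):
    w, b = discriminator_step(M, w, b, img_feats, sent_feats, lr=0.1)
    M = mapper_step(M, w, b, img_feats, lr=0.1)
```

In this sketch the two updates simply alternate; a real system would use learned scene graph encoders and a more expressive mapper and discriminator, which the abstract does not detail.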

research
05/20/2023

Cross2StrA: Unpaired Cross-lingual Image Captioning with Cross-lingual Cross-modal Structure-pivoted Alignment

Unpaired cross-lingual image captioning has long suffered from irrelevan...
research
08/15/2019

Unpaired Cross-lingual Image Caption Generation with Self-Supervised Rewards

Generating image descriptions in different languages is essential to sat...
research
04/16/2021

"Wikily" Neural Machine Translation Tailored to Cross-Lingual Tasks

We present a simple but effective approach for leveraging Wikipedia for ...
research
07/19/2023

Embedded Heterogeneous Attention Transformer for Cross-lingual Image Captioning

Cross-lingual image captioning is confronted with both cross-lingual and...
research
03/26/2019

Unpaired Image Captioning via Scene Graph Alignments

Deep neural networks have achieved great success on the image captioning...
research
05/01/2020

Cross-modal Language Generation using Pivot Stabilization for Web-scale Language Coverage

Cross-modal language generation tasks such as image captioning are direc...
research
09/11/2023

Dual-view Curricular Optimal Transport for Cross-lingual Cross-modal Retrieval

Current research on cross-modal retrieval is mostly English-oriented, as...
