Distill the Image to Nowhere: Inversion Knowledge Distillation for Multimodal Machine Translation

10/10/2022
by Ru Peng, et al.

Past work on multimodal machine translation (MMT) has improved on the bilingual setup by incorporating additional aligned visual information. However, the image-must requirement of multimodal datasets largely hinders MMT's development: it demands aligned triples of [image, source text, target text]. This limitation is especially troublesome at inference time, when no aligned image is available, as in the standard NMT setup. Thus, in this work, we introduce IKD-MMT, a novel MMT framework that supports an image-free inference phase via an inversion knowledge distillation scheme. In particular, a multimodal feature generator is trained with a knowledge distillation module to generate the multimodal feature directly from (only) the source text as input. While a few prior works have explored image-free inference for machine translation, their performance has yet to rival image-must translation. In our experiments, we identify our method as the first image-free approach to comprehensively rival or even surpass (almost) all image-must frameworks, and it achieves state-of-the-art results on the widely used Multi30k benchmark. Our code and data are available at: https://github.com/pengr/IKD-mmt/tree/master.
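For intuition, here is a minimal PyTorch sketch of the distillation idea the abstract describes: a student "feature generator" learns to reproduce real image features from source-text encoder states, so that at inference time the pseudo feature can stand in for the missing image. The class and variable names (FeatureGenerator, kd_loss_fn), the mean-pooling, and the dimensions are illustrative assumptions, not the authors' actual implementation.

```python
import torch
import torch.nn as nn

class FeatureGenerator(nn.Module):
    """Hypothetical student: maps text encoder states to a pseudo visual feature."""
    def __init__(self, d_text: int, d_vision: int):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(d_text, d_vision),
            nn.ReLU(),
            nn.Linear(d_vision, d_vision),
        )

    def forward(self, text_states: torch.Tensor) -> torch.Tensor:
        # Mean-pool over source tokens, then project into the visual feature space.
        pooled = text_states.mean(dim=1)   # (batch, d_text)
        return self.proj(pooled)           # (batch, d_vision)

# Training: distill real image features (teacher) into the generator (student).
generator = FeatureGenerator(d_text=512, d_vision=2048)
kd_loss_fn = nn.MSELoss()

text_states = torch.randn(8, 20, 512)  # toy text-encoder outputs (batch, src_len, d_text)
image_feats = torch.randn(8, 2048)     # toy teacher features, e.g. from a frozen CNN

pseudo_feats = generator(text_states)
kd_loss = kd_loss_fn(pseudo_feats, image_feats)
kd_loss.backward()  # in practice, combined with the translation loss

# At inference, no image is needed: pseudo_feats replaces the visual input
# to the multimodal decoder.
```

In this sketch the distillation signal is a simple L2 regression onto the teacher's visual features; the actual IKD-MMT objective and generator architecture may differ, so consult the linked repository for the real training recipe.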


research · 05/20/2023
Scene Graph as Pivoting: Inference-time Image-free Unsupervised Multimodal Machine Translation with Visual Scene Hallucination
In this work, we investigate a more realistic unsupervised multimodal ma...

research · 08/29/2023
CLIPTrans: Transferring Visual Knowledge with Pre-trained Models for Multimodal Machine Translation
There has been a growing interest in developing multimodal machine trans...

research · 02/26/2022
Content-Variant Reference Image Quality Assessment via Knowledge Distillation
Generally, humans are more skilled at perceiving differences between hig...

research · 05/09/2023
Multi-Teacher Knowledge Distillation For Text Image Machine Translation
Text image machine translation (TIMT) has been widely used in various re...

research · 07/17/2023
Improving End-to-End Speech Translation by Imitation-Based Knowledge Distillation with Synthetic Transcripts
End-to-end automatic speech translation (AST) relies on data that combin...

research · 07/14/2023
Multimodal Distillation for Egocentric Action Recognition
The focal point of egocentric video understanding is modelling hand-obje...

research · 05/22/2023
D^2TV: Dual Knowledge Distillation and Target-oriented Vision Modeling for Many-to-Many Multimodal Summarization
Many-to-many multimodal summarization (M^3S) task aims to generate summa...
