Multi-Teacher Knowledge Distillation For Text Image Machine Translation

05/09/2023
by   Cong Ma, et al.

Text image machine translation (TIMT) is widely used in real-world applications: it translates source-language text embedded in images into target-language sentences. Existing TIMT methods fall into two categories: recognition-then-translation pipeline models and end-to-end models. However, how to transfer knowledge from the pipeline model into the end-to-end model remains an open problem. In this paper, we propose a novel Multi-Teacher Knowledge Distillation (MTKD) method to effectively distill knowledge from the pipeline model into the end-to-end TIMT model. Specifically, three teachers are utilized to improve the end-to-end TIMT model: the image encoder is optimized with knowledge-distillation guidance from the recognition teacher's encoder, while the sequential encoder and the decoder are improved by transferring knowledge from the translation teacher's sequential encoder and decoder, respectively. Furthermore, both token-level and sentence-level knowledge distillation are incorporated to further boost translation performance. Extensive experimental results show that our proposed MTKD improves text image translation performance and outperforms existing end-to-end and pipeline models with fewer parameters and less decoding time, illustrating that MTKD combines the advantages of both pipeline and end-to-end models.
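To make the token-level distillation idea concrete, below is a minimal NumPy sketch of a soft-label knowledge-distillation loss: the student's per-token output distribution is pulled toward the teacher's via a temperature-softened KL divergence. The function names, the temperature value, and the NumPy formulation are illustrative assumptions, not the paper's implementation; the paper's full objective additionally includes sentence-level distillation and guidance from three separate teachers.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over the vocabulary axis."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def token_level_kd_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student), averaged over token positions.

    student_logits, teacher_logits: arrays of shape (seq_len, vocab_size)
    holding the models' pre-softmax scores for each decoding step.
    """
    p_teacher = softmax(teacher_logits, temperature)
    log_p_teacher = np.log(p_teacher + 1e-12)
    log_p_student = np.log(softmax(student_logits, temperature) + 1e-12)
    kl_per_token = (p_teacher * (log_p_teacher - log_p_student)).sum(axis=-1)
    return float(kl_per_token.mean())
```

When student and teacher produce identical logits the loss is zero, and it grows as their distributions diverge, which is what drives the student's encoder and decoder toward the teachers' behavior during training.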
