Towards Automatic Face-to-Face Translation

03/01/2020
by Prajwal K R, et al.

In light of recent breakthroughs in automatic machine translation systems, we propose a novel approach that we term "Face-to-Face Translation". As today's digital communication becomes increasingly visual, we argue that there is a need for systems that can automatically translate a video of a person speaking in language A into a target language B with realistic lip synchronization. In this work, we create an automatic pipeline for this problem and demonstrate its impact on multiple real-world applications. First, we build a working speech-to-speech translation system by bringing together multiple existing modules from speech and language. We then move towards "Face-to-Face Translation" by incorporating a novel visual module, LipGAN, which generates realistic talking faces from the translated audio. Quantitative evaluation of LipGAN on the standard LRW test set shows that it significantly outperforms existing approaches across all standard metrics. We also subject our Face-to-Face Translation pipeline to multiple human evaluations and show that it can significantly improve the overall user experience of consuming and interacting with multimodal content across languages. Code, models, and a demo video are publicly available.
Demo video: https://www.youtube.com/watch?v=aHG6Oei8jF0
Code and models: https://github.com/Rudrabha/LipGAN
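The staging described in the abstract (speech-to-speech translation first, then a visual module driven by the translated audio) can be sketched as a simple composition of stages. This is only an illustrative sketch, not the authors' implementation: the split of the speech-to-speech step into recognition, text translation, and synthesis is an assumption (the abstract says only "multiple existing modules from speech and language"), and the class name `FaceToFacePipeline`, its field names, and the toy stand-in functions are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class FaceToFacePipeline:
    """Hypothetical cascade; each stage is a pluggable callable."""

    recognize: Callable[[bytes], str]          # speech in language A -> text in A (assumed stage)
    translate: Callable[[str], str]            # text in A -> text in B (assumed stage)
    synthesize: Callable[[str], bytes]         # text in B -> speech in B (assumed stage)
    lip_sync: Callable[[bytes, bytes], bytes]  # (source video, speech in B) -> lip-synced video

    def run(self, video: bytes, audio: bytes) -> bytes:
        # Speech-to-speech translation: language A audio -> language B audio.
        text_a = self.recognize(audio)
        text_b = self.translate(text_a)
        speech_b = self.synthesize(text_b)
        # Visual module: re-render the talking face to match the translated audio.
        return self.lip_sync(video, speech_b)


# Toy stand-ins so the sketch runs end to end; real models would replace these.
pipeline = FaceToFacePipeline(
    recognize=lambda audio: audio.decode("utf-8"),
    translate=lambda text: text.upper(),
    synthesize=lambda text: text.encode("utf-8"),
    lip_sync=lambda video, speech: video + b"|" + speech,
)
result = pipeline.run(video=b"frames", audio=b"hello")
```

Keeping each stage behind a plain callable reflects the abstract's point that the speech and language modules are existing components; under this assumption, the lip-synchronization stage is the only slot that a model such as LipGAN would fill.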


