T-Modules: Translation Modules for Zero-Shot Cross-Modal Machine Translation

05/24/2022
by Paul-Ambroise Duquenne, et al.

We present a new approach to zero-shot cross-modal transfer between speech and text for translation tasks. Multilingual speech and text are encoded in a joint fixed-size representation space. We then compare different approaches to decoding these multimodal and multilingual fixed-size representations, enabling zero-shot translation between languages and modalities. All our models are trained without cross-modal labeled translation data. Despite the fixed-size representation, we achieve very competitive results on several text and speech translation tasks. In particular, we significantly improve the state of the art for zero-shot speech translation on MuST-C. By incorporating a speech decoder into our framework, we introduce the first results for zero-shot direct speech-to-speech and text-to-speech translation.
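
The idea of a joint fixed-size space can be illustrated with a toy sketch. This is not the authors' implementation: the mean pooling, the four-dimensional vectors, the candidate sentences, and the nearest-neighbour `toy_decode` stand-in for a real decoder are all assumptions made for illustration. It only shows that inputs of different lengths and modalities collapse to vectors of the same size, so a single downstream component can consume either one.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def mean_pool(states: Sequence[Sequence[float]]) -> Vector:
    """Collapse a variable-length sequence of equal-width vectors into one vector.

    Mean pooling is an assumption here; the abstract only says "fixed-size".
    """
    if not states:
        raise ValueError("need at least one state vector")
    dim = len(states[0])
    return [sum(s[i] for s in states) / len(states) for i in range(dim)]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def toy_decode(embedding: Vector, candidates: Dict[str, Vector]) -> str:
    """Hypothetical stand-in for a decoder: return the closest candidate sentence.

    A real decoder would generate text or speech; this lookup only shows that
    the same function works on any embedding from the shared space.
    """
    return max(candidates, key=lambda sent: cosine(embedding, candidates[sent]))


# Made-up encoder outputs: 3 text token states and 7 speech frame states.
text_states = [[1.0, 0.0, 0.0, 0.0], [0.8, 0.2, 0.0, 0.0], [0.9, 0.1, 0.0, 0.0]]
speech_states = [[0.9, 0.1, 0.0, 0.1] for _ in range(7)]

text_embedding = mean_pool(text_states)
speech_embedding = mean_pool(speech_states)

# Made-up target-side embeddings in the same space.
candidates = {
    "Guten Morgen": [1.0, 0.1, 0.0, 0.0],
    "Gute Nacht": [0.0, 0.0, 1.0, 0.2],
}

print(len(text_embedding), len(speech_embedding))
print(toy_decode(text_embedding, candidates))
print(toy_decode(speech_embedding, candidates))
```

Because `toy_decode` never sees the modality of its input, a component built only from text-side embeddings can be applied to a speech embedding without change. This is the property that zero-shot cross-modal transfer relies on.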
