Duplex Diffusion Models Improve Speech-to-Speech Translation

05/22/2023
by Xianchao Wu, et al.

Speech-to-speech translation is a typical sequence-to-sequence learning task that naturally has two directions. How can bidirectional supervision signals be leveraged effectively to produce high-fidelity audio in both directions? Existing approaches train either two separate models or a single multitask-learned model, and both options suffer from low efficiency and inferior performance. In this paper, we propose a duplex diffusion model that applies diffusion probabilistic models to both sides of a reversible duplex Conformer, so that either end can simultaneously take as input and produce as output the speech of a distinct language. Our model enables reversible speech translation by simply flipping the input and output ends. Experiments show that our model is the first to achieve reversible speech translation, with significant improvements in ASR-BLEU scores over a list of state-of-the-art baselines.
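The idea of reversing a model by "flipping the input and output ends" can be illustrated with a small sketch. The abstract does not describe the internals of the duplex Conformer, so the code below is not the authors' architecture: it assumes a generic additive-coupling reversible block, with fixed random tanh layers standing in for learned sub-networks. The names `make_mixer`, `ReversibleBlock`, and `run_stack` are hypothetical and chosen only for this illustration.

```python
import math
import random


def make_mixer(dim, seed):
    """Return a fixed random map R^dim -> R^dim.

    Assumption: this stands in for a learned sub-network; it is not
    taken from the paper.
    """
    rng = random.Random(seed)
    w = [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(dim)]

    def mixer(h):
        return [math.tanh(sum(h[i] * w[i][j] for i in range(dim)))
                for j in range(dim)]

    return mixer


class ReversibleBlock:
    """Additive coupling: y1 = x1 + F(x2), y2 = x2 + G(y1)."""

    def __init__(self, half_dim, seed=0):
        self.half = half_dim
        self.f = make_mixer(half_dim, seed)
        self.g = make_mixer(half_dim, seed + 1)

    def forward(self, x):
        x1, x2 = x[:self.half], x[self.half:]
        y1 = [a + b for a, b in zip(x1, self.f(x2))]
        y2 = [a + b for a, b in zip(x2, self.g(y1))]
        return y1 + y2

    def reverse(self, y):
        # Undo the two additions in the opposite order.
        y1, y2 = y[:self.half], y[self.half:]
        x2 = [a - b for a, b in zip(y2, self.g(y1))]
        x1 = [a - b for a, b in zip(y1, self.f(x2))]
        return x1 + x2


def run_stack(blocks, vec, reverse=False):
    """Run a stack of blocks from either end."""
    if reverse:
        for block in reversed(blocks):
            vec = block.reverse(vec)
    else:
        for block in blocks:
            vec = block.forward(vec)
    return vec


if __name__ == "__main__":
    blocks = [ReversibleBlock(4, seed=s) for s in (0, 2, 4)]
    source = [0.1 * i for i in range(8)]           # toy "source-side" features
    target = run_stack(blocks, source)             # one direction
    recovered = run_stack(blocks, target, reverse=True)  # flipped ends
    print(max(abs(a - b) for a, b in zip(source, recovered)))
```

The same parameters serve both directions, which is the property that makes a single model usable for either translation direction. In this sketch the reverse pass is an exact inverse up to floating-point rounding; how the paper combines such reversibility with diffusion on each end is not specified in the abstract and is not modeled here.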
