Learning Multilingual Expressive Speech Representation for Prosody Prediction without Parallel Data

06/29/2023
by Jarod Duret, et al.

We propose a method for speech-to-speech emotion-preserving translation that operates at the level of discrete speech units. Our approach relies on a multilingual emotion embedding that captures affective information in a language-independent manner. We show that this embedding can be used to predict the pitch and duration of speech units in a target language, allowing us to resynthesize the source speech signal with the same emotional content. We evaluate our approach on English and French speech signals and show that it outperforms a baseline method that does not use emotional information, including when the emotion embedding is extracted from a different language. Although this preliminary study does not directly address machine translation, our results demonstrate the effectiveness of our approach for cross-lingual emotion preservation in the context of speech resynthesis.
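The abstract describes a pipeline in which an utterance-level, language-independent emotion embedding conditions the prediction of per-unit pitch and duration for discrete speech units before resynthesis. The sketch below is a minimal PyTorch illustration of that idea only; the module structure, dimensions, GRU encoder, and toy inputs are assumptions for exposition, not the authors' architecture.

```python
# Minimal sketch: emotion-conditioned prosody prediction over discrete speech
# units. Names, sizes and inputs are illustrative assumptions.
import torch
import torch.nn as nn


class ProsodyPredictor(nn.Module):
    """Predicts per-unit pitch and duration from discrete unit ids,
    conditioned on an utterance-level emotion embedding."""

    def __init__(self, n_units=100, unit_dim=256, emo_dim=128, hidden=256):
        super().__init__()
        self.unit_emb = nn.Embedding(n_units, unit_dim)   # lookup for discrete unit ids
        self.emo_proj = nn.Linear(emo_dim, unit_dim)       # project emotion embedding to unit space
        self.encoder = nn.GRU(unit_dim, hidden, batch_first=True, bidirectional=True)
        self.pitch_head = nn.Linear(2 * hidden, 1)         # per-unit log-F0
        self.duration_head = nn.Linear(2 * hidden, 1)      # per-unit log-duration

    def forward(self, units, emotion):
        # units:   (batch, seq_len) integer ids from a unit discovery model
        # emotion: (batch, emo_dim) language-independent emotion embedding
        x = self.unit_emb(units) + self.emo_proj(emotion).unsqueeze(1)
        h, _ = self.encoder(x)
        pitch = self.pitch_head(h).squeeze(-1)        # predicted pitch contour
        duration = self.duration_head(h).squeeze(-1)  # predicted durations
        return pitch, duration


# Toy usage with random stand-ins for unit ids and a pretrained
# emotion encoder's output.
units = torch.randint(0, 100, (2, 50))
emotion = torch.randn(2, 128)
pitch, duration = ProsodyPredictor()(units, emotion)
print(pitch.shape, duration.shape)  # torch.Size([2, 50]) torch.Size([2, 50])
```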

Related research

07/17/2023 - Multilingual Speech-to-Speech Translation into Multiple Target Languages
Speech-to-speech translation (S2ST) enables spoken communication between...

09/02/2023 - DiCLET-TTS: Diffusion Model based Cross-lingual Emotion Transfer for Text-to-Speech - A Study between English and Mandarin
While the performance of cross-lingual TTS based on monolingual corpora...

08/16/2023 - AffectEcho: Speaker Independent and Language-Agnostic Emotion and Affect Transfer for Speech Synthesis
Affect is an emotional characteristic encompassing valence, arousal, and...

06/30/2022 - Language Model-Based Emotion Prediction Methods for Emotional Speech Synthesis Systems
This paper proposes an effective emotional text-to-speech (TTS) system w...

05/17/2022 - SAMU-XLSR: Semantically-Aligned Multimodal Utterance-level Cross-Lingual Speech Representation
We propose the SAMU-XLSR: Semantically-Aligned Multimodal Utterance-leve...

06/14/2020 - UWSpeech: Speech to Speech Translation for Unwritten Languages
Existing speech to speech translation systems heavily rely on the text o...

10/19/2021 - Improving Emotional Speech Synthesis by Using SUS-Constrained VAE and Text Encoder Aggregation
Learning emotion embedding from reference audio is a straightforward app...
