Transplantation of Conversational Speaking Style with Interjections in Sequence-to-Sequence Speech Synthesis

07/25/2022
by Raul Fernandez, et al.

Sequence-to-Sequence Text-to-Speech architectures that directly generate low-level acoustic features from phonetic sequences are known to produce natural and expressive speech when provided with adequate amounts of training data. Such systems can learn and transfer desired speaking styles from one seen speaker to another (in multi-style, multi-speaker settings), which is highly desirable for creating scalable and customizable Human-Computer Interaction systems. In this work, we explore one-to-many style transfer from a dedicated single-speaker conversational corpus with style nuances and interjections. We elaborate on the corpus design and explore the feasibility of such style transfer when assisted by Voice-Conversion-based data augmentation. In a set of subjective listening experiments, this approach resulted in high-fidelity style transfer with no quality degradation. However, a certain shift in voice persona was observed, which calls for further improvements in voice conversion.

research
02/17/2021

End-to-end lyrics Recognition with Voice to Singing Style Transfer

Automatic transcription of monophonic/polyphonic music is a challenging ...
research
02/10/2022

Cross-speaker style transfer for text-to-speech using data augmentation

We address the problem of cross-speaker style transfer for text-to-speec...
research
04/19/2023

Affective social anthropomorphic intelligent system

Human conversational styles are measured by the sense of humor, personal...
research
08/26/2022

Leveraging Symmetrical Convolutional Transformer Networks for Speech to Singing Voice Style Transfer

In this paper, we propose a model to perform style transfer of speech to...
research
08/13/2020

Enhancing Speech Intelligibility in Text-To-Speech Synthesis using Speaking Style Conversion

The increased adoption of digital assistants makes text-to-speech (TTS) ...
research
11/14/2021

Meta-Voice: Fast few-shot style transfer for expressive voice cloning using meta learning

The task of few-shot style transfer for voice cloning in text-to-speech ...
research
01/06/2020

Mel-spectrogram augmentation for sequence to sequence voice conversion

When training the sequence-to-sequence voice conversion model, we need t...
