Explicit Intensity Control for Accented Text-to-speech

10/27/2022
by Rui Liu, et al.

Accented text-to-speech (TTS) synthesis seeks to generate speech with an accent (L2) as a variant of the standard version (L1). Controlling the intensity of the accent during synthesis is an active research direction that has attracted growing attention. Recent work designs a speaker-adversarial loss to disentangle the speaker and accent information, and then adjusts the loss weight to control the accent intensity. However, such a control method lacks interpretability, and there is no direct correlation between the controlling factor and natural accent intensity. To this end, this paper proposes a new intuitive and explicit accent intensity control scheme for accented TTS. Specifically, we first extract the posterior probability, called "goodness of pronunciation (GoP)", from the L1 speech recognition model to quantify the phoneme accent intensity for accented speech, and then design a FastSpeech2-based TTS model, named Ai-TTS, to take the accent intensity expression into account during speech generation. Experiments show that our method outperforms the baseline model in terms of accent rendering and intensity control.
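To make the GoP idea concrete, the following is a minimal sketch of one common formulation: the average log posterior that an acoustic model assigns to the expected phoneme over the frames aligned to it. The abstract does not give the paper's exact formula, so the function name `goodness_of_pronunciation`, the input layout, and the probability floor are all assumptions made for illustration, not the authors' implementation.

```python
import math


def goodness_of_pronunciation(posteriors, segments, floor=1e-10):
    """Per-phoneme GoP as the mean log posterior of the expected phoneme.

    posteriors: list of frames; each frame is a list of phoneme probabilities
                (assumed to come from an L1 speech recognition model).
    segments:   list of (phoneme_index, start_frame, end_frame) tuples from a
                forced alignment, with end_frame exclusive (hypothetical layout).
    floor:      lower bound on probabilities to avoid log(0) (an assumption).
    """
    scores = []
    for phoneme, start, end in segments:
        if end <= start:
            raise ValueError("each segment must cover at least one frame")
        # Log posterior of the expected phoneme at every aligned frame.
        log_probs = [
            math.log(max(posteriors[t][phoneme], floor))
            for t in range(start, end)
        ]
        # Average over the segment so long and short phonemes are comparable.
        scores.append(sum(log_probs) / len(log_probs))
    return scores


# Toy example: two phoneme classes, three frames.
frame_posteriors = [[0.9, 0.1], [0.8, 0.2], [0.3, 0.7]]
alignment = [(0, 0, 2), (1, 2, 3)]
gop_scores = goodness_of_pronunciation(frame_posteriors, alignment)
```

Under this reading, a score closer to zero means the L1 model is confident in the expected phoneme, while a more negative score suggests a pronunciation that deviates further from L1, which is the kind of per-phoneme signal an accent intensity measure could build on.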

research
09/22/2022

Controllable Accented Text-to-Speech Synthesis

Accented text-to-speech (TTS) synthesis seeks to generate speech with an...
research
06/01/2023

EmoMix: Emotion Mixing via Diffusion Models for Emotional Speech Synthesis

There has been significant progress in emotional Text-To-Speech (TTS) sy...
research
03/14/2023

QI-TTS: Questioning Intonation Control for Emotional Speech Synthesis

Recent expressive text to speech (TTS) models focus on synthesizing emot...
research
07/04/2018

Investigating the role of L1 in automatic pronunciation evaluation of L2 speech

Automatic pronunciation evaluation plays an important role in pronunciat...
research
11/17/2022

EmoDiff: Intensity Controllable Emotional Text-to-Speech with Soft-Label Guidance

Although current neural text-to-speech (TTS) models are able to generate...
research
11/11/2022

Continuous Emotional Intensity Controllable Speech Synthesis using Semi-supervised Learning

With the rapid development of the speech synthesis system, recent text-t...
research
06/07/2023

Manga Rescreening with Interpretable Screentone Representation

The process of adapting or repurposing manga pages is a time-consuming t...