Empirical Study Incorporating Linguistic Knowledge on Filled Pauses for Personalized Spontaneous Speech Synthesis

10/14/2022
by   Yuta Matsunaga, et al.
0

We present a comprehensive empirical study for personalized spontaneous speech synthesis on the basis of linguistic knowledge. With the advent of voice cloning for reading-style speech synthesis, a new voice cloning paradigm for human-like and spontaneous speech synthesis is required. We, therefore, focus on personalized spontaneous speech synthesis that can clone both the individual's voice timbre and speech disfluency. Specifically, we deal with filled pauses, a major source of speech disfluency, which is known to play an important role in speech generation and communication in psychology and linguistics. To comparatively evaluate personalized filled pause insertion and non-personalized filled pause prediction methods, we developed a speech synthesis method with a non-personalized external filled pause predictor trained with a multi-speaker corpus. The results clarify the position-word entanglement of filled pauses, i.e., the necessity of precisely predicting positions for naturalness and the necessity of precisely predicting words for individuality on the evaluation of synthesized speech.

READ FULL TEXT
research
10/18/2022

Spontaneous speech synthesis with linguistic-speech consistency training using pseudo-filled pauses

We propose a training method for spontaneous speech synthesis models tha...
research
01/26/2022

J-MAC: Japanese multi-speaker audiobook corpus for speech synthesis

In this paper, we construct a Japanese audiobook speech corpus called "J...
research
10/07/2021

Cloning one's voice using very limited data in the wild

With the increasing popularity of speech synthesis products, the industr...
research
04/01/2022

Residual-guided Personalized Speech Synthesis based on Face Image

Previous works derive personalized speech features by training the model...
research
06/15/2021

STAN: A stuttering therapy analysis helper

Stuttering is a complex speech disorder identified by repeti-tions, prol...
research
04/19/2023

Personalized State Anxiety Detection: An Empirical Study with Linguistic Biomarkers and A Machine Learning Pipeline

Individuals high in social anxiety symptoms often exhibit elevated state...
research
09/22/2017

Techniques and Challenges in Speech Synthesis

The aim of this project was to develop and implement an English language...

Please sign up or login with your details

Forgot password? Click here to reset