Variational Auto-Encoder based Mandarin Speech Cloning

03/06/2022
by   Qingyu Xing, et al.

Speech cloning technology is becoming more sophisticated thanks to advances in machine learning. Researchers have achieved natural-sounding English speech synthesis and good English speech cloning with several effective models. However, because of the prosodic phrasing and large character set of Mandarin, the application of these models to Chinese is not yet complete. By creating a new dataset and replacing the Tacotron synthesizer with VAENAR-TTS, we improved the existing speech cloning technique CV2TTS to achieve almost real-time speech cloning while preserving synthesis quality. In the process, we customized the subjective tests used to assess synthesis quality by attaching various scenarios, so that subjects would focus on the differences between voices and so that our improvements might prove more advantageous in practical applications. The results of the A/B test, the real-time factor (RTF), and a mean opinion score (MOS) of 2.74 for naturalness and similarity reflect the real-time, high-quality Mandarin speech cloning we achieved.
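For readers unfamiliar with the two numeric metrics named in the abstract, the following is a minimal sketch of how they are conventionally computed. It assumes the common definitions (RTF as synthesis time divided by the duration of the generated audio, MOS as the arithmetic mean of listener ratings on a 1 to 5 scale); the function names and sample values are hypothetical and are not taken from the paper.

```python
from statistics import mean


def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """RTF = time spent synthesizing / duration of the audio produced.

    Under this (assumed, conventional) definition, values below 1.0 mean
    the system generates speech faster than it takes to play it back.
    """
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return synthesis_seconds / audio_seconds


def mean_opinion_score(ratings: list[int]) -> float:
    """MOS = arithmetic mean of listener ratings on an assumed 1-5 scale."""
    if not ratings:
        raise ValueError("ratings must not be empty")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("each rating must lie in the range 1..5")
    return mean(ratings)


if __name__ == "__main__":
    # Hypothetical numbers, for illustration only.
    print(real_time_factor(synthesis_seconds=0.5, audio_seconds=2.0))
    print(mean_opinion_score([3, 4, 5]))
```

An RTF computed this way depends on the hardware and on which pipeline stages are timed, so figures from different systems are only comparable when those conditions match.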

research
04/02/2021

MetricNet: Towards Improved Modeling For Non-Intrusive Speech Quality Assessment

The objective speech quality assessment is usually conducted by comparin...
research
10/28/2018

LPCNet: Improving Neural Speech Synthesis Through Linear Prediction

Neural speech synthesis models have recently demonstrated the ability to...
research
09/17/2021

On-device neural speech synthesis

Recent advances in text-to-speech (TTS) synthesis, such as Tacotron and ...
research
05/02/2019

High quality, lightweight and adaptable TTS using LPCNet

We present a lightweight adaptable neural TTS system with high quality o...
research
07/06/2021

Location, Location: Enhancing the Evaluation of Text-to-Speech Synthesis Using the Rapid Prosody Transcription Paradigm

Text-to-Speech synthesis systems are generally evaluated using Mean Opin...
research
08/21/2022

Visualising Model Training via Vowel Space for Text-To-Speech Systems

With the recent developments in speech synthesis via machine learning, t...
research
01/19/2018

Epoch-Synchronous Overlap-Add (ESOLA) for Time- and Pitch-Scale Modification of Speech Signals

Time- and pitch-scale modifications of speech signals find important app...
