Neural text-to-speech systems are often optimized on L1/L2 losses, which...
In expressive speech synthesis, it is common to use latent prosod...
Despite significant advances in recent years, the existing Computer-Assi...
The research community has long studied computer-assisted pronunciation...
Non-parallel voice conversion (VC) is typically achieved using lossy rep...
Artificial speech synthesis has made a great leap in terms of naturalnes...
Whilst recent neural text-to-speech (TTS) approaches produce high-qualit...
This paper proposes a general enhancement to the Normalizing Flows (NF) ...
We propose a weakly-supervised model for word-level mispronunciation det...
We present a universal neural vocoder based on Parallel WaveNet, with an...
A common approach to the automatic detection of mispronunciation in lang...
This paper describes two novel complementary techniques that improve the...
This paper proposes a novel approach for the detection and reconstructio...
Statistical TTS systems that directly predict the speech waveform have r...