Machine learning approaches to modelling analog audio effects have seen ...
Appropriate prosody is critical for successful spoken communication. Con...
Some recent models for Text-to-Speech synthesis aim to transfer the pros...
Most state-of-the-art Text-to-Speech systems use the mel-spectrogram as ...
Text does not fully specify the spoken form, so text-to-speech models mu...
Many speech synthesis datasets, especially those derived from audiobooks...
Speaker identity is one of the important characteristics of human speech...
In English, prosody adds a broad range of information to segment sequenc...
We aim to characterize how different speakers contribute to the perceive...
Unlike human speakers, typical text-to-speech (TTS) systems are unable t...
An attacker may use a variety of techniques to fool an automatic speaker...
Output from statistical parametric speech synthesis (SPSS) remains notic...
This paper evaluates the robustness of a DNN-HMM-based speech recognitio...
This paper proposes a new approach to duration modelling for statistical...
Text-to-Speech synthesis in Indian languages has seen a lot of progress ...
We propose two novel techniques --- stacking bottleneck features and min...
Recently, recurrent neural networks (RNNs) as powerful sequence models h...