Conformer-based end-to-end models have become ubiquitous these days and ...
Recently, there has been an increasing interest in unifying streaming an...
The availability of data in expressive styles across languages is limite...
State-of-the-art text-to-speech (TTS) systems require several hours of r...
We address the problem of cross-speaker style transfer for text-to-speec...
Whilst recent neural text-to-speech (TTS) approaches produce high-qualit...
Emotional voice conversion models adapt the emotion in speech without ch...
While recent neural text-to-speech (TTS) systems perform remarkably well...
We present an approach to synthesize whisper by applying a handcrafted s...
Pitch detection is a fundamental problem in speech processing as F0 is u...