The awareness for biased ASR datasets or models has increased notably in...
Most people who have tried to learn a foreign language would have experi...
Text-based voice editing (TBVE) uses synthetic output from text-to-speec...
Typical high quality text-to-speech (TTS) systems today use a two-stage
...
Fine-grained image classification is a challenging problem, since the
di...