Previous works on emotion recognition in conversation (ERC) follow a two...
In most cases, bilingual TTS needs to handle three types of input script...
Learning emotion embedding from reference audio is a straightforward app...
Sequence expansion between encoder and decoder is a critical challenge i...
In multi-speaker speech synthesis, data from a number of speakers usuall...
High-fidelity singing voices usually require a higher sampling rate (e.g.,...
Detecting singing-voice in polyphonic instrumental music is critical to ...
Current end-to-end autoregressive TTS systems (e.g. Tacotron 2) have out...
In this paper, we develop DeepSinger, a multi-lingual multi-singer singi...
This paper presents a high-quality singing synthesizer that is able to m...
This paper presents XiaoiceSing, a high-quality singing voice synthesis ...