Augmentation and knowledge distillation (KD) are well-established techni...
Graph domain adaptation models are widely adopted in cross-network learn...
Previously, Target Speaker Extraction (TSE) has yielded outstanding perf...
Visual information can serve as an effective cue for target speaker extr...
The currently most prominent algorithm to train keyword spotting (KWS) m...
Transformers have emerged as a prominent model framework for audio taggi...
Keyword spotting (KWS) is a core human-machine-interaction front-end tas...
We study the usability of pre-trained weakly supervised audio tagging (A...
Within the audio research community and the industry, keyword spotting (...
Large-scale audio tagging datasets inevitably contain imperfect labels, ...
This paper introduces GigaSpeech, an evolving, multi-domain English spee...
Nowadays, users are encouraged to be active across multiple online social...
This paper introduces a new open-source speech corpus named "speechocean...
Nowadays, online users prefer to join multiple social media for the purpo...
This paper presents the "Ethiopian" system for the SLT 2021 Children Spe...
We are now witnessing the increasing availability of event stream data, ...