Augmentation and knowledge distillation (KD) are well-established techni...
Previously, Target Speaker Extraction (TSE) has yielded outstanding perf...
Visual information can serve as an effective cue for target speaker extr...
The currently most prominent algorithm to train keyword spotting (KWS) m...
Transformers have emerged as a prominent model framework for audio taggi...
Keyword spotting (KWS) is a core human-machine-interaction front-end tas...
We study the usability of pre-trained weakly supervised audio tagging (A...
Within the audio research community and the industry, keyword spotting (...
Large-scale audio tagging datasets inevitably contain imperfect labels, ...
This paper introduces GigaSpeech, an evolving, multi-domain English spee...
This paper introduces a new open-source speech corpus named "speechocean...
This paper presents the "Ethiopian" system for the SLT 2021 Children Spe...