Augmentation and knowledge distillation (KD) are well-established techni...
Previously, Target Speaker Extraction (TSE) has yielded outstanding perf...
Recently, deep learning-based beamforming algorithms have shown promisin...
Visual information can serve as an effective cue for target speaker extr...
Currently, the most prominent algorithm to train keyword spotting (KWS) m...
Transformers have emerged as a prominent model framework for audio taggi...
In this paper, we investigate representation learning for low-resource k...
Electroencephalography (EEG) plays a vital role in detecting how brain r...
Existing weakly supervised sound event detection (WSSED) work has not ex...
Keyword spotting (KWS) is a core human-machine-interaction front-end tas...
In most cases, bilingual TTS needs to handle three types of input script...
We study the usability of pre-trained weakly supervised audio tagging (A...
Within the audio research community and the industry, keyword spotting (...
Large-scale audio tagging datasets inevitably contain imperfect labels, ...
Keyword spotting (KWS) and speaker verification (SV) are two important t...
Learning emotion embedding from reference audio is a straightforward app...
Sequence expansion between encoder and decoder is a critical challenge i...
Keyword spotting (KWS) on mobile devices generally requires a small memo...
Keyword spotting (KWS) on mobile devices generally requires a small memo...
We propose a multi-channel speech enhancement approach with a novel two-...
In multi-speaker speech synthesis, data from a number of speakers usuall...
This paper introduces GigaSpeech, an evolving, multi-domain English spee...
This paper introduces a new open-source speech corpus named "speechocean...
The front-end module in multi-channel automatic speech recognition (ASR)...
This paper presents the "Ethiopian" system for the SLT 2021 Children Spe...
Smart audio devices are gated by an always-on lightweight keyword spotti...
Attention-based seq2seq text-to-speech systems, especially those that use sel...
Neural network-based vocoders have recently demonstrated the powerful a...
In this paper, we propose an attention-based end-to-end model for multi-...
In this paper, we propose a sequence-to-sequence model for keyword spott...
In this paper, we propose an attention-based end-to-end neural approach ...
Speaker adaptation aims to estimate a speaker-specific acoustic model fr...
We investigate the use of generative adversarial networks (GANs) in spee...
Recently, there has been an increasing interest in end-to-end speech rec...