Zhiyong Yan

research

∙ 08/23/2023

CED: Consistent ensemble distillation for audio tagging

Augmentation and knowledge distillation (KD) are well-established techni...

Heinrich Dinkel, et al.

research

∙ 06/28/2023

Focus on the Sound around You: Monaural Target Speaker Extraction via Distance and Speaker Information

Previously, Target Speaker Extraction (TSE) has yielded outstanding perf...

Jiuxin Lin, et al.

research

∙ 06/25/2023

AV-SepFormer: Cross-Attention SepFormer for Audio-Visual Target Speaker Extraction

Visual information can serve as an effective cue for target speaker extr...

Jiuxin Lin, et al.

research

∙ 05/30/2023

Understanding temporally weakly supervised training: A case study for keyword spotting

The currently most prominent algorithm to train keyword spotting (KWS) m...

Heinrich Dinkel, et al.

research

∙ 05/29/2023

Streaming Audio Transformers for Online Audio Tagging

Transformers have emerged as a prominent model framework for audio taggi...

Heinrich Dinkel, et al.

research

∙ 03/03/2023

Unified Keyword Spotting and Audio Tagging on Mobile Devices with Transformers

Keyword spotting (KWS) is a core human-machine-interaction front-end tas...

Heinrich Dinkel, et al.

research

∙ 09/30/2022

An empirical study of weakly supervised audio tagging embeddings for general audio representations

We study the usability of pre-trained weakly supervised audio tagging (A...

Heinrich Dinkel, et al.

research

∙ 09/23/2022

UniKW-AT: Unified Keyword Spotting and Audio Tagging

Within the audio research community and the industry, keyword spotting (...

Heinrich Dinkel, et al.

research

∙ 04/28/2022

Pseudo strong labels for large scale weakly supervised audio tagging

Large-scale audio tagging datasets inevitably contain imperfect labels, ...

Heinrich Dinkel, et al.

research

∙ 06/13/2021

GigaSpeech: An Evolving, Multi-domain ASR Corpus with 10,000 Hours of Transcribed Audio

This paper introduces GigaSpeech, an evolving, multi-domain English spee...

Guoguo Chen, et al.

research

∙ 04/03/2021

speechocean762: An Open-Source Non-native English Speech Corpus For Pronunciation Assessment

This paper introduces a new open-source speech corpus named "speechocean...

Junbo Zhang, et al.

research

∙ 11/09/2020

Data Augmentation For Children's Speech Recognition – The "Ethiopian" System For The SLT 2021 Children Speech Recognition Challenge

This paper presents the "Ethiopian" system for the SLT 2021 Children Spe...

Guoguo Chen, et al.

