UniKW-AT: Unified Keyword Spotting and Audio Tagging

09/23/2022
by   Heinrich Dinkel, et al.

Within the audio research community and the industry, keyword spotting (KWS) and audio tagging (AT) are seen as two distinct tasks and research fields. However, from a technical point of view, the two tasks are identical: both predict a label (a keyword in KWS, a sound event in AT) for a fixed-size input audio segment. This work proposes UniKW-AT, an initial approach for jointly training KWS and AT. UniKW-AT enhances the noise robustness of KWS while also being able to predict specific sound events and enabling conditional wake-ups on sound events. Our approach extends the AT pipeline with additional labels describing the presence of a keyword. Experiments are conducted on the Google Speech Commands V1 (GSCV1) and the balanced Audioset (AS) datasets. The proposed MobileNetV2 model achieves an accuracy of 97.53% on the GSCV1 dataset and an mAP of 33.4 on the AS evaluation set. Further, we show that significant noise-robustness gains can be observed on a real-world KWS dataset, greatly outperforming standard KWS approaches. Our study shows that KWS and AT can be merged into a single framework without significant performance degradation.
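The idea of extending the AT label set with keyword labels can be illustrated with a minimal sketch. This is not the authors' implementation. The keyword list, the helper names `unified_target` and `should_wake`, and the 0.5 threshold are hypothetical choices made for illustration. The only external fact assumed is that the public AudioSet release defines 527 sound event classes.

```python
from typing import List, Optional, Sequence

NUM_AT_LABELS = 527  # class count of the public AudioSet release
KEYWORDS = ["yes", "no", "up", "down"]  # hypothetical keyword subset


def unified_target(at_indices: Sequence[int], keyword: Optional[str]) -> List[float]:
    """Build a multi-hot target over the AT labels followed by the keyword labels."""
    target = [0.0] * (NUM_AT_LABELS + len(KEYWORDS))
    for idx in at_indices:
        if not 0 <= idx < NUM_AT_LABELS:
            raise ValueError(f"AT label index out of range: {idx}")
        target[idx] = 1.0
    if keyword is not None:
        # Keyword labels occupy the slots appended after the AT labels.
        target[NUM_AT_LABELS + KEYWORDS.index(keyword)] = 1.0
    return target


def should_wake(probs: Sequence[float], keyword: str, event_index: int,
                threshold: float = 0.5) -> bool:
    """Conditional wake-up: require both the keyword and a given sound event."""
    keyword_prob = probs[NUM_AT_LABELS + KEYWORDS.index(keyword)]
    return keyword_prob >= threshold and probs[event_index] >= threshold
```

With a single output vector of this shape, one model can be trained on both kinds of data: AT clips set only entries in the first 527 slots, while keyword clips set an entry in the appended slots.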


research
03/03/2023

Unified Keyword Spotting and Audio Tagging on Mobile Devices with Transformers

Keyword spotting (KWS) is a core human-machine-interaction front-end tas...
research
04/06/2019

Cross-task learning for audio tagging, sound event detection and spatial localization: DCASE 2019 baseline systems

The Detection and Classification of Acoustic Scenes and Events (DCASE) 2...
research
02/03/2021

A Global-local Attention Framework for Weakly Labelled Audio Tagging

Weakly labelled audio tagging aims to predict the classes of sound event...
research
04/12/2018

Sound Event Detection and Time-Frequency Segmentation from Weakly Labelled Data

Sound event detection (SED) aims to detect what and when sound events ha...
research
06/09/2021

Audiovisual transfer learning for audio tagging and sound event detection

We study the merit of transfer learning for two sound recognition proble...
research
07/11/2018

Efficient keyword spotting using time delay neural networks

This paper describes a novel method of live keyword spotting using a two...
research
02/16/2021

Improving Deep-learning-based Semi-supervised Audio Tagging with Mixup

Recently, semi-supervised learning (SSL) methods, in the framework of de...
