CaTT-KWS: A Multi-stage Customized Keyword Spotting Framework based on Cascaded Transducer-Transformer

07/04/2022
by   Zhanheng Yang, et al.
0

Customized keyword spotting (KWS) has great potential to be deployed on edge devices to achieve hands-free user experience. However, in real applications, false alarm (FA) would be a serious problem for spotting dozens or even hundreds of keywords, which drastically affects user experience. To solve this problem, in this paper, we leverage the recent advances in transducer and transformer based acoustic models and propose a new multi-stage customized KWS framework named Cascaded Transducer-Transformer KWS (CaTT-KWS), which includes a transducer based keyword detector, a frame-level phone predictor based force alignment module and a transformer based decoder. Specifically, the streaming transducer module is used to spot keyword candidates in audio stream. Then force alignment is implemented using the phone posteriors predicted by the phone predictor to finish the first stage keyword verification and refine the time boundaries of keyword. Finally, the transformer decoder further verifies the triggered keyword. Our proposed CaTT-KWS framework reduces FA rate effectively without obviously hurting keyword recognition accuracy. Specifically, we can get impressively 0.13 FA per hour on a challenging dataset, with over 90 based detection model, while keyword recognition accuracy only drops less than 2

READ FULL TEXT

page 1

page 2

page 3

page 4

research
07/01/2020

A Transformer-based Audio Captioning Model with Keyword Estimation

One of the problems with automated audio captioning (AAC) is the indeter...
research
04/06/2023

To Wake-up or Not to Wake-up: Reducing Keyword False Alarm by Successive Refinement

Keyword spotting systems continuously process audio streams to detect ke...
research
11/21/2017

Multiple-Instance, Cascaded Classification for Keyword Spotting in Narrow-Band Audio

We propose using cascaded classifiers for a keyword spotting (KWS) task ...
research
05/21/2023

DCCRN-KWS: an audio bias based model for noise robust small-footprint keyword spotting

Real-world complex acoustic environments especially the ones with a low ...
research
12/03/2021

BBS-KWS:The Mandarin Keyword Spotting System Won the Video Keyword Wakeup Challenge

This paper introduces the system submitted by the Yidun NISP team to the...
research
03/31/2021

Auto-KWS 2021 Challenge: Task, Datasets, and Baselines

Auto-KWS 2021 challenge calls for automated machine learning (AutoML) so...
research
07/04/2022

Minimizing Sequential Confusion Error in Speech Command Recognition

Speech command recognition (SCR) has been commonly used on resource cons...

Please sign up or login with your details

Forgot password? Click here to reset