Text Anchor Based Metric Learning for Small-footprint Keyword Spotting

08/12/2021
by Li Wang, et al.

In Keyword Spotting (KWS), achieving a good trade-off between small footprint and high accuracy remains challenging. Recently proposed metric learning approaches have improved the generalizability of models for the KWS task, and 1D-CNN based KWS models have achieved state-of-the-art (SOTA) results in terms of model size. However, in metric learning, because of data limitations, the speech anchor is highly susceptible to the acoustic environment and to speaker variation. We also note that 1D-CNN models have limited capability to capture long-term temporal acoustic features. To address these problems, we propose to use text anchors to improve the stability of the anchors. Furthermore, a new type of model (LG-Net) is carefully designed to promote long- and short-term acoustic feature modeling based on 1D-CNN and self-attention. Experiments are conducted on the Google Speech Commands Dataset version 1 (GSCDv1) and version 2 (GSCDv2). The results demonstrate that the proposed text anchor based metric learning method shows consistent improvements over the speech anchor on representative CNN-based models. Moreover, our LG-Net model achieves SOTA accuracy of 97.67 [...] datasets, respectively. Encouragingly, our lighter LG-Net with only 74k parameters obtains 96.82% accuracy on GSCDv2.
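
The abstract does not say how the text anchors are produced or which loss is used. As a rough illustration only, the sketch below assumes one fixed embedding per keyword (the "text anchor"), scores a speech embedding against every anchor with cosine similarity, and applies a softmax cross-entropy over the scaled scores. The function names, the `scale` parameter, the toy 4-dimensional vectors, and the choice of loss are all assumptions made for this example and may differ from the paper's formulation.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def cosine_scores(speech_emb, text_anchors):
    """Cosine similarity between one speech embedding and every text anchor."""
    s = l2_normalize(speech_emb)
    return [sum(a * b for a, b in zip(s, l2_normalize(anchor)))
            for anchor in text_anchors]


def anchor_loss(speech_emb, text_anchors, label, scale=10.0):
    """Softmax cross-entropy over scaled similarities (assumed loss form)."""
    logits = [scale * c for c in cosine_scores(speech_emb, text_anchors)]
    max_logit = max(logits)  # subtract the max for numerical stability
    log_sum = max_logit + math.log(sum(math.exp(z - max_logit) for z in logits))
    return log_sum - logits[label]


def predict(speech_emb, text_anchors):
    """Return the index of the keyword whose text anchor is most similar."""
    scores = cosine_scores(speech_emb, text_anchors)
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical toy data: three keywords with hand-picked anchor vectors.
# In practice the anchors would come from some text encoder (not specified here).
text_anchors = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
]
# Stand-in for the output of an acoustic model on an utterance of keyword 1.
speech_emb = [0.1, 0.9, 0.05, 0.0]

print("predicted keyword index:", predict(speech_emb, text_anchors))
print("loss for the correct label:", anchor_loss(speech_emb, text_anchors, label=1))
```

Because the anchors in this sketch are fixed per keyword rather than computed from speech samples, they do not change with the recording conditions or the speaker, which is the stability property the abstract attributes to text anchors.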

research
10/20/2020

Small-Footprint Keyword Spotting with Multi-Scale Temporal Convolution

Keyword Spotting (KWS) plays a vital role in human-computer interaction ...
research
05/28/2022

Speech Augmentation Based Unsupervised Learning for Keyword Spotting

In this paper, we investigated a speech augmentation based unsupervised ...
research
11/01/2022

Metric Learning for User-defined Keyword Spotting

The goal of this work is to detect new spoken terms defined by users. Wh...
research
05/18/2020

Metric Learning for Keyword Spotting

The goal of this work is to train effective representations for keyword ...
research
06/09/2020

Smooth Proxy-Anchor Loss for Noisy Metric Learning

Many industrial applications use Metric Learning as a way to circumvent ...
research
03/31/2022

Learning Decoupling Features Through Orthogonality Regularization

Keyword spotting (KWS) and speaker verification (SV) are two important t...
research
08/04/2018

Triplet Network with Attention for Speaker Diarization

In automatic speech processing systems, speaker diarization is a crucial...
