Small-Footprint Keyword Spotting with Multi-Scale Temporal Convolution

10/20/2020
by Ximin Li, et al.

Keyword Spotting (KWS) plays a vital role in human-computer interaction for smart on-device terminals and service robots. Balancing a small footprint against high accuracy remains a challenge for the KWS task. In this paper, we explore the application of multi-scale temporal modeling to the small-footprint keyword spotting task. We propose a multi-branch temporal convolution module (MTConv), a CNN block consisting of multiple temporal convolution filters with different kernel sizes, which enriches the temporal feature space. In addition, taking advantage of temporal and depthwise convolution, we design a temporal efficient neural network (TENet) for KWS systems. Based on the proposed model, we replace the standard temporal convolution layers with MTConvs, which can be trained for better performance. At the inference stage, the MTConv can be equivalently converted to the base convolution architecture, so that no extra parameters or computational costs are added compared to the base model. The results on the Google Speech Command Dataset show that one of our models trained with MTConv achieves an accuracy of 96.8% with only 100K parameters.
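The claim that a multi-branch block can be collapsed into a single convolution at inference rests on the linearity of convolution: summing the outputs of several parallel filters is the same as applying one filter whose weights are the sum of the branch kernels, each zero-padded to the largest kernel size. The following is a minimal pure-Python sketch of that idea only, not the authors' implementation. The kernel sizes (3, 5, 7), the weights, and the helper names `conv1d_same` and `fuse_kernels` are all illustrative assumptions, and any per-branch normalization that a real model would need to fold into the weights first is left out.

```python
def conv1d_same(signal, kernel):
    """Cross-correlate a 1-D signal with an odd-length kernel.

    The signal is zero-padded so the output has the same length as the input.
    """
    half = len(kernel) // 2
    padded = [0.0] * half + list(signal) + [0.0] * half
    return [
        sum(kernel[j] * padded[i + j] for j in range(len(kernel)))
        for i in range(len(signal))
    ]


def fuse_kernels(kernels):
    """Zero-pad each odd-length kernel to the largest size (centred) and sum them."""
    size = max(len(k) for k in kernels)
    fused = [0.0] * size
    for k in kernels:
        offset = (size - len(k)) // 2  # keeps the smaller kernel centred
        for j, w in enumerate(k):
            fused[offset + j] += w
    return fused


# Hypothetical input and branch kernels (sizes 3, 5, 7), chosen for illustration.
signal = [0.5, -1.0, 2.0, 0.0, 1.5, -0.5, 3.0, 1.0]
branches = [
    [0.2, -0.1, 0.4],
    [0.1, 0.0, -0.3, 0.2, 0.05],
    [0.3, -0.2, 0.1, 0.0, 0.5, -0.4, 0.2],
]

# Training-time view: run every branch and sum the outputs.
multi_branch = [
    sum(values) for values in zip(*(conv1d_same(signal, k) for k in branches))
]

# Inference-time view: a single convolution with the fused kernel.
single = conv1d_same(signal, fuse_kernels(branches))

# The two views agree up to floating-point rounding.
max_diff = max(abs(a - b) for a, b in zip(multi_branch, single))
print(max_diff < 1e-9)
```

The fused kernel has the size of the largest branch, so the inference-time cost in this sketch is that of one convolution regardless of how many branches were used during training.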

research
09/01/2021

A Separable Temporal Convolution Neural Network with Attention for Small-Footprint Keyword Spotting

Keyword spotting (KWS) on mobile devices generally requires a small memo...
research
08/12/2021

Text Anchor Based Metric Learning for Small-footprint Keyword Spotting

Keyword Spotting (KWS) remains challenging to achieve the trade-off betw...
research
08/01/2020

Neural ODE with Temporal Convolution and Time Delay Neural Networks for Small-Footprint Keyword Spotting

In this paper, we propose neural network models based on the neural ordi...
research
04/21/2023

Small-footprint slimmable networks for keyword spotting

In this work, we present Slimmable Neural Networks applied to the proble...
research
08/27/2021

Separable Temporal Convolution plus Temporally Pooled Attention for Lightweight High-performance Keyword Spotting

Keyword spotting (KWS) on mobile devices generally requires a small memo...
research
01/15/2022

ConvMixer: Feature Interactive Convolution with Curriculum Learning for Small Footprint and Noisy Far-field Keyword Spotting

Building efficient architecture in neural speech processing is paramount...
research
02/26/2021

The NPU System for the 2020 Personalized Voice Trigger Challenge

This paper describes the system developed by the NPU team for the 2020 p...
