Improving vision-inspired keyword spotting using dynamic module skipping in streaming conformer encoder

08/31/2023
by Alexandre Bittar, et al.

Using a vision-inspired keyword spotting framework, we propose an architecture with input-dependent dynamic depth capable of processing streaming audio. Specifically, we extend a conformer encoder with trainable binary gates that allow us to dynamically skip network modules according to the input audio. Our approach improves detection and localization accuracy on continuous speech using the 1000 most frequent words in LibriSpeech while maintaining a small memory footprint. The inclusion of gates also reduces the average amount of processing without affecting the overall performance. These benefits are shown to be even more pronounced using the Google Speech Commands dataset placed over background noise, where up to 97% of the processing is skipped on non-speech inputs, therefore making our method particularly interesting for an always-on keyword spotter.
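The gating idea can be illustrated with a minimal sketch. It assumes a hypothetical linear gate that thresholds a per-frame score and a toy residual module; none of the names or values below come from the paper, and the sketch covers inference only (how the binary gates are trained is not described in the abstract). When the gate outputs 0, the module is skipped and the frame passes through unchanged, which is how compute is saved on uninformative input.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Module = Callable[[Sequence[float]], Vector]


def hard_gate(frame: Sequence[float], weights: Sequence[float], bias: float) -> int:
    """Hypothetical gate: 1 means execute the module, 0 means skip it."""
    score = sum(w * x for w, x in zip(weights, frame)) + bias
    return 1 if score > 0.0 else 0


def gated_residual(
    frame: Sequence[float], module: Module, weights: Sequence[float], bias: float
) -> Tuple[Vector, int]:
    """Apply a residual module only when the gate opens for this frame."""
    gate = hard_gate(frame, weights, bias)
    if gate == 0:
        # Skipped: identity path only, the module is never evaluated.
        return list(frame), 0
    update = module(frame)
    return [x + u for x, u in zip(frame, update)], 1


def run_stream(
    frames: Sequence[Sequence[float]],
    module: Module,
    weights: Sequence[float],
    bias: float,
) -> Tuple[List[Vector], float]:
    """Process frames one at a time and report the fraction of skipped calls."""
    outputs: List[Vector] = []
    executed = 0
    for frame in frames:
        out, gate = gated_residual(frame, module, weights, bias)
        outputs.append(out)
        executed += gate
    skipped_fraction = 1.0 - executed / len(frames) if frames else 0.0
    return outputs, skipped_fraction


def toy_module(frame: Sequence[float]) -> Vector:
    """Stand-in for a conformer sub-module (assumption, not the real layer)."""
    return [0.5 * x for x in frame]


# Three near-silent frames and one energetic frame (made-up values).
frames = [[0.0, 0.0], [0.01, 0.02], [1.0, 2.0], [0.0, 0.01]]
outputs, skipped = run_stream(frames, toy_module, weights=[1.0, 1.0], bias=-0.5)
print(outputs[2], skipped)  # [1.5, 3.0] 0.75
```

In this toy stream the gate opens only for the energetic frame, so three of the four module calls are skipped. A trained gate would learn its own decision rule from data instead of the fixed threshold used here.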


