Streaming Small-Footprint Keyword Spotting using Sequence-to-Sequence Models

10/26/2017
by   Yanzhang He, et al.
0

We develop streaming keyword spotting systems using a recurrent neural network transducer (RNN-T) model: an all-neural, end-to-end trained, sequence-to-sequence model which jointly learns acoustic and language model components. Our models are trained to predict either phonemes or graphemes as subword units, thus allowing us to detect arbitrary keyword phrases, without any out-of-vocabulary words. In order to adapt the models to the requirements of keyword spotting, we propose a novel technique which biases the RNN-T system towards a specific keyword of interest. Our systems are compared against a strong sequence-trained, connectionist temporal classification (CTC) based "keyword-filler" baseline, which is augmented with a separate phoneme language model. Overall, our RNN-T system with the proposed biasing technique significantly improves performance over the baseline system.

READ FULL TEXT
research
01/02/2018

Exploring Architectures, Data and Units For Streaming End-to-End Speech Recognition with RNN-Transducer

We investigate training end-to-end speech recognition models with the re...
research
11/11/2022

Exploring Sequence-to-Sequence Transformer-Transducer Models for Keyword Spotting

In this paper, we present a novel approach to adapt a sequence-to-sequen...
research
07/04/2016

Sequence to Backward and Forward Sequences: A Content-Introducing Approach to Generative Short-Text Conversation

Using neural networks to generate replies in human-computer dialogue sys...
research
11/01/2018

Sequence-to-sequence Models for Small-Footprint Keyword Spotting

In this paper, we propose a sequence-to-sequence model for keyword spott...
research
11/19/2018

Efficient keyword spotting using dilated convolutions and gating

We explore the application of end-to-end stateless temporal modeling to ...
research
05/29/2018

Can DNNs Learn to Lipread Full Sentences?

Finding visual features and suitable models for lipreading tasks that ar...
research
10/30/2019

Temporal Feedback Convolutional Recurrent Neural Networks for Keyword Spotting

While end-to-end learning has become a trend in deep learning, the model...

Please sign up or login with your details

Forgot password? Click here to reset