Exploring Sequence-to-Sequence Transformer-Transducer Models for Keyword Spotting

11/11/2022
by Beltrán Labrador, et al.

In this paper, we present a novel approach for adapting a sequence-to-sequence Transformer-Transducer ASR system to the keyword spotting (KWS) task. We achieve this by replacing the keyword in the text transcription with a special token <kw> and training the system to detect the <kw> token in an audio stream. At inference time, we use a decision function inspired by conventional KWS approaches, which makes the system better suited to the KWS task. Furthermore, we introduce a dedicated keyword spotting loss by adapting the sequence-discriminative Minimum Bayes-Risk training technique. We find that our approach significantly outperforms ASR-based KWS systems. Compared with a conventional keyword spotting system, our proposal achieves similar performance while bringing the advantages and flexibility of sequence-to-sequence training. Additionally, when combined with the conventional KWS system, our approach can improve performance at any operating point.
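
The transcript preparation step described in the abstract, replacing the keyword with the <kw> token, could look roughly like the sketch below. This is an illustration under stated assumptions, not the authors' pipeline: the function names `replace_keyword` and `contains_keyword`, the example keyword "hey computer", and the case-insensitive whole-phrase matching are all hypothetical choices, and the abstract does not say how the actual text normalization is done.

```python
import re

# Special token named in the abstract; everything else here is an assumption.
KW_TOKEN = "<kw>"


def replace_keyword(transcript: str, keyword: str, token: str = KW_TOKEN) -> str:
    """Replace every whole-phrase occurrence of `keyword` with `token`.

    Matching is case-insensitive and bounded by word boundaries, so a
    keyword such as "computer" does not match inside "computers".
    """
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", flags=re.IGNORECASE)
    return pattern.sub(token, transcript)


def contains_keyword(target: str, token: str = KW_TOKEN) -> bool:
    """Return True if the prepared transcript contains the keyword token."""
    return token in target.split()


# Hypothetical example keyword and utterances, for illustration only.
positive = replace_keyword("hey computer turn on the lights", "hey computer")
negative = replace_keyword("computers are fun", "computer")
```

Here `positive` becomes a training target that starts with <kw>, while `negative` is left unchanged because the keyword does not occur as a whole word.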


