Multi-task Learning with Cross Attention for Keyword Spotting

07/15/2021
by   Takuya Higuchi, et al.

Keyword spotting (KWS) is an important technique for speech applications: it enables users to activate devices by speaking a keyword phrase. A phoneme classifier can be used for KWS, which lets it exploit the large amount of transcribed data available for automatic speech recognition (ASR), but there is a mismatch between the training criterion (phoneme recognition) and the target task (KWS). Recently, multi-task learning has been applied to KWS to exploit both ASR and KWS training data. In this approach, the output of an acoustic model is split into two branches for the two tasks: one for phoneme transcription, trained with the ASR data, and one for keyword classification, trained with the KWS data. In this paper, we introduce a cross attention decoder into the multi-task learning framework. Unlike the conventional multi-task learning approach, which simply splits the output layer, the cross attention decoder summarizes information from a phonetic encoder by performing cross attention between the encoder outputs and a trainable query sequence to predict a confidence score for the KWS task. Experimental results on KWS tasks show that the proposed approach outperformed the conventional multi-task learning with split branches and a bi-directional long short-term memory decoder by 12
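To make the cross attention step concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It assumes single-head scaled dot-product attention in which the encoder outputs serve as both keys and values, and a hypothetical linear layer with a sigmoid that maps the attended summaries to a confidence score. The abstract does not specify the number of heads, the projections, or the output layer, so the names `cross_attention`, `keyword_confidence`, `out_weights`, and `out_bias`, along with all dimensions and the random values standing in for trained parameters, are illustrative assumptions.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, encoder_outputs):
    """Single-head scaled dot-product cross attention (simplifying assumption).

    queries:          N x d list of trainable query vectors.
    encoder_outputs:  T x d list of phonetic encoder frames (keys and values).
    Returns an N x d list: one summary vector per query, independent of T.
    """
    d = len(queries[0])
    summaries = []
    for q in queries:
        # Similarity between this query and every encoder frame.
        scores = [
            sum(qi * ki for qi, ki in zip(q, frame)) / math.sqrt(d)
            for frame in encoder_outputs
        ]
        weights = softmax(scores)
        # Weighted average of the encoder frames.
        summaries.append(
            [sum(w * frame[j] for w, frame in zip(weights, encoder_outputs))
             for j in range(d)]
        )
    return summaries


def keyword_confidence(queries, encoder_outputs, out_weights, out_bias):
    """Map the attended summaries to a confidence in (0, 1).

    The linear layer plus sigmoid is a hypothetical output head.
    """
    summaries = cross_attention(queries, encoder_outputs)
    flat = [v for row in summaries for v in row]
    logit = sum(w * v for w, v in zip(out_weights, flat)) + out_bias
    return 1.0 / (1.0 + math.exp(-logit))


# Toy example: random values stand in for trained parameters and real features.
rng = random.Random(0)
T, N, d = 6, 2, 4  # frames, queries, feature size (all illustrative)
encoder_outputs = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(T)]
queries = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(N)]
out_weights = [rng.gauss(0, 1) for _ in range(N * d)]
score = keyword_confidence(queries, encoder_outputs, out_weights, 0.0)
print(f"keyword confidence: {score:.3f}")
```

Because the query sequence has a fixed length, the decoder produces a fixed-size summary regardless of how many frames the encoder emits, which is what allows a single confidence score to be computed for a variable-length utterance.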

research
08/22/2021

Multilingual Speech Recognition for Low-Resource Indian Languages using Multi-Task conformer

Transformers have recently become very popular for sequence-to-sequence ...
research
06/28/2022

Personalized Keyword Spotting through Multi-task Learning

Keyword spotting (KWS) plays an essential role in enabling speech-based ...
research
04/05/2022

Hear No Evil: Towards Adversarial Robustness of Automatic Speech Recognition via Multi-Task Learning

As automatic speech recognition (ASR) systems are now being widely deplo...
research
11/01/2018

End-to-end Models with auditory attention in Multi-channel Keyword Spotting

In this paper, we propose an attention-based end-to-end model for multi-...
research
06/17/2019

Multi-task Learning For Detecting and Segmenting Manipulated Facial Images and Videos

Detecting manipulated images and videos is an important topic in digital...
research
07/10/2019

Multi-layer Attention Mechanism for Speech Keyword Recognition

As an important part of speech recognition technology, automatic speech ...
research
05/07/2020

Mutli-task Learning with Alignment Loss for Far-field Small-Footprint Keyword Spotting

In this paper, we focus on the task of small-footprint keyword spotting ...
