Sequence Discriminative Training for Deep Learning based Acoustic Keyword Spotting

08/02/2018
by Zhehuai Chen, et al.

Speech recognition is a sequence prediction problem. Besides the various deep learning approaches used for frame-level classification, sequence-level discriminative training has proven indispensable for achieving state-of-the-art performance in large vocabulary continuous speech recognition (LVCSR). However, keyword spotting (KWS), one of the most common speech recognition tasks, benefits almost exclusively from frame-level deep learning, because competing sequence hypotheses are difficult to obtain. The few studies of sequence discriminative training for KWS are limited to fixed-vocabulary or LVCSR-based methods and have not been compared with state-of-the-art deep learning based KWS approaches. In this paper, a sequence discriminative training framework is proposed for both fixed-vocabulary and unrestricted acoustic KWS. Sequence discriminative training is systematically investigated for both sequence-level generative and sequence-level discriminative models. By introducing word-independent phone lattices or non-keyword blank symbols to construct competing hypotheses, feasible and efficient sequence discriminative training approaches are proposed for acoustic KWS. Experiments showed that the proposed approaches obtained consistent and significant improvements on both fixed-vocabulary and unrestricted KWS tasks, compared with previous frame-level deep learning based acoustic KWS methods.
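The abstract does not state the exact training criterion. The sketch below therefore assumes a maximum mutual information (MMI) style objective, one common sequence discriminative criterion, to show the core idea: the reference hypothesis is scored against the sum over all competing hypotheses. Here the competing hypotheses come from an unconstrained phone loop, a simplified stand-in for the word-independent phone lattice. All function names, the tiny two-state model, and the score values are hypothetical illustrations, not the authors' implementation.

```python
import itertools
import math
from typing import List, Sequence

Matrix = List[List[float]]


def logsumexp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def path_score(acoustic: Matrix, trans: Matrix, init: List[float],
               path: Sequence[int]) -> float:
    """Joint log-score of one state sequence (the numerator hypothesis)."""
    score = init[path[0]] + acoustic[0][path[0]]
    for t in range(1, len(path)):
        score += trans[path[t - 1]][path[t]] + acoustic[t][path[t]]
    return score


def denominator_score(acoustic: Matrix, trans: Matrix,
                      init: List[float]) -> float:
    """Forward algorithm: log-sum over every state sequence in the phone loop
    (the competing hypotheses, assumed here to be word-independent)."""
    num_states = len(init)
    alpha = [init[k] + acoustic[0][k] for k in range(num_states)]
    for t in range(1, len(acoustic)):
        alpha = [
            logsumexp([alpha[j] + trans[j][k] for j in range(num_states)])
            + acoustic[t][k]
            for k in range(num_states)
        ]
    return logsumexp(alpha)


def brute_force_denominator(acoustic: Matrix, trans: Matrix,
                            init: List[float]) -> float:
    """Exhaustive enumeration of all paths; only for checking tiny examples."""
    paths = itertools.product(range(len(init)), repeat=len(acoustic))
    return logsumexp([path_score(acoustic, trans, init, p) for p in paths])


def mmi_loss(acoustic: Matrix, trans: Matrix, init: List[float],
             reference_path: Sequence[int]) -> float:
    """Negative MMI criterion: log sum_all p(hyp) - log p(reference).
    It is >= 0 and shrinks as the reference dominates the competitors."""
    return (denominator_score(acoustic, trans, init)
            - path_score(acoustic, trans, init, reference_path))


if __name__ == "__main__":
    # Hypothetical 2-state loop: state 0 = keyword phone, state 1 = filler.
    acoustic = [[-0.1, -2.3], [-1.2, -0.4], [-0.7, -0.7]]
    trans = [[math.log(0.6), math.log(0.4)], [math.log(0.3), math.log(0.7)]]
    init = [math.log(0.5), math.log(0.5)]
    print(mmi_loss(acoustic, trans, init, [0, 1, 1]))
```

Computing the denominator with the forward algorithm keeps the cost linear in the number of frames, whereas enumerating competing hypotheses explicitly grows exponentially; `brute_force_denominator` is included only to confirm that both give the same value on a toy input.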
