Acoustic-To-Word Model Without OOV

11/28/2017
by Jinyu Li, et al.

Recently, the acoustic-to-word model based on the Connectionist Temporal Classification (CTC) criterion has been shown to be a natural end-to-end model that directly targets words as output units. However, this type of word-based CTC model suffers from the out-of-vocabulary (OOV) issue: it can model only a limited number of words in the output layer and maps all remaining words to an OOV output node. Therefore, such a word-based CTC model can recognize only the frequent words modeled by the network output nodes. It also cannot easily handle hot-words that emerge after the model is trained. In this study, we improve the acoustic-to-word model with a hybrid CTC model that predicts both words and characters at the same time. With a shared-hidden-layer structure and modular design, the alignments of words generated from the word-based CTC and the character-based CTC are synchronized. Whenever the acoustic-to-word model emits an OOV token, we back off that OOV segment to the word output generated from the character-based CTC, hence solving the OOV or hot-words issue. Evaluated on a Microsoft Cortana voice assistant task, the proposed model can reduce the errors introduced by the OOV output token in the acoustic-to-word model by 30
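The back-off step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes greedy (best-path) decoding of both CTC heads over the same frames, and it picks the character-based word with the largest frame overlap for each OOV emission. The symbols `BLANK`, `OOV`, and `SPACE` and the helper names `collapse`, `char_words`, `overlap`, and `backoff_decode` are hypothetical, chosen only for this example.

```python
from itertools import groupby

# Assumed symbols; the actual model's token inventory is not given in the source.
BLANK = "<b>"   # CTC blank
OOV = "<OOV>"   # OOV output node of the word-based head
SPACE = " "     # word-boundary symbol of the character-based head


def collapse(frames):
    """Greedy CTC decoding: merge repeated labels, then drop blanks.

    Returns a list of (label, first_frame, last_frame) tuples.
    """
    out, t = [], 0
    for label, run in groupby(frames):
        n = len(list(run))
        if label != BLANK:
            out.append((label, t, t + n - 1))
        t += n
    return out


def char_words(char_frames):
    """Group decoded characters into words, keeping each word's frame span."""
    words, current = [], []
    for ch, start, end in collapse(char_frames):
        if ch == SPACE:
            if current:
                words.append(current)
            current = []
        else:
            current.append((ch, start, end))
    if current:
        words.append(current)
    return [("".join(c for c, _, _ in w), w[0][1], w[-1][2]) for w in words]


def overlap(a_start, a_end, b_start, b_end):
    """Number of frames shared by two inclusive frame spans."""
    return max(0, min(a_end, b_end) - max(a_start, b_start) + 1)


def backoff_decode(word_frames, char_frames):
    """Replace each OOV emitted by the word head with the character-based
    word whose frame span overlaps it most (an assumed matching rule)."""
    candidates = char_words(char_frames)
    result = []
    for word, start, end in collapse(word_frames):
        if word == OOV and candidates:
            best = max(candidates, key=lambda c: overlap(start, end, c[1], c[2]))
            if overlap(start, end, best[1], best[2]) > 0:
                word = best[0]  # back off to the character-based word
        result.append(word)
    return result


# Toy per-frame best-path labels for a 10-frame utterance (made-up data).
word_frames = [BLANK, "play", "play", BLANK, BLANK, BLANK, OOV, OOV, BLANK, BLANK]
char_frames = ["p", "l", "a", "y", " ", "z", "u", "m", "b", "a"]
print(backoff_decode(word_frames, char_frames))  # ['play', 'zumba']
```

In this toy case the in-vocabulary word comes from the word head, and only the OOV segment is filled in from the character head. An OOV emission with no overlapping character-based word is left as the OOV token.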

research
03/15/2018

Advancing Acoustic-to-Word CTC Model

The acoustic-to-word model based on the connectionist temporal classific...
research
12/31/2018

Advancing Acoustic-to-Word CTC Model with Attention and Mixed-Units

The acoustic-to-word model based on the Connectionist Temporal Classific...
research
07/23/2018

Acoustic-to-Word Recognition with Sequence-to-Sequence Models

Acoustic-to-Word recognition provides a straightforward solution to end-...
research
09/22/2019

Improving OOV Detection and Resolution with External Language Models in Acoustic-to-Word ASR

Acoustic-to-word (A2W) end-to-end automatic speech recognition (ASR) sys...
research
12/05/2017

No Need for a Lexicon? Evaluating the Value of the Pronunciation Lexica in End-to-End Models

For decades, context-dependent phonemes have been the dominant sub-word ...
research
10/21/2022

Audio-to-Intent Using Acoustic-Textual Subword Representations from End-to-End ASR

Accurate prediction of the user intent to interact with a voice assistan...
research
12/20/2021

Between words and characters: A Brief History of Open-Vocabulary Modeling and Tokenization in NLP

What are the units of text that we want to model? From bytes to multi-wo...
