Improving OOV Detection and Resolution with External Language Models in Acoustic-to-Word ASR

09/22/2019
by   Hirofumi Inaguma, et al.

Acoustic-to-word (A2W) end-to-end automatic speech recognition (ASR) systems have attracted attention because of their extremely simple architecture and fast decoding. To alleviate data sparseness caused by infrequent words, combining them with an acoustic-to-character (A2C) model has been investigated. The A2C model can also be used to recover out-of-vocabulary (OOV) words that are not covered by the A2W model, but this requires accurate detection of OOV words. A2W models learn context from both acoustics and transcripts; therefore, they tend to falsely recognize OOV words as words in the vocabulary. In this paper, we tackle this problem by using external language models (LMs), which are trained only on transcriptions and carry better linguistic information for detecting OOV words. The A2C model is then used to resolve these OOV words. Experimental evaluations show that external LMs not only reduce errors but also increase the number of detected OOV words, and that the proposed method significantly improves performance on English conversational and Japanese lecture corpora, especially in out-of-domain scenarios. We also investigate the impact of the vocabulary size of A2W models and of the amount of data used to train the LMs. Moreover, our approach can reduce the vocabulary size by a factor of several with only marginal performance degradation.
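
The detect-then-resolve idea described above can be illustrated with a toy sketch. It is not the authors' implementation: the abstract does not say how the external LM is combined with the A2W model, so the log-linear score interpolation used here is an assumption, as are the `<oov>` symbol, the function names `fuse_scores` and `decode`, the `lm_weight` value, and the simplification that the A2C model already supplies one character-level word per A2W word position.

```python
import math

OOV = "<oov>"  # hypothetical OOV symbol in the A2W vocabulary


def fuse_scores(a2w_logp, lm_logp, lm_weight):
    """Interpolate A2W and external LM log-probabilities for one word position.

    Assumption: simple log-linear interpolation; the abstract does not
    specify the actual integration method.
    """
    return {w: a2w_logp[w] + lm_weight * lm_logp[w] for w in a2w_logp}


def decode(a2w_steps, lm_steps, a2c_words, lm_weight=0.3):
    """Pick the best word per position; fall back to the A2C word on OOV.

    Assumption: a2c_words is already aligned one-to-one with word positions.
    """
    output = []
    for a2w_logp, lm_logp, a2c_word in zip(a2w_steps, lm_steps, a2c_words):
        fused = fuse_scores(a2w_logp, lm_logp, lm_weight)
        best = max(fused, key=fused.get)
        # OOV detected: resolve it with the character-level hypothesis.
        output.append(a2c_word if best == OOV else best)
    return output


def log_dist(probs):
    """Convert a dict of probabilities to log-probabilities."""
    return {w: math.log(p) for w, p in probs.items()}


# Toy scores for a two-word utterance whose second word is out of vocabulary.
a2w_steps = [
    log_dist({"the": 0.90, "cat": 0.05, OOV: 0.05}),
    log_dist({"the": 0.10, "cat": 0.50, OOV: 0.40}),  # A2W prefers in-vocabulary "cat"
]
lm_steps = [
    log_dist({"the": 0.80, "cat": 0.10, OOV: 0.10}),
    log_dist({"the": 0.35, "cat": 0.05, OOV: 0.60}),  # LM favors OOV in this context
]
a2c_words = ["the", "catamaran"]  # made-up character-level hypotheses

without_lm = decode(a2w_steps, lm_steps, a2c_words, lm_weight=0.0)
with_lm = decode(a2w_steps, lm_steps, a2c_words, lm_weight=0.3)
```

In this made-up example, the A2W scores alone pick the in-vocabulary word at the second position, while adding the LM score shifts the decision to `<oov>`, which is then replaced by the A2C word. The numbers are illustrative only and do not come from the paper.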

research
03/03/2018

On Modular Training of Neural Acoustics-to-Word Model for LVCSR

End-to-end (E2E) automatic speech recognition (ASR) systems directly map...
research
07/13/2017

Automatic Speech Recognition with Very Large Conversational Finnish and Estonian Vocabularies

Today, the vocabulary size for language models in large vocabulary speec...
research
02/20/2023

Emphasizing Unseen Words: New Vocabulary Acquisition for End-to-End Speech Recognition

Due to the dynamic nature of human language, automatic speech recognitio...
research
06/27/2018

Unsupervised and Efficient Vocabulary Expansion for Recurrent Neural Network Language Models in ASR

In automatic speech recognition (ASR) systems, recurrent neural network ...
research
11/28/2017

Acoustic-To-Word Model Without OOV

Recently, the acoustic-to-word model based on the Connectionist Temporal...
research
04/24/2023

Semantic Tokenizer for Enhanced Natural Language Processing

Traditionally, NLP performance improvement has been focused on improving...
research
06/12/2017

Acoustic data-driven lexicon learning based on a greedy pronunciation selection framework

Speech recognition systems for irregularly-spelled languages like Englis...
