Approaches to Improving Recognition of Underrepresented Named Entities in Hybrid ASR Systems

05/18/2020
by   Tingzhi Mao, et al.
0

In this paper, we present a series of complementary approaches to improve the recognition of underrepresented named entities (NE) in hybrid ASR systems without compromising overall word error rate performance. The underrepresented words correspond to rare or out-of-vocabulary (OOV) words in the training data, and thereby can't be modeled reliably. We begin with graphemic lexicon which allows to drop the necessity of phonetic models in hybrid ASR. We study it under different settings and demonstrate its effectiveness in dealing with underrepresented NEs. Next, we study the impact of neural language model (LM) with letter-based features derived to handle infrequent words. After that, we attempt to enrich representations of underrepresented NEs in pretrained neural LM by borrowing the embedding representations of rich-represented words. This let us gain significant performance improvement on underrepresented NE recognition. Finally, we boost the likelihood scores of utterances containing NEs in the word lattices rescored by neural LMs and gain further performance improvement. The combination of the aforementioned approaches improves NE recognition by up to 42

READ FULL TEXT

page 1

page 2

page 3

page 4

research
10/23/2020

Enriching Under-Represented Named-Entities To Improve Speech Recognition Performance

Automatic speech recognition (ASR) for under-represented named-entity (U...
research
01/17/2023

Syllable Subword Tokens for Open Vocabulary Speech Recognition in Malayalam

In a hybrid automatic speech recognition (ASR) system, a pronunciation l...
research
02/20/2023

Emphasizing Unseen Words: New Vocabulary Acquisition for End-to-End Speech Recognition

Due to the dynamic nature of human language, automatic speech recognitio...
research
09/10/2021

Remember the context! ASR slot error correction through memorization

Accurate recognition of slot values such as domain specific words or nam...
research
10/30/2022

Improvements to Embedding-Matching Acoustic-to-Word ASR Using Multiple-Hypothesis Pronunciation-Based Embeddings

In embedding-matching acoustic-to-word (A2W) ASR, every word in the voca...
research
05/18/2022

Minimising Biasing Word Errors for Contextual ASR with the Tree-Constrained Pointer Generator

Contextual knowledge is essential for reducing speech recognition errors...
research
12/19/2019

Statistical Testing on ASR Performance via Blockwise Bootstrap

A common question being raised in automatic speech recognition (ASR) eva...

Please sign up or login with your details

Forgot password? Click here to reset