Contextual RNN-T For Open Domain ASR

06/04/2020
by Mahaveer Jain, et al.

End-to-end (E2E) systems for automatic speech recognition (ASR), such as RNN Transducer (RNN-T) and Listen-Attend-Spell (LAS), blend the individual components of a traditional hybrid ASR system (acoustic model, language model, pronunciation model) into a single neural network. While this has some nice advantages, it limits the system to training on paired audio and text only. As a result, E2E models tend to have difficulty correctly recognizing rare words that are seen infrequently during training, such as entity names. In this paper, we propose modifications to the RNN-T model that allow it to use additional metadata text, with the objective of improving performance on these named entity words. We evaluate our approach on an in-house dataset sampled from de-identified public social media videos, which represents an open domain ASR task. By using an attention model to leverage the contextual metadata that accompanies a video, we observe a relative improvement of about 12% for videos with related metadata.

research
05/15/2020

Contextualizing ASR Lattice Rescoring with Hybrid Pointer Network Language Model

Videos uploaded on social media are often accompanied with textual descr...
research
11/05/2019

RNN-T For Latency Controlled ASR With Improved Beam Search

Neural transducer-based systems such as RNN Transducers (RNN-T) for auto...
research
07/10/2020

Class LM and word mapping for contextual biasing in End-to-End ASR

In recent years, all-neural, end-to-end (E2E) ASR systems gained rapid i...
research
06/29/2022

Contextual Density Ratio for Language Model Biasing of Sequence to Sequence ASR Systems

End-2-end (E2E) models have become increasingly popular in some ASR task...
research
02/21/2022

Adaptive Discounting of Implicit Language Models in RNN-Transducers

RNN-Transducer (RNN-T) models have become synonymous with streaming end-...
research
01/10/2022

A Likelihood Ratio based Domain Adaptation Method for E2E Models

End-to-end (E2E) automatic speech recognition models like Recurrent Neur...
research
11/05/2020

Improving RNN Transducer Based ASR with Auxiliary Tasks

End-to-end automatic speech recognition (ASR) models with a single neura...
