Contextualizing ASR Lattice Rescoring with Hybrid Pointer Network Language Model

05/15/2020
by Da-Rong Liu, et al.

Videos uploaded on social media are often accompanied by textual descriptions. In building automatic speech recognition (ASR) systems for videos, we can exploit the contextual information provided by such video metadata. In this paper, we explore ASR lattice rescoring by selectively attending to the video descriptions. First, we use an attention-based method to extract contextual vector representations of video metadata, and use these representations as part of the inputs to a neural language model during lattice rescoring. Second, we propose a hybrid pointer network approach that explicitly interpolates the probabilities of words occurring in the metadata. We perform experimental evaluations on both language modeling and ASR tasks, and demonstrate that both proposed methods improve performance by selectively leveraging the video metadata.
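The abstract does not give the exact formulation, so the following is only a minimal sketch of the general pointer-mixing idea, not the authors' model. It assumes a scalar gate that interpolates between a language-model distribution and a copy distribution built from attention weights over the metadata words. The function name `pointer_mix`, the fixed `gate` value, and the toy vocabulary are all hypothetical; in a real model, the gate and attention scores would be produced by the network.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def pointer_mix(p_lm, metadata_words, attn_scores, gate):
    """Interpolate an LM distribution with a copy distribution over metadata.

    p_lm           -- dict word -> probability from the language model
    metadata_words -- list of words in the video description (hypothetical input)
    attn_scores    -- one unnormalized attention score per metadata word
    gate           -- assumed scalar in [0, 1]; weight given to the copy distribution
    """
    attn = softmax(attn_scores)

    # Sum attention mass over repeated occurrences of the same word.
    p_ptr = {}
    for word, weight in zip(metadata_words, attn):
        p_ptr[word] = p_ptr.get(word, 0.0) + weight

    # Mix the two distributions; words missing from one side contribute zero.
    vocab = set(p_lm) | set(p_ptr)
    return {
        w: (1.0 - gate) * p_lm.get(w, 0.0) + gate * p_ptr.get(w, 0.0)
        for w in vocab
    }


# Toy example: "zelda" appears twice in the metadata and "speedrun" is
# outside the LM vocabulary.
p_lm = {"the": 0.5, "game": 0.3, "zelda": 0.2}
metadata = ["zelda", "speedrun", "zelda"]
mixed = pointer_mix(p_lm, metadata, attn_scores=[1.0, 0.0, 1.0], gate=0.25)
```

Because both inputs are proper distributions and the gate is a convex weight, the mixed result still sums to one. Words that occur in the metadata gain probability mass, including words the language model alone would assign zero.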

