Large-scale Language Model Rescoring on Long-form Data

06/13/2023
by   Tongzhou Chen, et al.

In this work, we study the impact of Large-scale Language Models (LLMs) on Automated Speech Recognition (ASR) of YouTube videos, which we use as a source of long-form ASR data. We demonstrate up to 8% relative reduction in Word Error Rate (WER) on US English (en-us) and code-switched Indian English (en-in) long-form ASR test sets, and up to 30% relative reduction in Salient Term Error Rate (STER), over a strong first-pass baseline that uses a maximum-entropy based language model. Two techniques yield significant wins in rescoring with LLMs: improved lattice processing, which produces a lattice with a proper (non-tree) digraph topology, and carrying context from the 1-best hypothesis of the previous segment(s). We also find that the performance gains from combining LLMs trained on vast quantities of available data (such as C4) with conventional neural LMs are additive, and that the combination significantly outperforms a strong first-pass baseline with a maximum entropy LM.
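
To make the "relative reduction" figures concrete, the following is a minimal sketch of how WER and a relative WER reduction are commonly computed. It is not the authors' evaluation code: the function names, the whitespace tokenization, and the numbers in the usage note are illustrative assumptions, not results from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumption: words are separated by whitespace; no text normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1        # reference word missing from hypothesis
            insertion = curr[j - 1] + 1   # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline: float, rescored: float) -> float:
    """Fraction by which an error rate drops relative to the baseline."""
    if baseline <= 0:
        raise ValueError("baseline error rate must be positive")
    return (baseline - rescored) / baseline
```

As a hypothetical illustration, a baseline WER of 10.0% that drops to 9.2% after rescoring gives `relative_reduction(0.10, 0.092)`, roughly 0.08, that is, an 8% relative reduction even though the absolute change is only 0.8 points.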


