Can Contextual Biasing Remain Effective with Whisper and GPT-2?

06/02/2023
by Guangzhi Sun, et al.

End-to-end automatic speech recognition (ASR) models and large language models, such as Whisper and GPT-2, have recently been scaled to use vast amounts of training data. Even so, infrequent content words that occur in a particular task may still be recognised poorly, and contextual biasing is a possible remedy. This paper investigates the effectiveness of neural contextual biasing for Whisper combined with GPT-2. Specifically, it proposes integrating an adapted tree-constrained pointer generator (TCPGen) component for Whisper, together with a dedicated training scheme, to dynamically adjust the final output without modifying any Whisper model parameters. Experiments across three datasets show a considerable reduction in errors on biasing words with a biasing list of 1000 words. Contextual biasing was more effective when applied to domain-specific data, and it can boost the performance of Whisper and GPT-2 without losing their generality.
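The abstract does not spell out how the output is adjusted, so the following is only a minimal sketch of the general pointer-generator idea, not the authors' implementation. It assumes the biasing list is stored as a prefix tree over subword tokens, that a separate component supplies non-negative pointer scores, and that a generation probability `p_gen` interpolates between the frozen model's distribution and the pointer distribution. All function names, the token split, and the numbers are hypothetical.

```python
from typing import Dict, List, Sequence


def build_prefix_tree(words: Sequence[Sequence[str]]) -> dict:
    """Build a nested-dict prefix tree from tokenised biasing words."""
    root: dict = {}
    for tokens in words:
        node = root
        for tok in tokens:
            node = node.setdefault(tok, {})
    return root


def valid_next_tokens(tree: dict, prefix: Sequence[str]) -> List[str]:
    """Return the tokens that can extend `prefix` inside the tree."""
    node = tree
    for tok in prefix:
        if tok not in node:
            return []  # prefix has left the biasing list
        node = node[tok]
    return sorted(node)


def biased_distribution(
    model_probs: Dict[str, float],
    pointer_scores: Dict[str, float],
    allowed: Sequence[str],
    p_gen: float,
) -> Dict[str, float]:
    """Interpolate the model distribution with a tree-constrained pointer
    distribution (assumed form). `model_probs` is never modified, which
    mirrors the idea of leaving the base model's parameters untouched."""
    total = sum(pointer_scores.get(tok, 0.0) for tok in allowed)
    if total <= 0.0:
        return dict(model_probs)  # nothing to bias towards
    # Pointer distribution: scores renormalised over allowed tokens only.
    pointer = {tok: pointer_scores.get(tok, 0.0) / total for tok in allowed}
    return {
        tok: (1.0 - p_gen) * prob + p_gen * pointer.get(tok, 0.0)
        for tok, prob in model_probs.items()
    }


# Hypothetical biasing list with two words sharing the first subword.
tree = build_prefix_tree([["Tur", "ner"], ["Tur", "ing"]])
allowed = valid_next_tokens(tree, ["Tur"])
model_probs = {"ner": 0.2, "ing": 0.1, "key": 0.7}
pointer_scores = {"ner": 3.0, "ing": 1.0, "key": 5.0}
final = biased_distribution(model_probs, pointer_scores, allowed, p_gen=0.5)
```

In this toy case the token `key` has the highest pointer score but is not a valid continuation in the tree, so it receives no pointer mass; only `ner` and `ing` are boosted, and the result still sums to one.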


research
07/02/2022

Tree-constrained Pointer Generator with Graph Neural Network Encodings for Contextual Speech Recognition

Incorporating biasing words obtained as contextual knowledge is critical...
research
09/01/2021

Tree-constrained Pointer Generator for End-to-end Contextual Speech Recognition

Contextual knowledge is important for real-world automatic speech recogn...
research
05/18/2022

Minimising Biasing Word Errors for Contextual ASR with the Tree-Constrained Pointer Generator

Contextual knowledge is essential for reducing speech recognition errors...
research
05/30/2023

Graph Neural Networks for Contextual ASR with the Tree-Constrained Pointer Generator

The incorporation of biasing words obtained through contextual knowledge...
research
11/03/2022

Probing Statistical Representations For End-To-End ASR

End-to-End automatic speech recognition (ASR) models aim to learn a gene...
research
01/17/2023

Two Stage Contextual Word Filtering for Context bias in Unified Streaming and Non-streaming Transducer

It is difficult for an end-to-end (E2E) ASR system to recognize words su...
research
09/01/2023

Contextual Biasing of Named-Entities with Large Language Models

This paper studies contextual biasing with Large Language Models (LLMs),...
