Exploring the Integration of Large Language Models into Automatic Speech Recognition Systems: An Empirical Study

07/13/2023
by Zeping Min, et al.

This paper explores the integration of Large Language Models (LLMs) into Automatic Speech Recognition (ASR) systems to improve transcription accuracy. The increasing sophistication of LLMs, with their in-context learning capabilities and instruction-following behavior, has drawn significant attention in the field of Natural Language Processing (NLP). Our primary focus is to investigate whether an LLM's in-context learning capabilities can enhance the performance of ASR systems, which currently face challenges such as ambient noise, speaker accents, and complex linguistic contexts. We designed a study using the Aishell-1 and LibriSpeech datasets, with ChatGPT and GPT-4 serving as benchmarks for LLM capabilities. Our initial experiments did not yield promising results, which indicates how difficult it is to leverage LLMs' in-context learning for ASR applications. Despite further exploration with varied settings and models, the sentences corrected by the LLMs frequently had higher Word Error Rates (WER) than the original transcriptions, demonstrating the limitations of LLMs in speech applications. This paper provides a detailed overview of these experiments, their results, and their implications, and establishes that using LLMs' in-context learning capabilities to correct potential errors in speech recognition transcriptions remains a challenging task at the current stage.
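For readers unfamiliar with the metric, the following is a minimal sketch of how Word Error Rate is commonly computed: the word-level edit distance (substitutions, deletions, insertions) between a reference and a hypothesis, divided by the number of reference words. It is an illustration only, not the authors' evaluation code; the function names `edit_distance` and `wer` and the example sentences are assumptions made up for this sketch, and whitespace tokenization is assumed (a simplification that does not suit unsegmented Chinese text such as Aishell-1).

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences (hypothetical helper)."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_tok in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to an empty hypothesis
        for j, hyp_tok in enumerate(hyp, start=1):
            cost = 0 if ref_tok == hyp_tok else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word Error Rate: word-level edit distance divided by reference length."""
    ref = reference.split()  # assumption: whitespace-separated words
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref, hyp) / len(ref)


# Made-up example: a "correction" that looks more fluent can still raise WER
# when the ASR output already matched the reference.
reference = "i red the book"
asr_output = "i red the book"
llm_output = "i read the book"
asr_wer = wer(reference, asr_output)
llm_wer = wer(reference, llm_output)
```

In this made-up case the ASR output matches the reference exactly, while the rewritten sentence introduces one substitution in four reference words. That is the kind of outcome the abstract describes, where the corrected sentence scores a higher WER than the transcription it was meant to fix.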

research
06/02/2021

Automatic Speech Recognition in Sanskrit: A New Speech Corpus and Modelling Insights

Automatic speech recognition (ASR) in Sanskrit is interesting, owing to ...
research
09/27/2021

Challenges and Opportunities of Speech Recognition for Bengali Language

Speech recognition is a fascinating process that offers the opportunity ...
research
01/26/2021

Leveraging End-to-End ASR for Endangered Language Documentation: An Empirical Study on Yoloxóchitl Mixtec

"Transcription bottlenecks", created by a shortage of effective human tr...
research
09/18/2023

Instruction-Following Speech Recognition

Conventional end-to-end Automatic Speech Recognition (ASR) models primar...
research
02/07/2018

Learning from Past Mistakes: Improving Automatic Speech Recognition Output via Noisy-Clean Phrase Context Modeling

Automatic speech recognition (ASR) systems lack joint optimization durin...
research
10/21/2022

Can Visual Context Improve Automatic Speech Recognition for an Embodied Agent?

The usage of automatic speech recognition (ASR) systems are becoming omn...
research
09/13/2023

Can Whisper perform speech-based in-context learning

This paper investigates the in-context learning abilities of the Whisper...
