Leveraging Large Language Models for Exploiting ASR Uncertainty

09/09/2023
by   Pranay Dighe, et al.

While large language models excel in a variety of natural language processing (NLP) tasks, to perform well on spoken language understanding (SLU) tasks they must either rely on off-the-shelf automatic speech recognition (ASR) systems for transcription, or be equipped with an in-built speech modality. This work focuses on the former scenario, where the LLM's accuracy on SLU tasks is constrained by the accuracy of a fixed ASR system on the spoken input. Specifically, we tackle the speech-intent classification task, where a high word-error-rate can limit the LLM's ability to understand the spoken intent. Instead of chasing high accuracy by designing complex or specialized architectures regardless of deployment costs, we seek to answer how far we can go without substantially changing the underlying ASR and LLM, which can potentially be shared by multiple unrelated tasks. To this end, we propose prompting the LLM with an n-best list of ASR hypotheses instead of only the error-prone 1-best hypothesis. We explore prompt engineering to explain the concept of n-best lists to the LLM, followed by finetuning Low-Rank Adapters on the downstream tasks. Our approach using n-best lists proves effective on a device-directed speech detection task as well as on a keyword spotting task, where systems using n-best list prompts outperform those using the 1-best ASR hypothesis; thus paving the way for an efficient method to exploit ASR uncertainty via LLMs for speech-based applications.
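The core idea — feeding the LLM a ranked n-best list rather than only the 1-best transcript — can be illustrated with a small prompt-construction sketch. The template wording, the score format, and the example hypotheses below are illustrative assumptions, not the paper's exact prompt:

```python
# Minimal sketch of prompting an LLM with an n-best list of ASR hypotheses
# instead of only the 1-best transcript. The prompt template and the
# hypothesis scores are hypothetical, not the paper's exact format.

def build_nbest_prompt(hypotheses, task="intent classification"):
    """Format an n-best list into a single prompt string.

    `hypotheses` is a list of (transcript, score) pairs, where the score
    is e.g. an ASR log-likelihood (higher = more likely).
    """
    # Rank hypotheses from most to least likely before presenting them.
    ranked = sorted(hypotheses, key=lambda h: h[1], reverse=True)
    lines = [
        "Below are the top ASR hypotheses for one spoken utterance,",
        "ranked from most to least likely. They may contain recognition",
        "errors; use all of them to infer what the speaker actually said.",
        "",
    ]
    for i, (text, score) in enumerate(ranked, start=1):
        lines.append(f"{i}. {text} (score: {score:.2f})")
    lines.append("")
    lines.append(f"Task: {task}. Answer with a single label.")
    return "\n".join(lines)

# Example: the 1-best hypothesis is misrecognized, but the intended
# phrase survives lower down in the n-best list.
nbest = [
    ("play the weeknd", -1.2),
    ("play the weekend", -0.9),   # 1-best, but misrecognized
    ("played the weekend", -2.5),
]
print(build_nbest_prompt(nbest))
```

A 1-best-only system would see just "play the weekend"; with the n-best prompt, the LLM can recover the likely intent from the second-ranked hypothesis. The paper then finetunes Low-Rank Adapters on such prompts for the downstream task.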


Related research

01/11/2020 · Improving Spoken Language Understanding By Exploiting ASR N-best Hypotheses
In a modern spoken language understanding (SLU) system, the natural lang...

11/03/2022 · Hybrid-SD (H_SD): A new hybrid evaluation metric for automatic speech recognition tasks
Many studies have examined the shortcomings of word error rate (WER) as ...

04/05/2021 · Semantic Distance: A New Metric for ASR Performance Analysis Towards Spoken Language Understanding
Word Error Rate (WER) has been the predominant metric used to evaluate t...

04/09/2020 · Improving Readability for Automatic Speech Recognition Transcription
Modern Automatic Speech Recognition (ASR) systems can achieve high perfo...

10/23/2019 · Incremental Online Spoken Language Understanding
Spoken Language Understanding (SLU) typically comprises of an automatic ...

04/11/2021 · Innovative Bert-based Reranking Language Models for Speech Recognition
More recently, Bidirectional Encoder Representations from Transformers (...

12/07/2020 · Using multiple ASR hypotheses to boost i18n NLU performance
Current voice assistants typically use the best hypothesis yielded by th...
