Speech-to-Text Adapter and Speech-to-Entity Retriever Augmented LLMs for Speech Understanding

06/08/2023
by Mingqiu Wang, et al.

Large Language Models (LLMs) have been applied in the speech domain, often incurring a performance drop due to misalignment between speech and language representations. To bridge this gap, we propose a joint speech and language model (SLM) using a Speech2Text adapter, which maps speech into the text token embedding space without speech information loss. Additionally, using CTC-based blank filtering, we can reduce the speech sequence length to that of text. On the speech MultiWoz dataset (DSTC11 challenge), SLM largely improves dialog state tracking (DST) performance (24.7% […]). To address errors on rare entities, we augment SLM with a Speech2Entity retriever, which uses speech to retrieve relevant entities and then adds them to the original SLM input as a prefix. With this retrieval-augmented SLM (ReSLM), the DST performance jumps to 34.6% […]. […] the dialog understanding task improves the ASR performance from 9.4% […] WER.
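The abstract names two mechanisms without showing them: CTC-based blank filtering, which shortens the speech sequence, and a Speech2Entity retriever whose results are prepended to the SLM input as a prefix. The sketch below illustrates both under simplifying assumptions that are ours, not the paper's. Blank filtering is modeled as dropping encoder frames whose greedy (argmax) CTC label is the blank symbol, assumed to be index 0. Retrieval is modeled as dot-product scoring of one pooled speech embedding against precomputed entity embeddings. The prefix is shown at the string level with a hypothetical `[entities] ... [input] ...` format, whereas the actual model works in embedding space. The names `ctc_blank_filter`, `retrieve_entities`, and `build_prefixed_input` are illustrative only.

```python
from typing import Dict, List, Sequence

# Assumption for this sketch: index 0 of the CTC output vocabulary is the blank.
BLANK_ID = 0


def ctc_blank_filter(
    frames: Sequence[Sequence[float]],
    ctc_logits: Sequence[Sequence[float]],
    blank_id: int = BLANK_ID,
) -> List[Sequence[float]]:
    """Keep only encoder frames whose greedy (argmax) CTC label is not blank.

    Blank-dominated frames carry little lexical content, so dropping them
    shortens the speech sequence toward the length of the transcript.
    """
    kept = []
    for frame, logits in zip(frames, ctc_logits):
        best_label = max(range(len(logits)), key=lambda i: logits[i])
        if best_label != blank_id:
            kept.append(frame)
    return kept


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Plain dot product used as the retrieval score."""
    return sum(x * y for x, y in zip(a, b))


def retrieve_entities(
    speech_embedding: Sequence[float],
    entity_index: Dict[str, Sequence[float]],
    k: int = 2,
) -> List[str]:
    """Return the k entity names whose embeddings score highest against the speech embedding."""
    ranked = sorted(
        entity_index.items(),
        key=lambda item: dot(speech_embedding, item[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:k]]


def build_prefixed_input(entities: List[str], slm_input: str) -> str:
    """Prepend retrieved entities to the model input (hypothetical text format)."""
    return f"[entities] {' | '.join(entities)} [input] {slm_input}"


if __name__ == "__main__":
    # Toy 1-d "encoder frames" with 2-way CTC logits (blank, non-blank).
    frames = [[0.1], [0.2], [0.3], [0.4]]
    logits = [[0.9, 0.1], [0.2, 0.8], [0.7, 0.3], [0.1, 0.9]]
    print(ctc_blank_filter(frames, logits))  # [[0.2], [0.4]]

    # Toy entity index with 2-d embeddings.
    index = {"cambridge": [1.0, 0.0], "ely": [0.0, 1.0], "stevenage": [0.6, 0.6]}
    top = retrieve_entities([0.9, 0.2], index, k=2)
    print(top)  # ['cambridge', 'stevenage']
    print(build_prefixed_input(top, "find me a train"))
```

In this toy run, half of the frames are classified as blank and removed, which is the length-reduction effect the abstract describes. The retriever surfaces the entities closest to the speech embedding so that rare names are already present in the model input.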

Related research

Data Augmentation for Training Dialog Models Robust to Speech Recognition Errors (06/10/2020)
Speech-based virtual assistants, such as Amazon Alexa, Google assistant,...

Speech Aware Dialog System Technology Challenge (DSTC11) (12/16/2022)
Most research on task oriented dialog modeling is based on written text ...

Mitigating the Impact of Speech Recognition Errors on Chatbot using Sequence-to-Sequence Model (09/22/2017)
We apply sequence-to-sequence model to mitigate the impact of speech rec...

AudioPaLM: A Large Language Model That Can Speak and Listen (06/22/2023)
We introduce AudioPaLM, a large language model for speech understanding ...

Telephonetic: Making Neural Language Models Robust to ASR and Semantic Noise (06/13/2019)
Speech processing systems rely on robust feature extraction to handle ph...

Improving Joint Speech-Text Representations Without Alignment (08/11/2023)
The last year has seen astonishing progress in text-prompted image gener...

The Inefficiency of Language Models in Scholarly Retrieval: An Experimental Walk-through (03/29/2022)
Language models are increasingly becoming popular in AI-powered scientif...