Corpus Synthesis for Zero-shot ASR Domain Adaptation using Large Language Models

09/18/2023
by Hsuan Su, et al.

While Automatic Speech Recognition (ASR) systems are widely used in real-world applications, they often do not generalize well to new domains and need to be finetuned on data from those domains. However, target-domain data is often not readily available. In this paper, we propose a new strategy for adapting ASR models to new target domains without any text or speech from those domains. To accomplish this, we propose a novel data synthesis pipeline that uses a Large Language Model (LLM) to generate a target-domain text corpus and a state-of-the-art controllable speech synthesis model to generate the corresponding speech. We also propose a simple yet effective in-context instruction finetuning strategy to increase the effectiveness of the LLM in generating text corpora for new domains. Experiments on the SLURP dataset show that the proposed method achieves an average relative word error rate improvement of 28% on unseen target domains without any performance drop in source domains.
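
For readers unfamiliar with the metric, the following is a minimal sketch of how word error rate (WER) and a relative WER improvement are commonly computed. It is not the authors' evaluation code: the function names, the example utterances, and the WER values are hypothetical and chosen only for illustration, and it assumes plain whitespace tokenization with no text normalization.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level edit distance / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def relative_wer_improvement(baseline_wer: float, adapted_wer: float) -> float:
    """Fraction by which the adapted model reduces the baseline WER."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return (baseline_wer - adapted_wer) / baseline_wer


# Hypothetical utterance pair: one substitution in five reference words.
example_wer = word_error_rate(
    "turn on the kitchen lights", "turn on the chicken lights"
)

# Hypothetical WER values, not results reported in the paper.
example_gain = relative_wer_improvement(baseline_wer=0.25, adapted_wer=0.18)
```

Under these made-up numbers, a drop from 25% to 18% absolute WER corresponds to a relative improvement of 28%, which is the kind of figure the abstract reports as an average over unseen target domains.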

