Using External Off-Policy Speech-To-Text Mappings in Contextual End-To-End Automated Speech Recognition

01/06/2023
by David M. Chan, et al.

Despite improvements to the generalization performance of automated speech recognition (ASR) models, specializing ASR models for downstream tasks remains challenging, primarily due to reduced data availability (necessitating increased data collection) and rapidly shifting data distributions (requiring more frequent model fine-tuning). In this work, we investigate the potential of leveraging external knowledge, particularly through off-policy key-value stores generated with text-to-speech methods, to allow for flexible post-training adaptation to new data distributions. In our approach, audio embeddings captured from text-to-speech, along with semantic text embeddings, are used to bias ASR via an approximate k-nearest-neighbor (KNN) based attentive fusion step. Our experiments on LibriSpeech and in-house voice assistant/search datasets show that the proposed approach can reduce domain adaptation time by up to 1K GPU-hours while providing up to 3% WER improvement compared to a fine-tuning baseline, suggesting a promising approach for adapting production ASR systems in challenging zero- and few-shot scenarios.
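The retrieval-and-fusion step described above can be illustrated with a small sketch. This is not the authors' implementation; it is a minimal illustration under stated assumptions. Keys stand in for TTS-derived audio embeddings and values for semantic text embeddings. Exact brute-force dot-product search replaces the approximate KNN index. The function name `knn_attentive_fusion`, the `gate` scalar, and the additive (residual) fusion are all hypothetical choices. The sketch also assumes that query and value vectors share the same dimension.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner-product similarity between two embeddings."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over retrieval scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def knn_attentive_fusion(
    query: Sequence[float],
    keys: Sequence[Sequence[float]],    # hypothetical: TTS audio embeddings
    values: Sequence[Sequence[float]],  # hypothetical: semantic text embeddings
    k: int = 2,
    gate: float = 0.5,                  # hypothetical fusion weight
) -> Tuple[Vector, List[int]]:
    """Retrieve the k nearest keys, attend over their values, and fuse the result into the query."""
    if not keys:
        return list(query), []
    # Brute-force scoring of every key; a production system would query an approximate KNN index.
    scored = sorted(((dot(query, key), i) for i, key in enumerate(keys)), reverse=True)[:k]
    # Attention weights over the retrieved neighbors only.
    weights = softmax([score for score, _ in scored])
    dim = len(values[0])
    context = [0.0] * dim
    for w, (_, idx) in zip(weights, scored):
        for d in range(dim):
            context[d] += w * values[idx][d]
    # Gated residual fusion: bias the query embedding toward the retrieved text context.
    fused = [q + gate * c for q, c in zip(query, context)]
    return fused, [idx for _, idx in scored]


if __name__ == "__main__":
    keys = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
    values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
    fused, neighbors = knn_attentive_fusion([1.0, 0.0], keys, values, k=2)
    print(neighbors, [round(x, 3) for x in fused])
```

Restricting the softmax to the top-k neighbors keeps the attention cost independent of the store size once the nearest-neighbor lookup is done. This is the main reason an approximate index makes such a key-value store practical to grow after training.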


