Personalizing ASR with limited data using targeted subset selection

10/10/2021
by Mayank Kothyari, et al.
We study the task of personalizing ASR models to a target non-native speaker/accent under a transcription budget on the total duration of utterances selected from a large unlabelled corpus. We propose a subset selection approach based on the recently proposed submodular mutual information functions, which identifies a diverse set of utterances that match the target speaker/accent. The target is specified through a few example utterances, and selection is driven by modeling the relationship between this target set and the selected subset with submodular mutual information functions. The method is applied at both the speaker and accent levels. We personalize the model by fine-tuning it on the utterances selected and transcribed from the unlabelled corpus. Our method consistently identifies utterances from the target speaker/accent using speech features alone. We show that the targeted subset selection approach improves upon random sampling by as much as 2 (absolute) depending on the speaker and accent, and is 2x to 4x more label-efficient than random sampling. We also compare against a skyline that picks utterances specifically from the target speaker/accent; our method generally outperforms this oracle in its selections.
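The selection step described in the abstract can be sketched as a budgeted greedy maximization. This is a minimal illustration, not the authors' implementation. The abstract does not say which submodular mutual information function is used, so the sketch assumes a facility-location-style function: a coverage term over target utterances plus a relevance term weighted by `eta`. It also assumes each utterance is represented by a fixed-length feature vector compared with cosine similarity. The names `flqmi`, `select_subset`, `eta`, and the toy data are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def flqmi(selected, sim, eta=1.0):
    """Assumed facility-location-style mutual information score.

    sim[i][q] is the similarity between pool utterance i and target
    utterance q. The first term rewards covering every target utterance,
    the second rewards each selected utterance for being near the target.
    """
    if not selected:
        return 0.0
    n_target = len(sim[0])
    coverage = sum(max(sim[i][q] for i in selected) for q in range(n_target))
    relevance = sum(max(sim[i]) for i in selected)
    return coverage + eta * relevance


def select_subset(pool_feats, target_feats, durations, budget, eta=1.0):
    """Greedily pick pool indices by marginal gain per second of audio,
    never exceeding the transcription budget (in seconds)."""
    # Clip negative similarities so the score stays non-negative.
    sim = [[max(0.0, cosine(p, t)) for t in target_feats] for p in pool_feats]
    selected, spent = [], 0.0
    remaining = set(range(len(pool_feats)))
    while True:
        base = flqmi(selected, sim, eta)
        best, best_ratio = None, 0.0
        for i in sorted(remaining):
            if spent + durations[i] > budget:
                continue  # this utterance no longer fits in the budget
            gain = flqmi(selected + [i], sim, eta) - base
            ratio = gain / durations[i]
            if ratio > best_ratio:
                best, best_ratio = i, ratio
        if best is None:
            return selected
        selected.append(best)
        spent += durations[best]
        remaining.remove(best)


# Toy data: utterances 0 and 1 resemble the target, 2 and 3 do not.
pool = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
target = [[1.0, 0.05]]
durations = [4.0, 3.0, 2.0, 5.0]
chosen = select_subset(pool, target, durations, budget=7.0)
print(chosen)
```

On this toy pool, the two utterances closest to the target fill the 7-second budget and the off-target ones are left out. Dividing the gain by duration is one common way to handle a cost budget in greedy selection; recomputing the score from scratch at each step keeps the sketch short but would be too slow for a large corpus.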
