Adversarial Training For Low-Resource Disfluency Correction

06/10/2023
by Vineet Bhat, et al.

Disfluencies commonly occur in conversational speech. Speech with disfluencies can result in noisy Automatic Speech Recognition (ASR) transcripts, which affect downstream tasks such as machine translation. In this paper, we propose an adversarially trained sequence-tagging model for Disfluency Correction (DC) that uses a small amount of labeled real disfluent data in conjunction with a large amount of unlabeled data. We show the benefit of our proposed technique, which crucially depends on synthetically generated disfluent data, by evaluating it for DC in three Indian languages: Bengali, Hindi, and Marathi (all from the Indo-Aryan family). Our technique also performs well at removing stuttering disfluencies that speech impairments introduce into ASR transcripts. We achieve an average improvement of 6.15 points in F1 score over competitive baselines across all three languages. To the best of our knowledge, we are the first to use adversarial training for DC and to apply it to correcting stuttering disfluencies in English, establishing a new benchmark for this task.
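To make the sequence-tagging formulation of DC concrete, the following is a minimal sketch rather than the authors' implementation. It assumes a binary tag per token (1 = disfluent, 0 = fluent) and a token-level F1 score computed on the disfluent class; the tag scheme, the function names `apply_tags` and `disfluent_f1`, and the example sentence are all illustrative assumptions, and the abstract does not specify how F1 is computed. The adversarial training component is not shown.

```python
from typing import List


def apply_tags(tokens: List[str], tags: List[int]) -> List[str]:
    """Return the fluent sentence by dropping tokens tagged 1 (disfluent).

    Assumption: a binary tag scheme, which the abstract does not specify.
    """
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")
    return [tok for tok, tag in zip(tokens, tags) if tag == 0]


def disfluent_f1(gold: List[int], pred: List[int]) -> float:
    """Token-level F1 on the disfluent class (tag 1).

    Assumption: this is one common way to score a tagger, not necessarily
    the exact metric used in the paper.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: a repeated word ("a a") and a filler ("uh").
tokens = ["i", "want", "a", "a", "ticket", "to", "uh", "mumbai"]
gold_tags = [0, 0, 1, 0, 0, 0, 1, 0]
pred_tags = [0, 0, 1, 0, 0, 0, 0, 0]  # the tagger misses the filler

fluent = apply_tags(tokens, gold_tags)
score = disfluent_f1(gold_tags, pred_tags)
```

In this sketch, correction reduces to deleting the tokens the tagger marks as disfluent, which is why DC can be framed as tagging rather than generation.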


