Lead2Gold: Towards exploiting the full potential of noisy transcriptions for speech recognition

10/16/2019
by Adrien Dufraux, et al.

The transcriptions used to train an Automatic Speech Recognition (ASR) system may contain errors. Usually, either a quality control stage discards transcriptions with too many errors, or the noisy transcriptions are used as is. We introduce Lead2Gold, a method to train an ASR system that exploits the full potential of noisy transcriptions. Based on a noise model of transcription errors, Lead2Gold searches for better transcriptions of the training data using a beam search that takes this noise model into account. The beam search is differentiable and does not require a forced alignment step, so the whole system is trained end-to-end. Lead2Gold can be viewed as a new loss function that can be used on top of any sequence-to-sequence deep neural network. We conduct proof-of-concept experiments on noisy transcriptions generated from letter corruptions at different noise levels. We show that Lead2Gold obtains better ASR accuracy than a competitive baseline that does not account for the (artificially introduced) transcription noise.
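To make the experimental setup concrete, here is a minimal sketch of how noisy transcriptions could be generated from a clean one by corrupting letters at a chosen noise level. The abstract does not specify the corruption scheme, so the details below are assumptions for illustration only: the function name `corrupt`, the lowercase alphabet, the three operations (substitution, insertion, deletion), and the uniform choice among them are not taken from the paper.

```python
import random
import string

# Assumed letter set; the paper's actual token set is not given in the abstract.
ALPHABET = string.ascii_lowercase


def corrupt(transcript, noise_level, rng):
    """Corrupt each letter independently with probability `noise_level`.

    Hypothetical noise process: a corrupted letter is substituted,
    followed by an inserted random letter, or deleted, chosen uniformly.
    Characters outside ALPHABET (e.g. spaces) are always kept unchanged.
    """
    out = []
    for ch in transcript:
        if ch not in ALPHABET or rng.random() >= noise_level:
            out.append(ch)  # keep the character as is
            continue
        op = rng.choice(("substitute", "insert", "delete"))
        if op == "substitute":
            # Replace with a different letter.
            out.append(rng.choice([c for c in ALPHABET if c != ch]))
        elif op == "insert":
            # Keep the letter and add a random one after it.
            out.append(ch)
            out.append(rng.choice(ALPHABET))
        # "delete": append nothing
    return "".join(out)


if __name__ == "__main__":
    rng = random.Random(0)
    clean = "the cat sat on the mat"
    for level in (0.0, 0.1, 0.3):
        print(level, corrupt(clean, level, rng))
```

Passing a seeded `random.Random` keeps the corruption reproducible, which is convenient when comparing a noise-aware loss against a baseline on the same corrupted training set.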


