Comparing the Benefit of Synthetic Training Data for Various Automatic Speech Recognition Architectures

04/12/2021
by Nick Rossenbach, et al.

Recent publications on automatic speech recognition (ASR) have a strong focus on attention encoder-decoder (AED) architectures, which work well for large datasets but tend to overfit when applied in low-resource scenarios. One solution to this issue, if additional text is available, is to generate synthetic data with a trained text-to-speech (TTS) system. This approach has been applied successfully to AED systems in many publications. We present a novel silence-correction approach in the data pre-processing for TTS systems, which increases robustness when training on corpora targeted at ASR applications. In this work, we not only show the successful application of synthetic data to AED systems, but also test the same method on a highly optimized state-of-the-art hybrid ASR system and on a competitive monophone-based system using connectionist temporal classification (CTC). We show that for the latter systems the addition of synthetic data has only a minor effect, but they still outperform the AED systems by a large margin on LibriSpeech-100h. We achieve a final word-error-rate of 3.3%/… on the clean/noisy test-sets, surpassing all previous state-of-the-art systems that do not include unlabeled audio data.
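The word-error-rate (WER) quoted above is conventionally defined as (S + D + I) / N, where S, D and I are the word substitutions, deletions and insertions in a minimum-edit-distance alignment of the recognizer output against a reference transcript of N words. The following is a minimal Python sketch of that standard definition, included only to make the metric concrete. The function name `word_error_rate` is hypothetical; this is not the scoring tool the authors used, and it applies no text normalization beyond whitespace splitting.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (S + D + I) / N using a word-level Levenshtein distance.

    Hypothetical helper for illustration: splits on whitespace only and
    performs no case folding or punctuation removal.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the first i-1 reference words
    # and the first j hypothesis words (a single rolling DP row).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from output
                curr[j - 1] + 1,     # insertion: extra word in output
                prev[j - 1] + cost,  # substitution, or match when cost == 0
            )
        prev = curr

    # Total edits divided by the number of reference words.
    return prev[len(hyp)] / len(ref)
```

For example, `word_error_rate("the cat sat on the mat", "the cat sat on mat")` counts one deletion against six reference words, giving 1/6 ≈ 0.167, i.e. 16.7% WER. Because insertions are counted, WER can exceed 100% when the output is much longer than the reference.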
