
CTC-Segmentation of Large Corpora for German End-to-end Speech Recognition

by Ludwig Kürzinger, et al.

Recent end-to-end Automatic Speech Recognition (ASR) systems have demonstrated the ability to outperform conventional hybrid DNN/HMM ASR. Aside from architectural improvements, these models have grown in depth, number of parameters, and model capacity. However, they also require more training data to achieve comparable performance. In this work, we combine freely available corpora for German speech recognition, including as-yet unlabeled speech data, into a large dataset of over 1700 h of speech. For data preparation, we propose a two-stage approach that uses an ASR model pre-trained with Connectionist Temporal Classification (CTC) to bootstrap additional training data from unsegmented or unlabeled speech. Label probabilities obtained from the CTC-trained network are used to determine segment alignments, from which utterances are then extracted. With this training data, we trained a hybrid CTC/attention Transformer model that achieves a word error rate (WER) of 12.8% on the Tuda-DE test set, surpassing the previous baseline of 14.4% WER achieved by conventional hybrid DNN/HMM ASR.
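
The segmentation step relies on aligning a known transcript to the per-frame label probabilities of a CTC-trained network. As an illustration only, the sketch below implements plain CTC Viterbi forced alignment in Python. It is a simplified stand-in, not the authors' CTC-segmentation implementation, which may differ (for example, in how it treats audio before the first utterance or how it scores alignment confidence). The function name `ctc_forced_align`, the blank index 0, and the input layout (a T x V matrix of log probabilities) are assumptions made for this example.

```python
import math
from typing import List

NEG_INF = float("-inf")


def ctc_forced_align(log_probs: List[List[float]], targets: List[int], blank: int = 0) -> List[int]:
    """Return the first aligned frame index for each target token.

    Hypothetical helper: log_probs[t][k] is the log probability of label k
    at frame t; targets are label ids without blanks (at least one token).
    """
    T = len(log_probs)
    # Extended label sequence: blank, y1, blank, y2, ..., yL, blank
    ext = [blank]
    for y in targets:
        ext += [y, blank]
    S = len(ext)

    score = [[NEG_INF] * S for _ in range(T)]
    back = [[-1] * S for _ in range(T)]
    # A path may start in the leading blank or in the first token
    score[0][0] = log_probs[0][ext[0]]
    score[0][1] = log_probs[0][ext[1]]

    for t in range(1, T):
        for s in range(S):
            # Allowed predecessors: stay, advance by one, or skip a blank
            # (skipping is forbidden between two identical tokens)
            cands = [s]
            if s >= 1:
                cands.append(s - 1)
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                cands.append(s - 2)
            best = max(cands, key=lambda p: score[t - 1][p])
            if score[t - 1][best] == NEG_INF:
                continue  # state not reachable at this frame
            score[t][s] = score[t - 1][best] + log_probs[t][ext[s]]
            back[t][s] = best

    # A path must end in the last token or the trailing blank
    s = S - 1 if score[T - 1][S - 1] >= score[T - 1][S - 2] else S - 2
    if score[T - 1][s] == NEG_INF:
        raise ValueError("no valid alignment: too few frames for the transcript")

    # Backtrack the best state sequence, one state per frame
    path = [0] * T
    for t in range(T - 1, -1, -1):
        path[t] = s
        s = back[t][s]

    # Token i occupies extended state 2*i + 1; record its first frame
    starts = [-1] * len(targets)
    for t, st in enumerate(path):
        if st % 2 == 1 and starts[st // 2] == -1:
            starts[st // 2] = t
    return starts


# Toy example: vocabulary {0: blank, 1: "a", 2: "b"}, 6 frames
probs = [
    [0.9, 0.05, 0.05],  # blank
    [0.05, 0.9, 0.05],  # a
    [0.05, 0.9, 0.05],  # a
    [0.9, 0.05, 0.05],  # blank
    [0.05, 0.05, 0.9],  # b
    [0.9, 0.05, 0.05],  # blank
]
log_probs = [[math.log(p) for p in row] for row in probs]
print(ctc_forced_align(log_probs, [1, 2]))  # -> [1, 4]
```

Multiplying a start frame by the network's output frame shift (which depends on the feature hop size and any subsampling in the model) would turn it into a time stamp, so utterance boundaries could be read off from the start frames of each utterance's first token.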




Open Source Automatic Speech Recognition for German

High quality Automatic Speech Recognition (ASR) is a prerequisite for sp...

Extracting Targeted Training Data from ASR Models, and How to Mitigate It

Recent work has designed methods to demonstrate that model updates in AS...

Are E2E ASR models ready for an industrial usage?

The Automated Speech Recognition (ASR) community experiences a major tur...

RECOApy: Data recording, pre-processing and phonetic transcription for end-to-end speech-based applications

Deep learning enables the development of efficient end-to-end speech pro...

Comparing the Benefit of Synthetic Training Data for Various Automatic Speech Recognition Architectures

Recent publications on automatic-speech-recognition (ASR) have a strong ...

ASR in German: A Detailed Error Analysis

The amount of freely available systems for automatic speech recognition ...

Hard Sample Mining for the Improved Retraining of Automatic Speech Recognition

It is an effective way that improves the performance of the existing Aut...
