CTC-Segmentation of Large Corpora for German End-to-end Speech Recognition

07/17/2020
by Ludwig Kürzinger, et al.

Recent end-to-end Automatic Speech Recognition (ASR) systems have demonstrated the ability to outperform conventional hybrid DNN/HMM ASR. Aside from architectural improvements, these models have grown in depth, number of parameters and model capacity. However, they also require more training data to achieve comparable performance. In this work, we combine freely available corpora for German speech recognition, including so far unlabeled speech data, into a large dataset of over 1700 h of speech. For data preparation, we propose a two-stage approach that uses an ASR model pre-trained with Connectionist Temporal Classification (CTC) to bootstrap more training data from unsegmented or unlabeled recordings. Utterances are then extracted by using the label probabilities of the CTC-trained network to determine segment alignments. With this training data, we trained a hybrid CTC/attention Transformer model that achieves 12.8% WER on the Tuda-DE test set, surpassing the previous baseline of 14.4% WER set by conventional hybrid DNN/HMM ASR.
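To make the alignment step more concrete, the following is a minimal sketch of generic CTC forced alignment: a Viterbi search over the standard CTC state sequence (a blank between every pair of transcript tokens and at both ends) that assigns each token a span of frames, given per-frame log-probabilities from a CTC-trained network. It is a simplified illustration and not necessarily the exact segmentation algorithm used in this work. The function name `ctc_forced_align`, the blank index 0 and the input layout (a T×V nested list of log-probabilities) are assumptions made for the example.

```python
import math
from typing import List, Sequence, Tuple


def ctc_forced_align(
    log_probs: Sequence[Sequence[float]],
    tokens: Sequence[int],
    blank: int = 0,
) -> List[Tuple[int, int, int]]:
    """Hypothetical helper: return (token_index, start_frame, end_frame) per token.

    log_probs[t][v] is the log-probability of label v at frame t (assumed layout).
    """
    T = len(log_probs)
    # Extended CTC label sequence: blank, t1, blank, t2, ..., tN, blank.
    ext = [blank]
    for tok in tokens:
        ext += [tok, blank]
    S = len(ext)
    NEG = -math.inf

    score = [[NEG] * S for _ in range(T)]  # best log-score ending in state s at frame t
    back = [[-1] * S for _ in range(T)]    # backpointer to the previous state
    score[0][0] = log_probs[0][ext[0]]     # a path may start in the leading blank...
    if S > 1:
        score[0][1] = log_probs[0][ext[1]]  # ...or directly in the first token

    for t in range(1, T):
        for s in range(S):
            # Allowed predecessors: stay, advance by one, or skip a blank
            # (skipping is only allowed between two different non-blank labels).
            cands = [s]
            if s >= 1:
                cands.append(s - 1)
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                cands.append(s - 2)
            best = max(cands, key=lambda p: score[t - 1][p])
            if score[t - 1][best] == NEG:
                continue  # state unreachable at this frame
            score[t][s] = score[t - 1][best] + log_probs[t][ext[s]]
            back[t][s] = best

    # A valid path ends in the trailing blank or in the last token.
    end_states = [S - 1] + ([S - 2] if S > 1 else [])
    s = max(end_states, key=lambda q: score[T - 1][q])
    if score[T - 1][s] == NEG:
        raise ValueError("too few frames to align the given tokens")

    # Backtrack the best state path from the last frame to the first.
    path = [0] * T
    for t in range(T - 1, -1, -1):
        path[t] = s
        s = back[t][s]

    # Odd states are tokens; record the first and last frame spent in each.
    spans = {}
    for t, st in enumerate(path):
        if st % 2 == 1:
            i = st // 2
            start, _ = spans.get(i, (t, t))
            spans[i] = (start, t)
    return [(i, spans[i][0], spans[i][1]) for i in sorted(spans)]
```

For example, with three labels (blank = 0) and five frames whose most likely labels are 1, 1, blank, 2, blank, aligning the transcript `[1, 2]` assigns token 0 to frames 0 to 1 and token 1 to frame 3. Utterance boundaries could then be read off as the start frame of an utterance's first token and the end frame of its last token, converted to seconds with the acoustic model's frame shift.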
