ASR Bundestag: A Large-Scale political debate dataset in German

02/12/2023
by Johannes Wirth, et al.

We present ASR Bundestag, a dataset for automatic speech recognition in German, consisting of 610 hours of aligned audio-transcript pairs for supervised training as well as 1,038 hours of unlabeled audio snippets for self-supervised learning. The dataset is based on raw audio data and transcriptions from plenary sessions and committee meetings of the German parliament. In addition, we discuss the approaches used for the automated creation of speech datasets and assess the quality of the resulting dataset through evaluations and fine-tuning of a pre-trained state-of-the-art model. We make the dataset publicly available, including all subsets.


