A Broadcast News Corpus for Evaluation and Tuning of German LVCSR Systems

12/15/2014
by Felix Weninger, et al.

Transcription of broadcast news is an interesting and challenging application for large-vocabulary continuous speech recognition (LVCSR). We present in detail the structure of a manually segmented and annotated corpus including over 160 hours of German broadcast news, and propose it as an evaluation framework for LVCSR systems. We show our own experimental results on the corpus, achieved with a state-of-the-art LVCSR decoder, measuring the effect of different feature sets and decoding parameters, and thereby demonstrate that real-time decoding of our test set is feasible on a desktop PC at 9.2% word error rate.

research
05/19/2022

SDS-200: A Swiss German Speech to Standard German Text Corpus

We present SDS-200, a corpus of Swiss German dialectal speech with Stand...
research
05/30/2023

STT4SG-350: A Speech Corpus for All Swiss German Dialect Regions

We present STT4SG-350 (Speech-to-Text for Swiss German), a corpus of Swi...
research
07/01/2022

Swiss German Speech to Text system evaluation

We present an in-depth evaluation of four commercially available Speech-...
research
04/30/2019

English Broadcast News Speech Recognition by Humans and Machines

With recent advances in deep learning, considerable attention has been g...
research
06/24/2021

QASR: QCRI Aljazeera Speech Resource – A Large Scale Annotated Arabic Speech Corpus

We introduce the largest transcribed Arabic speech corpus, QASR, collect...
research
07/27/2021

Emotion Stimulus Detection in German News Headlines

Emotion stimulus extraction is a fine-grained subtask of emotion analysi...
research
03/21/2020

A Joint Approach to Compound Splitting and Idiomatic Compound Detection

Applications such as machine translation, speech recognition, and inform...
