Swiss German Speech to Text system evaluation

07/01/2022
by Yanick Schraner, et al.

We present an in-depth evaluation of four commercially available Speech-to-Text (STT) systems for Swiss German. The systems are anonymized and referred to as systems a-d in this report. We compare these four systems to our own STT model, hereafter referred to as FHNW, and provide details on how we trained it. To evaluate the models, we use two STT datasets from different domains: the Swiss Parliament Corpus (SPC) test set and a private news-domain dataset evenly distributed across seven dialect regions. We provide a detailed error analysis to identify the systems' strengths and weaknesses; this analysis is limited by the characteristics of the two test sets. Our model achieved the highest bilingual evaluation understudy (BLEU) score on both datasets. On the SPC test set, we obtain a BLEU score of 0.607, whereas the best commercial system reaches 0.509. On our private test set, we obtain a BLEU score of 0.722, compared with 0.568 for the best commercial system.
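The scores above are corpus-level BLEU values on a 0 to 1 scale. As a rough illustration of how such a score is computed, the following is a minimal sketch of standard corpus BLEU, assuming whitespace tokenization, a single reference transcript per utterance, uniform weights over 1- to 4-grams, and no smoothing. It is not the evaluation script used in this report; the function names `ngrams` and `corpus_bleu` are hypothetical, and established toolkits apply their own tokenization and settings, so numbers from this sketch are not directly comparable to the reported scores.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU in [0, 1] with one reference per hypothesis.

    Illustrative sketch (assumption): whitespace tokenization, uniform
    n-gram weights, standard brevity penalty, no smoothing.
    """
    matches = [0] * max_n  # clipped n-gram matches per order
    totals = [0] * max_n   # hypothesis n-gram counts per order
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_ngrams, r_ngrams = ngrams(h, n), ngrams(r, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matches[n - 1] += sum(min(c, r_ngrams[g]) for g, c in h_ngrams.items())
            totals[n - 1] += max(len(h) - n + 1, 0)
    if min(matches) == 0:
        return 0.0  # any zero precision makes the geometric mean zero
    # Geometric mean of the modified n-gram precisions.
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: penalize output that is shorter than the references overall.
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return bp * math.exp(log_precision)


if __name__ == "__main__":
    # An exact match scores 1.0; a one-word substitution lowers every n-gram precision.
    print(corpus_bleu(["the cat sat on the mat"], ["the cat sat on the mat"]))  # 1.0
    print(round(corpus_bleu(["the cat sat on the mat"], ["the cat sat on a mat"]), 4))  # 0.5373
```

Because the precisions are pooled over the whole test set before taking the geometric mean, a corpus-level score is not the average of per-utterance scores.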
