Text-to-Speech Pipeline for Swiss German – A comparison

05/31/2023
by Tobias Bollinger, et al.

In this work, we studied the synthesis of Swiss German speech using different Text-to-Speech (TTS) models. We evaluated the TTS models on three corpora and found that VITS models performed best, so we used them for further testing. We also introduce a new method for evaluating TTS models: the discriminator of a trained vocoder GAN predicts whether a given waveform is human or synthesized. In summary, our best model delivers speech synthesis for different Swiss German dialects with previously unachieved quality.
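The discriminator-based evaluation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `human_likeness_rate`, the 0.5 decision threshold, and the placeholder `toy_discriminator` are all assumptions made for illustration. In practice the discriminator would be the one taken from a trained vocoder GAN, which the abstract does not specify further.

```python
import math
from typing import Callable, Sequence

Waveform = Sequence[float]


def human_likeness_rate(
    discriminator: Callable[[Waveform], float],
    waveforms: Sequence[Waveform],
    threshold: float = 0.5,  # assumed decision boundary, not from the paper
) -> float:
    """Return the fraction of waveforms the discriminator judges to be human.

    `discriminator` is assumed to map a waveform to a score in [0, 1],
    where higher means "more likely human".
    """
    if not waveforms:
        raise ValueError("at least one waveform is required")
    judged_human = sum(1 for w in waveforms if discriminator(w) >= threshold)
    return judged_human / len(waveforms)


def toy_discriminator(waveform: Waveform) -> float:
    """Hypothetical stand-in for a trained GAN discriminator.

    It squashes the mean signal energy through a sigmoid; it exists only
    so that the sketch runs without a trained model.
    """
    energy = sum(x * x for x in waveform) / len(waveform)
    return 1.0 / (1.0 + math.exp(-10.0 * (energy - 0.25)))


if __name__ == "__main__":
    synthesized = [[1.0, 1.0], [0.0, 0.0]]  # placeholder sample values
    print(human_likeness_rate(toy_discriminator, synthesized))
```

Under these assumptions, a TTS model whose output the discriminator labels as human more often would rank higher; comparing the rate across models on the same sentences gives a relative score.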
