Text-to-Speech Pipeline for Swiss German – A comparison

05/31/2023
by Tobias Bollinger, et al.

In this work, we studied the synthesis of Swiss German speech using different Text-to-Speech (TTS) models. We evaluated the TTS models on three corpora and found that VITS models performed best; we therefore used them for further testing. We also introduce a new method to evaluate TTS models by letting the discriminator of a trained vocoder GAN model predict whether a given waveform is human or synthesized. In summary, our best model delivers speech synthesis for different Swiss German dialects with previously unachieved quality.
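The discriminator-based evaluation can be sketched as follows, under stated assumptions. The abstract does not specify how discriminator outputs are thresholded or aggregated, so this minimal sketch assumes the discriminator is available as a callable returning one score per waveform, with higher scores meaning "more likely human" (least-squares GAN discriminators, such as HiFi-GAN's, are trained toward 1 for real audio and 0 for generated audio, which motivates the assumed 0.5 threshold). The names `human_rate` and `rank_systems` are hypothetical, and loading a real discriminator from a vocoder checkpoint is out of scope here.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical interface: a trained vocoder-GAN discriminator wrapped as a
# function that maps a waveform (sequence of float samples) to one score.
# Assumption: higher score means "more likely human".
Discriminator = Callable[[Sequence[float]], float]


def human_rate(
    disc: Discriminator,
    waveforms: List[Sequence[float]],
    threshold: float = 0.5,  # assumed midpoint between real (1) and fake (0) targets
) -> float:
    """Return the fraction of waveforms the discriminator labels as human."""
    if not waveforms:
        raise ValueError("need at least one waveform")
    votes = [disc(w) >= threshold for w in waveforms]
    return sum(votes) / len(votes)


def rank_systems(
    disc: Discriminator,
    systems: Dict[str, List[Sequence[float]]],
    threshold: float = 0.5,
) -> List[Tuple[str, float]]:
    """Rank TTS systems by how often their output is judged human.

    Sorted by descending human rate; ties are broken by system name.
    """
    rates = {name: human_rate(disc, wavs, threshold) for name, wavs in systems.items()}
    return sorted(rates.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    # Toy stand-in discriminator for illustration only: it just reads the
    # first sample as its "score". A real one would be a neural network.
    toy_disc: Discriminator = lambda w: w[0]
    outputs = {
        "system_a": [[0.1], [0.9]],
        "system_b": [[0.8], [0.9]],
    }
    print(rank_systems(toy_disc, outputs))
```

Thresholding into human/synthesized votes mirrors the binary prediction described in the abstract; averaging raw scores instead would be an equally plausible variant, since the abstract leaves the aggregation open.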

