The FruitShell French synthesis system at the Blizzard 2023 Challenge

09/01/2023
by Xin Qi, et al.

This paper presents a French text-to-speech synthesis system for the Blizzard Challenge 2023. The challenge consists of two tasks: the Hub task, generating high-quality speech in a female voice, and the Spoke task, generating speech that closely resembles a specific individual. We first screened the competition data to remove missing or erroneous text. We then normalized all non-phoneme symbols and removed symbols that had no pronunciation or zero duration. We also added word-boundary and start/end symbols to the text, which in our previous experience improves speech quality. For the Spoke task, we performed data augmentation in accordance with the competition rules. We used an open-source G2P model to transcribe the French texts into phonemes. Because the G2P model outputs the International Phonetic Alphabet (IPA), we applied the same transcription process to the provided competition data for standardization. However, due to compiler limitations in recognizing special symbols from the IPA chart, we converted all phonemes by rule into the phonetic scheme used in the competition data. Finally, we resampled all competition audio to a uniform sampling rate of 16 kHz. We employed a VITS-based acoustic model with a HiFi-GAN vocoder. For the Spoke task, we trained a multi-speaker model and incorporated speaker information into the duration predictor, vocoder, and flow layers of the model. In the evaluation, our system achieved a quality MOS of 3.6 on the Hub task and 3.4 on the Spoke task, placing it at an average level among all participating teams.
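The symbol-cleaning step described in the abstract can be illustrated with a minimal sketch. It assumes each utterance arrives as a list of words, where every word is a list of (symbol, duration) pairs. The marker names `<bos>`, `<eos>`, and `<wb>`, the `UNPRONOUNCED` set, and the function `build_phoneme_sequence` are hypothetical choices for illustration; the abstract does not specify the actual symbols or code used.

```python
from typing import List, Tuple

# Hypothetical marker names; the actual symbols used by the system are not given.
BOS, EOS, WORD_BOUNDARY = "<bos>", "<eos>", "<wb>"

# Assumed examples of symbols that carry no pronunciation.
UNPRONOUNCED = {"'", "-", '"'}


def build_phoneme_sequence(words: List[List[Tuple[str, float]]]) -> List[str]:
    """Drop unpronounced or zero-duration symbols, then add boundary markers."""
    sequence = [BOS]
    for word in words:
        # Keep only symbols that are pronounced and have a positive duration.
        kept = [sym for sym, dur in word if dur > 0.0 and sym not in UNPRONOUNCED]
        if not kept:
            continue  # the whole word was filtered out
        if len(sequence) > 1:
            # Separate this word from the previous one.
            sequence.append(WORD_BOUNDARY)
        sequence.extend(kept)
    sequence.append(EOS)
    return sequence


example = [
    [("b", 0.05), ("o~", 0.10)],
    [("-", 0.0)],
    [("Z", 0.08), ("u", 0.07), ("R", 0.0)],
]
print(build_phoneme_sequence(example))
```

In this sketch a word whose symbols are all filtered out contributes neither phonemes nor a boundary marker, so the output never contains two consecutive word boundaries.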


research
06/26/2019

RUSLAN: Russian Spoken Language Corpus for Speech Synthesis

We present RUSLAN -- a new open Russian spoken language corpus for the t...
research
08/30/2023

The DeepZen Speech Synthesis System for Blizzard Challenge 2023

This paper describes the DeepZen text to speech (TTS) system for Blizzar...
research
10/29/2021

VRAIN-UPV MLLP's system for the Blizzard Challenge 2021

This paper presents the VRAIN-UPV MLLP's speech synthesis system for the...
research
11/10/2020

Pretraining Strategies, Waveform Model Choice, and Acoustic Configurations for Multi-Speaker End-to-End Speech Synthesis

We explore pretraining strategies including choice of base corpus with t...
research
11/29/2022

Controllable speech synthesis by learning discrete phoneme-level prosodic representations

In this paper, we present a novel method for phoneme-level prosody contr...
research
10/22/2020

The NTU-AISG Text-to-speech System for Blizzard Challenge 2020

We report our NTU-AISG Text-to-speech (TTS) entry systems for the Blizza...
research
12/09/2018

Increase Apparent Public Speaking Fluency By Speech Augmentation

Fluent and confident speech is desirable to every speaker. But professio...
