Pronunciation Dictionary-Free Multilingual Speech Synthesis by Combining Unsupervised and Supervised Phonetic Representations

06/02/2022
by Chang Liu, et al.

This paper proposes a multilingual speech synthesis method which combines unsupervised phonetic representations (UPR) and supervised phonetic representations (SPR) to avoid reliance on the pronunciation dictionaries of target languages. In this method, a pretrained wav2vec 2.0 model is adopted to extract UPRs and a language-independent automatic speech recognition (LI-ASR) model is built with a connectionist temporal classification (CTC) loss to extract segment-level SPRs from the audio data of target languages. Then, an acoustic model is designed, which first predicts UPRs and SPRs from texts separately and then combines the predicted UPRs and SPRs to generate mel-spectrograms. The results of our experiments on six languages show that the proposed method outperformed the methods that directly predicted mel-spectrograms from character or phoneme sequences and the ablated models that utilized only UPRs or SPRs.
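
As a rough illustration of how frame-level outputs from a CTC-trained recognizer can be turned into segment-level units, the sketch below applies standard greedy CTC decoding: pick the best label per frame, merge consecutive repeats, and drop blanks. This is a generic technique and not necessarily the procedure the authors use to obtain SPRs. The function name `ctc_greedy_segments`, the blank index `0`, and the toy scores are all assumptions made for the example.

```python
from itertools import groupby

BLANK = 0  # assumed index of the CTC blank symbol (hypothetical choice)


def ctc_greedy_segments(frame_scores, blank=BLANK):
    """Collapse per-frame label scores into (label, start, end) segments.

    frame_scores: one list of per-label scores for each frame.
    Returns segments as half-open frame ranges [start, end).
    """
    # Greedy step: index of the highest-scoring label in each frame.
    best = [max(range(len(s)), key=s.__getitem__) for s in frame_scores]

    segments = []
    frame = 0
    # Merge runs of identical labels, then discard runs of the blank label.
    for label, run in groupby(best):
        length = len(list(run))
        if label != blank:
            segments.append((label, frame, frame + length))
        frame += length
    return segments


# Toy scores for 6 frames over 3 labels (label 0 is the blank).
toy_scores = [
    [0.1, 0.8, 0.1],
    [0.2, 0.7, 0.1],
    [0.9, 0.05, 0.05],
    [0.1, 0.1, 0.8],
    [0.6, 0.3, 0.1],
    [0.2, 0.7, 0.1],
]
print(ctc_greedy_segments(toy_scores))
```

Each returned segment carries a frame range, so frame-level features inside that range could, for instance, be pooled into a single segment-level vector; whether the paper does this is not stated in the abstract.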

research
05/22/2023

Scaling Speech Technology to 1,000+ Languages

Expanding the language coverage of speech technology has the potential t...
research
10/18/2022

Maestro-U: Leveraging joint speech-text representation learning for zero supervised speech ASR

Training state-of-the-art Automated Speech Recognition (ASR) models typi...
research
11/22/2018

Bytes are All You Need: End-to-End Multilingual Speech Recognition and Synthesis with Bytes

We present two end-to-end models: Audio-to-Byte (A2B) and Byte-to-Audio ...
research
05/16/2020

That Sounds Familiar: an Analysis of Phonetic Representations Transfer Across Languages

Only a handful of the world's languages are abundant with the resources ...
research
11/03/2020

Unsupervised Pattern Discovery from Thematic Speech Archives Based on Multilingual Bottleneck Features

The present study tackles the problem of automatically discovering spoke...
research
10/28/2019

Towards Unsupervised Speech Recognition and Synthesis with Quantized Speech Representation Learning

In this paper we propose a Sequential Representation Quantization AutoEn...
research
05/24/2022

Adaptive multilingual speech recognition with pretrained models

Multilingual speech recognition with supervised learning has achieved gr...
