The Effects of Input Type and Pronunciation Dictionary Usage in Transfer Learning for Low-Resource Text-to-Speech

06/01/2023
by   Phat Do, et al.

We compare phone labels and articulatory features as input for cross-lingual transfer learning in text-to-speech (TTS) for low-resource languages (LRLs). Experiments with FastSpeech 2 and the LRL West Frisian show that using articulatory features outperforms using phone labels in both intelligibility and naturalness. For LRLs without pronunciation dictionaries, we propose two novel approaches: a) using a massively multilingual model to perform grapheme-to-phone (G2P) conversion in both training and synthesis, and b) using a universal phone recognizer to create a makeshift dictionary. Results show that the G2P approach performs largely on par with using a ground-truth dictionary, and that the phone recognition approach, while generally performing worse, remains a viable option for LRLs less suited to the G2P approach. Within each approach, using articulatory features as input outperforms using phone labels.
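To make the two input types concrete, the sketch below contrasts a phone-label encoding with an articulatory-feature encoding. It is only an illustration and not the authors' implementation: the feature names, the values in `TOY_TABLE`, and both helper functions are hypothetical. One possible intuition, which the abstract itself does not state, is that a target-language phone absent from the source inventory still shares feature values with phones the model has already seen.

```python
# Toy articulatory feature table. The feature set and values are
# hypothetical and chosen only for illustration.
FEATURES = ("voiced", "nasal", "labial", "plosive")
TOY_TABLE = {
    "p": (0, 0, 1, 1),
    "b": (1, 0, 1, 1),
    "m": (1, 1, 1, 0),
    "t": (0, 0, 0, 1),
}


def phone_label_input(phones, inventory):
    """Map each phone to an integer ID; unseen phones share one extra ID."""
    index = {phone: i for i, phone in enumerate(inventory)}
    unknown_id = len(inventory)
    return [index.get(phone, unknown_id) for phone in phones]


def articulatory_input(phones, table=TOY_TABLE):
    """Map each phone to its articulatory feature vector."""
    return [table[phone] for phone in phones]


# Assume the source-language inventory lacks "b".
source_inventory = ["p", "m", "t"]
target_phones = ["b", "m"]

print(phone_label_input(target_phones, source_inventory))
print(articulatory_input(target_phones))
```

With phone labels, the unseen "b" collapses to a generic unknown ID. With feature vectors, it differs from the seen phone "p" in only one toy feature (voicing).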


research
06/21/2023

Strategies in Transfer Learning for Low-Resource Speech Synthesis: Phone Mapping, Features Input, and Source Language Selection

We compare using a PHOIBLE-based phone mapping method and using phonolog...
research
11/02/2021

Cross-lingual Transfer for Speech Processing using Acoustic Language Similarity

Speech processing systems currently do not support the vast majority of ...
research
02/26/2020

Universal Phone Recognition with a Multilingual Allophone System

Multilingual models can improve language processing, particularly for lo...
research
11/12/2021

Deciphering Speech: a Zero-Resource Approach to Cross-Lingual Transfer in ASR

We present a method for cross-lingual training an ASR system using absol...
research
07/31/2023

Improving grapheme-to-phoneme conversion by learning pronunciations from speech recordings

The Grapheme-to-Phoneme (G2P) task aims to convert orthographic input in...
research
05/01/2021

AlloST: Low-resource Speech Translation without Source Transcription

The end-to-end architecture has made promising progress in speech transl...
research
03/01/2021

Comparing acoustic analyses of speech data collected remotely

Face-to-face speech data collection has been next to impossible globally...
