Strategies in Transfer Learning for Low-Resource Speech Synthesis: Phone Mapping, Features Input, and Source Language Selection

06/21/2023
by Phat Do, et al.

We compare a PHOIBLE-based phone mapping method with phonological features input in transfer learning for text-to-speech (TTS) in low-resource languages. We use diverse source languages (English, Finnish, Hindi, Japanese, and Russian) and target languages (Bulgarian, Georgian, Kazakh, Swahili, Urdu, and Uzbek) to test the language-independence of the methods and broaden the applicability of the findings. For evaluation, we use Character Error Rates from automatic speech recognition and predicted Mean Opinion Scores. Results show that both phone mapping and features input improve output quality, with features input performing better, although these effects also depend on the specific language combination. We also compare the recently proposed Angular Similarity of Phone Frequencies (ASPF) with a family tree-based distance measure as a criterion for selecting source languages in transfer learning. ASPF proves effective when label-based phone input is used, while the family tree-based language distance does not have the expected effects.
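
To make two of the quantities named in the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. The Character Error Rate function uses the standard definition (character-level edit distance divided by reference length). The abstract does not define ASPF, so the second function is an assumption: it applies a common angular-similarity formula for non-negative vectors, 1 - 2·arccos(cosine)/π, to phone frequency counts. The function names and the toy phone counts are hypothetical.

```python
import math
from collections import Counter


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Character-level Levenshtein distance divided by the reference length."""
    if not reference:
        raise ValueError("reference must be non-empty")
    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j characters of the hypothesis.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            substitution = prev[j - 1] + (0 if ref_char == hyp_char else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(reference)


def angular_similarity(freq_a: dict, freq_b: dict) -> float:
    """Angular similarity of two non-negative phone frequency vectors.

    ASSUMPTION: uses 1 - 2*arccos(cosine)/pi, which maps identical
    directions to 1 and orthogonal (disjoint) inventories to 0. The
    exact ASPF definition in the paper may differ.
    """
    phones = set(freq_a) | set(freq_b)
    dot = sum(freq_a.get(p, 0) * freq_b.get(p, 0) for p in phones)
    norm_a = math.sqrt(sum(v * v for v in freq_a.values()))
    norm_b = math.sqrt(sum(v * v for v in freq_b.values()))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("frequency vectors must be non-zero")
    # Clamp to guard against floating-point drift outside [-1, 1].
    cosine = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    return 1.0 - 2.0 * math.acos(cosine) / math.pi


if __name__ == "__main__":
    # Three edits against a six-character reference.
    print(character_error_rate("kitten", "sitting"))  # 0.5

    # Hypothetical phone counts for a source and a target language.
    source = Counter(["a", "a", "t", "k", "s", "i"])
    target = Counter(["a", "t", "t", "k", "u", "i"])
    print(angular_similarity(source, target))
```

Under this assumed formula, a higher score means the two languages use their phones in more similar proportions, which is the kind of signal a source-language selection criterion would rank on.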
