
Using IPA-Based Tacotron for Data Efficient Cross-Lingual Speaker Adaptation and Pronunciation Enhancement

by Hamed Hemati, et al.

Recent neural Text-to-Speech (TTS) models have been shown to perform very well when enough data is available. However, fine-tuning them for a new speaker or a new language is not as straightforward in a low-resource setup. In this paper, we show that by applying minor changes to a Tacotron model, an existing TTS model can be transferred to a new speaker, in the same or a different language, using only 20 minutes of data. To this end, we first introduce a baseline multilingual Tacotron with language-agnostic input, then show how transfer learning is performed for different speaker-adaptation scenarios without relying on any pre-trained speaker encoder or code-switching technique. We evaluate the transferred model both subjectively and objectively.
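The language-agnostic input mentioned in the abstract can be illustrated with a minimal sketch: text is transcribed into IPA symbols drawn from a single inventory shared across languages, and those symbols are mapped to integer IDs for one shared embedding table in the Tacotron encoder. The lexicon and symbol set below are toy assumptions for illustration, not the paper's actual front end.

```python
# Toy illustration (not the paper's code): text -> language-agnostic IPA
# symbols -> integer IDs for a Tacotron-style encoder. Because every
# language is transcribed into the same IPA inventory, one embedding
# table can serve all languages. The mini lexicon here is hypothetical.

IPA_LEXICON = {
    "hello": ["h", "ə", "ˈl", "oʊ"],
    "world": ["w", "ˈɜː", "l", "d"],
}

# Shared IPA symbol inventory across languages.
IPA_SYMBOLS = ["<pad>", "h", "ə", "ˈl", "oʊ", "w", "ˈɜː", "l", "d", " "]
SYMBOL_TO_ID = {s: i for i, s in enumerate(IPA_SYMBOLS)}

def text_to_ipa_ids(text):
    """Look up each word's IPA transcription and map symbols to IDs."""
    ids = []
    for word in text.lower().split():
        for symbol in IPA_LEXICON[word]:
            ids.append(SYMBOL_TO_ID[symbol])
        ids.append(SYMBOL_TO_ID[" "])  # word separator
    return ids[:-1]  # drop trailing separator

print(text_to_ipa_ids("hello world"))
```

In practice a grapheme-to-phoneme front end (e.g. an espeak-based phonemizer) would replace the hand-written lexicon, but the key point is the shared symbol inventory: a model fine-tuned on 20 minutes of a new language sees only symbols it already has embeddings for.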


Cross-lingual Low Resource Speaker Adaptation Using Phonological Features

The idea of using phonological features instead of phonemes as input to ...

Cross-lingual Text-To-Speech with Flow-based Voice Conversion for Improved Pronunciation

This paper presents a method for end-to-end cross-lingual text-to-speech...

Exploiting Adapters for Cross-lingual Low-resource Speech Recognition

Cross-lingual speech adaptation aims to solve the problem of leveraging ...

Exploring Transfer Learning for Low Resource Emotional TTS

During the last few years, spoken language technologies have known a big...

Speaker Information Can Guide Models to Better Inductive Biases: A Case Study On Predicting Code-Switching

Natural language processing (NLP) models trained on people-generated dat...

AdaVITS: Tiny VITS for Low Computing Resource Speaker Adaptation

Speaker adaptation in text-to-speech synthesis (TTS) is to finetune a pr...

MSV Challenge 2022: NPU-HC Speaker Verification System for Low-resource Indian Languages

This report describes the NPU-HC speaker verification system submitted t...