Using IPA-Based Tacotron for Data Efficient Cross-Lingual Speaker Adaptation and Pronunciation Enhancement

11/12/2020 ∙ by Hamed Hemati, et al. ∙ 0

Recent neural Text-to-Speech (TTS) models have been shown to perform very well when enough data is available. However, fine-tuning them towards a new speaker or a new language is not as straight-forward in a low-resource setup. In this paper, we show that by applying minor changes to a Tacotron model, one can transfer an existing TTS model for a new speaker with the same or a different language using only 20 minutes of data. For this purpose, we first introduce a baseline multi-lingual Tacotron with language-agnostic input, then show how transfer learning is done for different scenarios of speaker adaptation without exploiting any pre-trained speaker encoder or code-switching technique. We evaluate the transferred model in both subjective and objective ways.



There are no comments yet.


This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.