Adapter-Based Extension of Multi-Speaker Text-to-Speech Model for New Speakers

11/01/2022
by   Cheng-Ping Hsieh, et al.

Fine-tuning is a popular method for adapting text-to-speech (TTS) models to new speakers. However, this approach has some challenges. Fine-tuning usually requires several hours of high-quality speech per speaker. There is also a risk that fine-tuning will negatively affect the quality of speech synthesis for previously learnt speakers. In this paper we propose an alternative approach to TTS adaptation based on parameter-efficient adapter modules. In the proposed approach, a few small adapter modules are added to the original network. The original weights are frozen, and only the adapters are fine-tuned on speech from the new speaker. This parameter-efficient fine-tuning approach produces a new model with a high level of parameter sharing with the original model. Our experiments on the LibriTTS, HiFi-TTS and VCTK datasets validate the effectiveness of the adapter-based method through objective and subjective metrics.
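The idea of freezing the original weights and training only small added modules can be illustrated with a minimal sketch. The abstract does not specify the adapter architecture, so the residual bottleneck design below (down-projection, ReLU, up-projection, skip connection), the zero initialization of the up-projection, and the sizes `DIM` and `BOTTLENECK` are all assumptions chosen for illustration, not the authors' implementation. The names `FrozenLinear`, `BottleneckAdapter` and `trainable_fraction` are hypothetical.

```python
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def random_matrix(rows, cols, rng, scale=0.1):
    """Small random weights, standing in for pretrained or freshly initialized ones."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class FrozenLinear:
    """Stands in for one layer of the original network; marked as not trainable."""

    def __init__(self, dim, rng):
        self.weight = random_matrix(dim, dim, rng)
        self.trainable = False

    def __call__(self, x):
        return matvec(self.weight, x)

    def num_params(self):
        return len(self.weight) * len(self.weight[0])


class BottleneckAdapter:
    """Assumed adapter design: x + up(relu(down(x)))."""

    def __init__(self, dim, bottleneck, rng):
        self.down = random_matrix(bottleneck, dim, rng)
        # Zero-initialized up-projection: the adapter starts as an identity map,
        # so the adapted model initially reproduces the original model's output.
        self.up = [[0.0] * bottleneck for _ in range(dim)]
        self.trainable = True

    def __call__(self, x):
        hidden = [max(0.0, h) for h in matvec(self.down, x)]
        delta = matvec(self.up, hidden)
        return [xi + di for xi, di in zip(x, delta)]

    def num_params(self):
        return len(self.down) * len(self.down[0]) + len(self.up) * len(self.up[0])


def trainable_fraction(modules):
    """Share of all parameters that would be updated for a new speaker."""
    total = sum(m.num_params() for m in modules)
    trainable = sum(m.num_params() for m in modules if m.trainable)
    return trainable / total


rng = random.Random(0)
DIM, BOTTLENECK = 64, 4  # hypothetical sizes

base = FrozenLinear(DIM, rng)
adapter = BottleneckAdapter(DIM, BOTTLENECK, rng)

x = [rng.uniform(-1.0, 1.0) for _ in range(DIM)]
base_out = base(x)
adapted_out = adapter(base_out)

print("frozen parameters:   ", base.num_params())
print("adapter parameters:  ", adapter.num_params())
print("trainable fraction:  ", round(trainable_fraction([base, adapter]), 4))
```

In this toy setting only the adapter's parameters would be stored per new speaker, while the frozen layer is shared across all speakers; the ratio in a real TTS model depends on its actual layer sizes and on where the adapters are placed.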

research
10/28/2022

Residual Adapters for Few-Shot Text-to-Speech Speaker Adaptation

Adapting a neural text-to-speech (TTS) model to a target speaker typical...
research
03/26/2021

Continual Speaker Adaptation for Text-to-Speech Synthesis

Training a multi-speaker Text-to-Speech (TTS) model from scratch is comp...
research
03/22/2022

A Text-to-Speech Pipeline, Evaluation Methodology, and Initial Fine-Tuning Results for Child Speech Synthesis

Speech synthesis has come a long way as current text-to-speech (TTS) mod...
research
06/29/2021

GANSpeech: Adversarial Training for High-Fidelity Multi-Speaker Speech Synthesis

Recent advances in neural multi-speaker text-to-speech (TTS) models have...
research
05/29/2023

ADAPTERMIX: Exploring the Efficacy of Mixture of Adapters for Low-Resource TTS Adaptation

There are significant challenges for speaker adaptation in text-to-speec...
research
09/27/2018

Sample Efficient Adaptive Text-to-Speech

We present a meta-learning approach for adaptive text-to-speech (TTS) wi...
research
11/19/2021

Improved Prosodic Clustering for Multispeaker and Speaker-independent Phoneme-level Prosody Control

This paper presents a method for phoneme-level prosody control of F0 and...
