Continual Speaker Adaptation for Text-to-Speech Synthesis

03/26/2021
by Hamed Hemati, et al.

Training a multi-speaker Text-to-Speech (TTS) model from scratch is computationally expensive, and adding new speakers to the dataset requires the model to be re-trained. The naive solution, sequentially fine-tuning the model on each new speaker, can degrade its performance on previously learned speakers. This phenomenon is known as catastrophic forgetting. In this paper, we look at TTS modeling from a continual learning perspective, where the goal is to add new speakers without forgetting previous ones. We first propose an experimental setup and show that serial fine-tuning on new speakers can result in forgetting of the previous speakers. We then apply two well-known continual learning techniques, experience replay and weight regularization, and show how they can mitigate the degradation of speech synthesis diversity when new speakers are trained sequentially. Finally, we present a simple extension that improves the results in extreme setups.
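The abstract names experience replay and weight regularization but does not specify how either is configured, so the following is only a generic sketch of the two ideas, not the authors' implementation. It assumes that replay keeps a small random subset of utterances per previously seen speaker and mixes them into each new-speaker batch, and that weight regularization is a quadratic penalty pulling parameters toward their values before the new speaker was added (plain L2 when every importance weight is 1, EWC-style when importances come from Fisher-information estimates). All names (`ReplayBuffer`, `mixed_batch`, `weight_penalty`, `total_loss`) and the flat parameter lists are hypothetical stand-ins for a real TTS model's data pipeline and weights.

```python
import random
from typing import Dict, List, Sequence, Tuple

# Hypothetical sample representation: (speaker_id, utterance_id).
Sample = Tuple[str, str]


class ReplayBuffer:
    """Hypothetical memory holding a few utterances per previously seen speaker."""

    def __init__(self, per_speaker: int, seed: int = 0) -> None:
        self.per_speaker = per_speaker
        self.memory: Dict[str, List[Sample]] = {}
        self.rng = random.Random(seed)

    def add_speaker(self, speaker: str, samples: Sequence[Sample]) -> None:
        # Keep a random subset of this speaker's data for later replay.
        k = min(self.per_speaker, len(samples))
        self.memory[speaker] = self.rng.sample(list(samples), k)

    def mixed_batch(self, new_samples: Sequence[Sample], n_new: int, n_old: int) -> List[Sample]:
        # Draw fresh samples from the new speaker, then append replayed
        # samples from the stored old speakers (if any are stored).
        batch = self.rng.sample(list(new_samples), min(n_new, len(new_samples)))
        old_pool = [s for stored in self.memory.values() for s in stored]
        if old_pool:
            batch += self.rng.sample(old_pool, min(n_old, len(old_pool)))
        return batch


def weight_penalty(params: Sequence[float], old_params: Sequence[float],
                   importance: Sequence[float], strength: float) -> float:
    """Quadratic penalty (strength / 2) * sum_i F_i * (theta_i - theta_old_i)^2."""
    return 0.5 * strength * sum(
        f * (p - q) ** 2 for p, q, f in zip(params, old_params, importance)
    )


def total_loss(tts_loss: float, params: Sequence[float], old_params: Sequence[float],
               importance: Sequence[float], strength: float) -> float:
    # The ordinary TTS training loss plus the regularizer that discourages
    # drifting away from weights that served the earlier speakers.
    return tts_loss + weight_penalty(params, old_params, importance, strength)


# Usage: speaker A was learned earlier; speaker B is being added now.
buffer = ReplayBuffer(per_speaker=2)
buffer.add_speaker("spk_A", [("spk_A", f"a{i}") for i in range(10)])
new_speaker_data = [("spk_B", f"b{i}") for i in range(10)]
batch = buffer.mixed_batch(new_speaker_data, n_new=4, n_old=2)
print(len(batch))  # 6: four spk_B samples plus two replayed spk_A samples
print(weight_penalty([1.0, 2.0], [0.0, 2.0], [1.0, 1.0], strength=2.0))  # 1.0
```

The two mechanisms are complementary: replay needs access to a small amount of stored old-speaker data, while the weight penalty needs only a copy of the previous parameters (and, for the EWC-style variant, per-parameter importance estimates).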

research
11/01/2022

Adapter-Based Extension of Multi-Speaker Text-to-Speech Model for New Speakers

Fine-tuning is a popular method for adapting text-to-speech (TTS) models...
research
07/14/2023

Replay to Remember: Continual Layer-Specific Fine-tuning for German Speech Recognition

While Automatic Speech Recognition (ASR) models have shown significant a...
research
07/19/2022

Don't Stop Learning: Towards Continual Learning for the CLIP Model

The Contrastive Language-Image Pre-training (CLIP) Model is a recently p...
research
08/17/2023

An Empirical Study of Catastrophic Forgetting in Large Language Models During Continual Fine-tuning

Catastrophic forgetting (CF) is a phenomenon that occurs in machine lear...
research
10/27/2022

Segmentation of Multiple Sclerosis Lesions across Hospitals: Learn Continually or Train from Scratch?

Segmentation of Multiple Sclerosis (MS) lesions is a challenging problem...
research
04/15/2021

Continual Learning for Fake Audio Detection

Fake audio attack becomes a major threat to the speaker verification sys...
research
10/14/2021

FedSpeech: Federated Text-to-Speech with Continual Learning

Federated learning enables collaborative training of machine learning mo...
