Residual Adapters for Few-Shot Text-to-Speech Speaker Adaptation

10/28/2022
by Nobuyuki Morioka, et al.

Adapting a neural text-to-speech (TTS) model to a target speaker typically involves fine-tuning most, if not all, of the parameters of a pretrained multi-speaker backbone model. However, serving hundreds of fine-tuned neural TTS models is expensive, as each requires a significant footprint and separate computational resources (e.g., accelerators, memory). To scale speaker-adapted neural TTS voices to hundreds of speakers while preserving naturalness and speaker similarity, this paper proposes a parameter-efficient few-shot speaker adaptation method in which the backbone model is augmented with trainable lightweight modules called residual adapters. This architecture allows the backbone model to be shared across different target speakers. Experimental results show that the proposed approach achieves naturalness and speaker similarity competitive with full fine-tuning approaches, while requiring only ∼0.1% of the backbone model parameters for each speaker.
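The residual adapters described in the abstract are small bottleneck modules whose output is added back to the frozen backbone's hidden activations, so only the adapter weights need to be stored and trained per speaker. A minimal NumPy sketch of this idea (the dimensions, ReLU nonlinearity, and near-zero initialization are illustrative assumptions, not details taken from the paper):

```python
import numpy as np

class ResidualAdapter:
    """Bottleneck adapter: h + relu(h @ W_down) @ W_up.

    Hypothetical sketch: modules like this are inserted into a frozen
    multi-speaker TTS backbone, and only these weights are trained for
    each new target speaker.
    """

    def __init__(self, d_model, bottleneck, seed=0):
        rng = np.random.default_rng(seed)
        # Near-zero initialization so the adapter starts close to the
        # identity function and does not perturb the pretrained backbone.
        self.w_down = rng.normal(0.0, 1e-3, (d_model, bottleneck))
        self.w_up = rng.normal(0.0, 1e-3, (bottleneck, d_model))

    def __call__(self, h):
        # Down-project, apply nonlinearity, up-project, add residual.
        return h + np.maximum(h @ self.w_down, 0.0) @ self.w_up

    def num_params(self):
        return self.w_down.size + self.w_up.size

# Toy dimensions for illustration only.
d_model, bottleneck = 512, 16
adapter = ResidualAdapter(d_model, bottleneck)
h = np.ones((1, d_model))          # stand-in for a backbone hidden state
out = adapter(h)
print(out.shape)                   # same shape as the input activations
print(adapter.num_params())        # 2 * 512 * 16 = 16384 trainable values
```

With a bottleneck much smaller than the model width, the per-speaker parameter count (here 2 × d_model × bottleneck) stays a tiny fraction of the backbone, which is the mechanism behind the ∼0.1%-parameters-per-speaker figure in the abstract.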


Related research

11/01/2022  Adapter-Based Extension of Multi-Speaker Text-to-Speech Model for New Speakers
Fine-tuning is a popular method for adapting text-to-speech (TTS) models...

05/29/2023  ADAPTERMIX: Exploring the Efficacy of Mixture of Adapters for Low-Resource TTS Adaptation
There are significant challenges for speaker adaptation in text-to-speec...

08/16/2021  GC-TTS: Few-shot Speaker Adaptation with Geometric Constraints
Few-shot speaker adaptation is a specific Text-to-Speech (TTS) system th...

09/27/2018  Sample Efficient Adaptive Text-to-Speech
We present a meta-learning approach for adaptive text-to-speech (TTS) wi...

03/05/2018  Linear networks based speaker adaptation for speech synthesis
Speaker adaptation methods aim to create fair quality synthesis speech v...

03/23/2022  A Scalable Model Specialization Framework for Training and Inference using Submodels and its Application to Speech Model Personalization
Model fine-tuning and adaptation have become a common approach for model...

11/02/2021  OSOA: One-Shot Online Adaptation of Deep Generative Models for Lossless Compression
Explicit deep generative models (DGMs), e.g., VAEs and Normalizing Flows...
