SNIPER Training: Variable Sparsity Rate Training For Text-To-Speech

11/14/2022
by   Perry Lam, et al.

Text-to-speech (TTS) models have achieved remarkable naturalness in recent years, yet like most deep neural models, they have more parameters than necessary. Sparse TTS models can improve on dense models via pruning and extra retraining, or converge faster than dense models with some performance loss. Inspired by these results, we propose training TTS models using a decaying sparsity rate, i.e., a high initial sparsity to accelerate training first, followed by a progressive rate reduction to obtain better eventual performance. This decremental approach differs from current methods that increment sparsity to a desired target, which cost significantly more time than dense training. We call our method SNIPER training: Single-shot Initialization Pruning Evolving-Rate training. Our experiments on FastSpeech2 show that although we only obtained better losses in the first few epochs before being overtaken by the baseline, the final SNIPER-trained models beat constant-sparsity models and narrowly outperform dense models.
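The abstract describes the mechanism only at a high level: a one-shot pruning mask computed at initialization, with a sparsity rate that starts high and decays as training proceeds. The sketch below is a minimal illustration of that idea, not the authors' implementation; it assumes a SNIP-style saliency score, a hypothetical linear decay schedule, and a toy PyTorch model and batch standing in for FastSpeech2 and real TTS data.

```python
# Minimal sketch of training under a decaying sparsity rate.
# All names, model sizes, and schedule parameters are illustrative assumptions.
import torch
import torch.nn as nn

def snip_scores(model, loss_fn, batch):
    """Per-weight saliency |dL/dw * w| from a single batch (SNIP-style)."""
    x, y = batch
    weights = [p for p in model.parameters() if p.dim() > 1]
    loss = loss_fn(model(x), y)
    grads = torch.autograd.grad(loss, weights)
    return [(g * w).abs() for g, w in zip(grads, weights)]

def masks_for_sparsity(scores, sparsity):
    """Keep the top (1 - sparsity) fraction of weights by global saliency."""
    flat = torch.cat([s.flatten() for s in scores])
    k = max(int((1.0 - sparsity) * flat.numel()), 1)
    threshold = torch.topk(flat, k).values.min()
    return [(s >= threshold).float() for s in scores]

def sparsity_schedule(epoch, start=0.5, decay_epochs=10):
    """Decaying sparsity rate: high at the start, zero after decay_epochs."""
    return max(start * (1.0 - epoch / decay_epochs), 0.0)

# Toy model and data stand in for FastSpeech2 and a TTS corpus.
model = nn.Sequential(nn.Linear(64, 256), nn.ReLU(), nn.Linear(256, 64))
loss_fn = nn.MSELoss()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
batch = (torch.randn(32, 64), torch.randn(32, 64))

scores = snip_scores(model, loss_fn, batch)  # one-shot saliency at initialization
weights = [p for p in model.parameters() if p.dim() > 1]

for epoch in range(20):
    masks = masks_for_sparsity(scores, sparsity_schedule(epoch))
    for _ in range(100):  # steps per epoch
        x, y = batch
        loss = loss_fn(model(x), y)
        opt.zero_grad()
        loss.backward()
        opt.step()
        with torch.no_grad():  # keep currently pruned weights at zero
            for w, m in zip(weights, masks):
                w.mul_(m)
```

In this sketch, the masks are recomputed each epoch from the saliency scores fixed at initialization, so the sparsity rate can be lowered without re-scoring the network; the exact schedule and masking mechanics are placeholder assumptions rather than details taken from the paper.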


Related research

- 09/22/2022: EPIC TTS Models: Empirical Pruning Investigations Characterizing Text-To-Speech Models
- 07/15/2016: DSD: Dense-Sparse-Dense Training for Deep Neural Networks
- 04/04/2022: APP: Anytime Progressive Pruning
- 06/16/2020: Progressive Skeletonization: Trimming more fat from a network at initialization
- 06/12/2020: Dynamic Model Pruning with Feedback
- 06/02/2023: Task-Agnostic Structured Pruning of Speech Representation Models
- 10/04/2021: On the Interplay Between Sparsity, Naturalness, Intelligibility, and Prosody in Speech Synthesis
