Parallel Tacotron 2: A Non-Autoregressive Neural TTS Model with Differentiable Duration Modeling

03/26/2021
by Isaac Elias, et al.

This paper introduces Parallel Tacotron 2, a non-autoregressive neural text-to-speech model with a fully differentiable duration model that does not require supervised duration signals. The duration model is based on a novel attention mechanism and an iterative reconstruction loss based on Soft Dynamic Time Warping, so the model can learn token-frame alignments as well as token durations automatically. Experimental results show that Parallel Tacotron 2 outperforms baselines in subjective naturalness in several diverse multi-speaker evaluations. Its duration control capability is also demonstrated.
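The abstract does not spell out the loss, so the following is only a minimal sketch of the generic Soft Dynamic Time Warping recursion (the smoothed-minimum variant of DTW), not the authors' implementation. The function names `soft_min` and `soft_dtw`, the L1 frame distance, and the smoothing parameter `gamma` are assumptions made for illustration; the paper's iterative reconstruction loss and attention mechanism are not reproduced here.

```python
import numpy as np


def soft_min(values, gamma):
    """Smooth minimum: -gamma * log(sum(exp(-v / gamma))), computed stably."""
    v = np.asarray(values, dtype=float) / -gamma
    m = np.max(v)
    if not np.isfinite(m):
        # Every candidate is +inf, so the smooth minimum is +inf as well.
        return np.inf
    return -gamma * (m + np.log(np.sum(np.exp(v - m))))


def soft_dtw(x, y, gamma=1.0):
    """Soft-DTW discrepancy between sequences x (n, d) and y (m, d).

    Assumption: frames are compared with an L1 distance; the abstract
    does not state which frame-level cost the paper uses.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n, m = len(x), len(y)

    # Pairwise L1 cost between every frame of x and every frame of y.
    cost = np.abs(x[:, None, :] - y[None, :, :]).sum(axis=-1)

    # r[i, j] holds the soft-minimal alignment cost of x[:i] and y[:j].
    r = np.full((n + 1, m + 1), np.inf)
    r[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            r[i, j] = cost[i - 1, j - 1] + soft_min(
                [r[i - 1, j], r[i, j - 1], r[i - 1, j - 1]], gamma
            )
    return r[n, m]
```

Because the hard minimum is replaced by a smooth one, the value is differentiable with respect to both sequences, which is the property that lets a loss of this kind be trained without supervised duration targets. As `gamma` approaches zero the result approaches ordinary DTW.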

research
10/09/2021

PAMA-TTS: Progression-Aware Monotonic Attention for Stable Seq2Seq TTS With Accurate Phoneme Duration Control

Sequence expansion between encoder and decoder is a critical challenge i...
research
03/21/2022

Differentiable Duration Modeling for End-to-End Text-to-Speech

Parallel text-to-speech (TTS) models have recently enabled fast and high...
research
05/18/2020

MoBoAligner: a Neural Alignment Model for Non-autoregressive TTS with Monotonic Boundary Search

To speed up the inference of neural speech synthesis, non-autoregressive...
research
10/22/2020

Parallel Tacotron: Non-Autoregressive and Controllable TTS

Although neural end-to-end text-to-speech models can synthesize highly n...
research
10/06/2021

Hierarchical prosody modeling and control in non-autoregressive parallel neural TTS

Neural text-to-speech (TTS) synthesis can generate speech that is indist...
research
03/31/2020

Clustering of Arrivals in Queueing Systems: Autoregressive Conditional Duration Approach

Arrivals in queueing systems are typically assumed to be independent and...
research
07/28/2023

Minimally-Supervised Speech Synthesis with Conditional Diffusion Model and Language Model: A Comparative Study of Semantic Coding

Recently, there has been a growing interest in text-to-speech (TTS) meth...
