
Parallel Tacotron 2: A Non-Autoregressive Neural TTS Model with Differentiable Duration Modeling

by Isaac Elias, et al.

This paper introduces Parallel Tacotron 2, a non-autoregressive neural text-to-speech model with a fully differentiable duration model that does not require supervised duration signals. By combining a novel attention mechanism with an iterative reconstruction loss based on Soft Dynamic Time Warping, the model learns token-frame alignments and token durations automatically. Experimental results show that Parallel Tacotron 2 outperforms baselines in subjective naturalness across several diverse multi-speaker evaluations. Its duration control capability is also demonstrated.
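The abstract's reconstruction loss relies on Soft Dynamic Time Warping, which replaces the hard minimum in the classic DTW recursion with a smooth soft-min so the alignment cost becomes differentiable. A minimal NumPy sketch of that recursion (this is the generic Soft-DTW formulation, not the paper's exact implementation; the toy `pred`/`target` sequences and `gamma` value are illustrative assumptions):

```python
import numpy as np

def soft_dtw(D, gamma=1.0):
    """Soft-DTW cost over a pairwise distance matrix D of shape (n, m).

    gamma > 0 controls smoothness: as gamma -> 0, the soft-min approaches
    the hard min and the value approaches the classic DTW cost.
    """
    n, m = D.shape
    R = np.full((n + 1, m + 1), np.inf)  # accumulated-cost table, 1-indexed
    R[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Soft-min over the three DTW predecessors (match, insert, delete),
            # computed with the max-shift trick for numerical stability.
            r = np.array([R[i - 1, j - 1], R[i - 1, j], R[i, j - 1]])
            rmin = r.min()
            softmin = rmin - gamma * np.log(np.exp(-(r - rmin) / gamma).sum())
            R[i, j] = D[i - 1, j - 1] + softmin
    return R[n, m]

# Toy example: 3 "predicted frames" aligned against 2 "target frames".
pred = np.array([0.0, 1.0, 2.0])
target = np.array([0.0, 2.0])
D = np.abs(pred[:, None] - target[None, :])  # pairwise |x - y| distances
loss = soft_dtw(D, gamma=0.1)
```

Because the soft-min is differentiable everywhere, gradients of this loss flow back through the distance matrix to both sequences, which is what lets a duration/alignment model be trained without supervised duration signals.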


PAMA-TTS: Progression-Aware Monotonic Attention for Stable Seq2Seq TTS With Accurate Phoneme Duration Control

Sequence expansion between encoder and decoder is a critical challenge i...

Differentiable Duration Modeling for End-to-End Text-to-Speech

Parallel text-to-speech (TTS) models have recently enabled fast and high...

MoBoAligner: a Neural Alignment Model for Non-autoregressive TTS with Monotonic Boundary Search

To speed up the inference of neural speech synthesis, non-autoregressive...

Parallel Tacotron: Non-Autoregressive and Controllable TTS

Although neural end-to-end text-to-speech models can synthesize highly n...

Clustering of Arrivals in Queueing Systems: Autoregressive Conditional Duration Approach

Arrivals in queueing systems are typically assumed to be independent and...

Emphasis control for parallel neural TTS

The semantic information conveyed by a speech signal is strongly influen...

Cautionary note on "Semiparametric modeling of grouped current duration data with preferential reporting'"

This report is designed to clarify a few points about the article "Semip...