
Parallel Tacotron 2: A Non-Autoregressive Neural TTS Model with Differentiable Duration Modeling

03/26/2021
by Isaac Elias, et al.

This paper introduces Parallel Tacotron 2, a non-autoregressive neural text-to-speech model with a fully differentiable duration model that does not require supervised duration signals. The duration model combines a novel attention mechanism with an iterative reconstruction loss based on Soft Dynamic Time Warping, allowing the model to learn token-frame alignments as well as token durations automatically. Experimental results show that Parallel Tacotron 2 outperforms baselines in subjective naturalness across several diverse multi-speaker evaluations. Its duration control capability is also demonstrated.
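The ingredient that makes this duration modeling trainable without supervised durations is the Soft Dynamic Time Warping (Soft-DTW) reconstruction loss, which replaces the hard minimum of classic DTW with a smooth minimum so the alignment cost admits gradients. Below is a minimal NumPy sketch of the standard Soft-DTW recursion (Cuturi and Blondel, 2017); the function names, the gamma value, and the squared-Euclidean frame cost are illustrative assumptions, not the paper's exact formulation.

import numpy as np

def soft_min(values, gamma):
    # Smooth minimum: -gamma * log(sum(exp(-v / gamma))).
    # As gamma -> 0 this approaches the hard minimum of classic DTW.
    z = -np.asarray(values, dtype=float) / gamma
    shift = z.max()  # shift for numerical stability
    return -gamma * (shift + np.log(np.exp(z - shift).sum()))

def soft_dtw(x, y, gamma=0.1):
    # Soft-DTW alignment cost between frame sequences x (n, d) and y (m, d).
    n, m = len(x), len(y)
    # Pairwise squared-Euclidean cost between every x-frame and y-frame.
    cost = ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=-1)
    # R[i, j] = smoothed cost of the best monotonic alignment of x[:i], y[:j].
    R = np.full((n + 1, m + 1), np.inf)
    R[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            R[i, j] = cost[i - 1, j - 1] + soft_min(
                [R[i - 1, j], R[i, j - 1], R[i - 1, j - 1]], gamma
            )
    return R[n, m]

# Example: cost between a predicted and a reference mel-spectrogram
# (random stand-ins; real use would compare model output to ground truth).
pred = np.random.randn(50, 80)  # 50 frames, 80 mel bins
ref = np.random.randn(60, 80)   # 60 frames, 80 mel bins
print(soft_dtw(pred, ref, gamma=0.05))

Because every entry of R is built from differentiable operations, gradients flow from the final alignment cost back to the predicted frames, which is what lets the model refine alignments and durations iteratively during training.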

10/09/2021

PAMA-TTS: Progression-Aware Monotonic Attention for Stable Seq2Seq TTS With Accurate Phoneme Duration Control

Sequence expansion between encoder and decoder is a critical challenge i...
03/21/2022

Differentiable Duration Modeling for End-to-End Text-to-Speech

Parallel text-to-speech (TTS) models have recently enabled fast and high...
05/18/2020

MoBoAligner: a Neural Alignment Model for Non-autoregressive TTS with Monotonic Boundary Search

To speed up the inference of neural speech synthesis, non-autoregressive...
10/22/2020

Parallel Tacotron: Non-Autoregressive and Controllable TTS

Although neural end-to-end text-to-speech models can synthesize highly n...
03/31/2020

Clustering of Arrivals in Queueing Systems: Autoregressive Conditional Duration Approach

Arrivals in queueing systems are typically assumed to be independent and...
10/06/2021

Emphasis control for parallel neural TTS

The semantic information conveyed by a speech signal is strongly influen...
01/02/2018

Cautionary note on "Semiparametric modeling of grouped current duration data with preferential reporting"

This report is designed to clarify a few points about the article "Semip...