SyntaSpeech: Syntax-Aware Generative Adversarial Text-to-Speech

04/25/2022
by   Zhenhui Ye, et al.

Recent progress in non-autoregressive text-to-speech (NAR-TTS) has made fast, high-quality speech synthesis possible. However, current NAR-TTS models usually take a phoneme sequence as input and therefore cannot capture the tree-structured syntactic information of the input sentence, which hurts prosody modeling. To address this, we propose SyntaSpeech, a syntax-aware and lightweight NAR-TTS model that integrates tree-structured syntactic information into the prosody modeling modules of PortaSpeech. Specifically, 1) we build a syntactic graph based on the dependency tree of the input sentence, then process the text encoding with a syntactic graph encoder to extract the syntactic information; 2) we incorporate the extracted syntactic encoding into PortaSpeech to improve prosody prediction; 3) we introduce a multi-length discriminator to replace the flow-based post-net in PortaSpeech, which simplifies the training pipeline and improves inference speed while preserving the naturalness of the generated audio. Experiments on three datasets show that the tree-structured syntactic information enables SyntaSpeech to synthesize better audio with expressive prosody, and also demonstrate that SyntaSpeech generalizes to multiple languages and to multi-speaker text-to-speech. Ablation studies demonstrate the necessity of each component in SyntaSpeech. Source code and audio samples are available at https://syntaspeech.github.io
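
To make step 1 concrete, the following is a minimal sketch of how a dependency tree could be turned into a graph that a graph encoder can consume. It is not the authors' implementation: the abstract does not specify the edge types or the node granularity, so the word-level nodes, the use of both head-to-dependent and dependent-to-head edges, and the names `build_syntactic_graph` and `heads` are all assumptions made for illustration.

```python
def build_syntactic_graph(heads):
    """Turn a dependency tree into two directed edge lists.

    heads[i] is the index of the head of word i, or -1 for the root.
    This head-array encoding and the two edge directions are
    assumptions for this sketch, not details taken from the paper.
    """
    forward = []   # head -> dependent edges
    backward = []  # dependent -> head edges
    for dependent, head in enumerate(heads):
        if head < 0:
            continue  # the root word has no incoming dependency edge
        forward.append((head, dependent))
        backward.append((dependent, head))
    return forward, backward


# Hypothetical parse of "The cat sat": "The" depends on "cat",
# "cat" depends on "sat", and "sat" is the root.
heads = [1, 2, -1]
forward, backward = build_syntactic_graph(heads)
```

Keeping the two directions as separate edge lists lets a graph encoder treat them as distinct edge types if desired; merging them would give a plain undirected graph instead.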

research
09/16/2023

FastGraphTTS: An Ultrafast Syntax-Aware Speech Synthesis Framework

This paper integrates graph-to-sequence into an end-to-end text-to-speec...
research
05/18/2023

CLAPSpeech: Learning Prosody from Text Context with Contrastive Language-Audio Pre-training

Improving text representation has attracted much attention to achieve ex...
research
06/13/2023

PauseSpeech: Natural Speech Synthesis via Pre-trained Language Model and Pause-based Prosody Modeling

Although text-to-speech (TTS) systems have significantly improved, most ...
research
12/13/2020

Syntactic representation learning for neural network based TTS with syntactic parse tree traversal

Syntactic structure of a sentence text is correlated with the prosodic s...
research
06/17/2021

WaveGrad 2: Iterative Refinement for Text-to-Speech Synthesis

This paper introduces WaveGrad 2, a non-autoregressive generative model ...
research
12/11/2019

Quality of syntactic implication of RL-based sentence summarization

Work on summarization has explored both reinforcement learning (RL) opti...
research
11/07/2019

Transition-Based Deep Input Linearization

Traditional methods for deep NLG adopt pipeline approaches comprising st...
