GraphSpeech: Syntax-Aware Graph Attention Network For Neural Speech Synthesis

10/23/2020
by Rui Liu, et al.

Attention-based end-to-end text-to-speech synthesis (TTS) is superior to conventional statistical methods in many ways, and Transformer-based TTS is one such successful implementation. While Transformer TTS models the speech frame sequence well with a self-attention mechanism, it does not associate input text with output utterances from a syntactic point of view at the sentence level. We propose a novel neural TTS model, denoted as GraphSpeech, formulated under the graph neural network framework. GraphSpeech explicitly encodes the syntactic relations of input lexical tokens in a sentence and incorporates this information to derive syntactically motivated character embeddings for the TTS attention mechanism. Experiments show that GraphSpeech consistently outperforms the Transformer TTS baseline in terms of spectrum and prosody rendering of utterances.
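The core idea of restricting attention to syntactically related tokens can be illustrated with a toy sketch. The snippet below is not the authors' GraphSpeech implementation; it is a minimal, hypothetical example of graph-masked self-attention, where the attention scores between tokens are masked by a dependency-graph adjacency matrix so each token attends only to its syntactic neighbours (plus itself):

```python
import numpy as np

def syntax_masked_attention(X, adj):
    """Toy graph attention: X is a (tokens, dim) embedding matrix,
    adj a symmetric 0/1 dependency adjacency matrix. Attention is
    restricted to syntactic neighbours and the token itself."""
    d = X.shape[1]
    scores = X @ X.T / np.sqrt(d)                # scaled dot-product scores
    mask = adj + np.eye(adj.shape[0])            # allow self-attention
    scores = np.where(mask > 0, scores, -1e9)    # block non-neighbours
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)  # row-wise softmax
    return weights @ X                           # syntax-aware embeddings

# Example: 4 tokens with dependency edges 0-1, 1-2, 2-3
rng = np.random.default_rng(0)
X = rng.standard_normal((4, 8))
adj = np.zeros((4, 4))
for i, j in [(0, 1), (1, 2), (2, 3)]:
    adj[i, j] = adj[j, i] = 1.0
out = syntax_masked_attention(X, adj)
print(out.shape)  # (4, 8)
```

In the full model, such syntax-aware token representations would feed the encoder of a Transformer TTS system in place of plain positional character embeddings; the actual GraphSpeech architecture is described in the paper itself.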
