Dependency Parsing based Semantic Representation Learning with Graph Neural Network for Enhancing Expressiveness of Text-to-Speech

04/14/2021
by Yixuan Zhou, et al.

Semantic information of a sentence is crucial for improving the expressiveness of a text-to-speech (TTS) system, but it cannot be learned well from limited TTS training data with current encoder structures alone. With the development of large-scale pre-trained text representations, bidirectional encoder representations from transformers (BERT) has been shown to embody contextual semantic information and has been applied to TTS as an additional input. However, BERT cannot explicitly associate semantic tokens from the viewpoint of dependency relations within a sentence. In this paper, to enhance expressiveness, we propose a semantic representation learning method based on a graph neural network that takes the dependency relations of a sentence into account. The dependency graph of the input text is built from the edges of the dependency tree structure, considering both the forward and the reverse directions. Word-level semantic representations are then extracted by a relational gated graph network (RGGN) fed with BERT features as node inputs. The upsampled semantic representations and character-level embeddings are concatenated to serve as the encoder input of Tacotron-2. Experimental results show that the proposed method outperforms a baseline using vanilla BERT features on both the LJSpeech and Blizzard Challenge 2013 datasets, and that semantic representations learned from the reverse direction are more effective for enhancing expressiveness.
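The pipeline described above has two structural steps that can be sketched in code: building forward and reverse edge sets from a dependency parse, and upsampling word-level semantic vectors to character level before concatenation with character embeddings. The sketch below is a minimal illustration under assumed conventions (`heads[i]` giving the head index of word `i`, `-1` for the root); the function names and toy sentence are illustrative and not from the paper, and the actual RGGN and BERT components are omitted.

```python
# Hypothetical sketch of two steps from the described pipeline:
# (1) forward (head -> dependent) and reverse edge lists from a parse,
# (2) upsampling word-level vectors to character level for concatenation
#     with character embeddings. Names and data here are illustrative.

def dependency_edges(heads):
    """heads[i] = index of the dependency head of word i (-1 for root).
    Returns forward edges (head -> dependent) and their reversals."""
    forward = [(h, i) for i, h in enumerate(heads) if h >= 0]
    reverse = [(i, h) for (h, i) in forward]
    return forward, reverse

def upsample_to_chars(word_vecs, words):
    """Repeat each word-level vector once per character so the semantic
    sequence aligns with the character-level embedding sequence."""
    out = []
    for vec, word in zip(word_vecs, words):
        out.extend([vec] * len(word))
    return out

words = ["the", "cat", "sleeps"]
heads = [1, 2, -1]  # "the" depends on "cat", "cat" on "sleeps" (root)
fwd, rev = dependency_edges(heads)
print(fwd)          # [(1, 0), (2, 1)]
print(rev)          # [(0, 1), (1, 2)]

sem = upsample_to_chars([[0.1], [0.2], [0.3]], words)
print(len(sem))     # 12 (one vector per character)
```

In the paper's setting, the two edge sets would feed separate relation types of the RGGN (forward vs. reverse direction), and the upsampled word-level outputs would be concatenated with Tacotron-2's character embeddings along the feature dimension.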
