GraphPB: Graphical Representations of Prosody Boundary in Speech Synthesis

12/03/2020
by   Aolan Sun, et al.
0

This paper introduces a graphical representation approach of prosody boundary (GraphPB) in the task of Chinese speech synthesis, intending to parse the semantic and syntactic relationship of input sequences in a graphical domain for improving the prosody performance. The nodes of the graph embedding are formed by prosodic words, and the edges are formed by the other prosodic boundaries, namely prosodic phrase boundary (PPH) and intonation phrase boundary (IPH). Different Graph Neural Networks (GNN) like Gated Graph Neural Network (GGNN) and Graph Long Short-term Memory (G-LSTM) are utilised as graph encoders to exploit the graphical prosody boundary information. Graph-to-sequence model is proposed and formed by a graph encoder and an attentional decoder. Two techniques are proposed to embed sequential information into the graph-to-sequence text-to-speech model. The experimental results show that this proposed approach can encode the phonetic and prosody rhythm of an utterance. The mean opinion score (MOS) of these GNN models shows comparative results with the state-of-the-art sequence-to-sequence models with better performance in the aspect of prosody. This provides an alternative approach for prosody modelling in end-to-end speech synthesis.

READ FULL TEXT
research
03/04/2020

GraphTTS: graph-to-sequence modelling in neural text-to-speech

This paper leverages the graph-to-sequence method in neural text-to-spee...
research
06/19/2021

Advances in Speech Vocoding for Text-to-Speech with Continuous Parameters

Vocoders received renewed attention as main components in statistical pa...
research
08/04/2020

Boundary Content Graph Neural Network for Temporal Action Proposal Generation

Temporal action proposal generation plays an important role in video act...
research
02/02/2019

FurcaNet: An end-to-end deep gated convolutional, long short-term memory, deep neural networks for single channel speech separation

Deep gated convolutional networks have been proved to be very effective ...
research
11/02/2015

Automatic Prosody Prediction for Chinese Speech Synthesis using BLSTM-RNN and Embedding Features

Prosody affects the naturalness and intelligibility of speech. However, ...
research
03/06/2023

Wind Turbine Gearbox Fault Detection Based on Sparse Filtering and Graph Neural Networks

The wind energy industry has been experiencing tremendous growth and con...
research
10/24/2020

Road Accident Proneness Indicator Based On Time, Weather And Location Specificity Using Graph Neural Networks

In this paper, we present a novel approach to identify the Spatio-tempor...

Please sign up or login with your details

Forgot password? Click here to reset