FastGraphTTS: An Ultrafast Syntax-Aware Speech Synthesis Framework

09/16/2023
by Jianzong Wang, et al.

This paper integrates graph-to-sequence modelling into an end-to-end text-to-speech framework so that synthesis is aware of the syntactic structure of the input text. Specifically, a dependency parsing module parses the input text into a syntactic graph. A graph encoder then encodes this graph to extract hidden syntactic representations, which are concatenated with the phoneme embeddings and passed to the alignment and flow-based decoding modules to generate the raw audio waveform. The model is evaluated in two languages, English and Mandarin, on single-speaker, few-sample target-speaker, and multi-speaker datasets. Experimental results show better prosodic consistency between the input text and the generated audio, as well as higher scores in subjective prosody evaluation, and the model is also capable of voice conversion. In addition, a purpose-designed AI chip operator improves the model's efficiency, yielding a 5x acceleration.
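
The data flow described in the abstract (dependency parse, syntactic graph, graph encoding, concatenation with phoneme embeddings) can be illustrated with a minimal sketch. This is not the authors' implementation: the dependency heads are written by hand rather than produced by a parser, the graph encoder is reduced to a single round of mean aggregation over neighbours, and one embedding per token stands in for the phoneme embeddings. All names and values (`heads`, `word_feats`, `phoneme_emb`, `build_adjacency`, `graph_encode`, `concat`) are hypothetical.

```python
# Toy illustration only: hand-written dependency heads, not parser output.
words = ["the", "cat", "sleeps"]
heads = [1, 2, -1]  # index of each word's syntactic head; -1 marks the root


def build_adjacency(heads):
    """Build an undirected syntactic graph with self-loops from head indices."""
    n = len(heads)
    adj = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for child, head in enumerate(heads):
        if head >= 0:
            adj[child][head] = 1.0
            adj[head][child] = 1.0
    return adj


def graph_encode(adj, feats):
    """One round of mean aggregation over each node's neighbours (incl. itself).

    Assumption: a stand-in for the paper's graph encoder, whose actual
    architecture is not given in the abstract.
    """
    dim = len(feats[0])
    encoded = []
    for row in adj:
        degree = sum(row)
        encoded.append(
            [sum(w * f[d] for w, f in zip(row, feats)) / degree for d in range(dim)]
        )
    return encoded


def concat(syntactic, phonemic):
    """Concatenate syntactic and phoneme features position by position."""
    return [s + p for s, p in zip(syntactic, phonemic)]


# Hypothetical input features (assumed values, one vector per token).
word_feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
phoneme_emb = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6], [0.7, 0.8, 0.9]]

adj = build_adjacency(heads)
syntactic = graph_encode(adj, word_feats)
fused = concat(syntactic, phoneme_emb)  # would feed the alignment/decoder stage
```

In a real system the fused vectors would be learned jointly with the rest of the network, and word-level syntactic features would have to be aligned with phoneme-level positions; both steps are omitted here.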
