Accented Text-to-Speech Synthesis with a Conditional Variational Autoencoder

11/07/2022
by Jan Melechovsky, et al.

Accent plays a significant role in speech communication: it affects how well speech is understood and also conveys a speaker's identity. This paper introduces a novel and efficient framework for accented text-to-speech (TTS) synthesis based on a Conditional Variational Autoencoder (CVAE). The framework can synthesize speech in the voice of a selected speaker, converted to any desired target accent. Thorough experiments using both objective and subjective evaluations validate the effectiveness of the proposed framework. The results also show that the framework can effectively manipulate accent in the synthesized speech, offering a promising avenue for future accented TTS research.
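The abstract does not describe the model's architecture, so the following is only a minimal, hypothetical sketch of the generic conditional VAE mechanics such a framework builds on. An encoder q(z | x, c) and a decoder p(x | z, c) are both conditioned on an accent label c. A latent is sampled with the reparameterization trick, and the loss combines reconstruction error with a KL term. Everything concrete here is an assumption, not taken from the paper: the accent inventory in `ACCENTS`, the toy single-layer linear maps with random untrained weights, the standard-normal prior N(0, I), squared-error reconstruction, and `x` as a generic acoustic feature vector. Accent conversion is illustrated by encoding under the source accent and decoding under the target accent.

```python
import math
import random

# Hypothetical accent inventory (assumption): the paper's accent set is not given here.
ACCENTS = ["american", "british", "indian"]


def one_hot(accent: str) -> list[float]:
    """Encode an accent label as a one-hot condition vector c."""
    return [1.0 if a == accent else 0.0 for a in ACCENTS]


def linear(weights: list[list[float]], x: list[float]) -> list[float]:
    """Toy dense layer without bias: y = W x."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def reparameterize(mu, logvar, rng):
    """Sample z ~ N(mu, diag(exp(logvar))) as z = mu + sigma * eps, eps ~ N(0, I)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]


def kl_to_standard_normal(mu, logvar):
    """KL(N(mu, diag(sigma^2)) || N(0, I)), summed over latent dimensions.

    Assumes a standard-normal prior; a real system might use a conditional prior p(z | c).
    """
    return 0.5 * sum(m * m + math.exp(lv) - lv - 1.0 for m, lv in zip(mu, logvar))


class ToyCVAE:
    """Hypothetical minimal CVAE with encoder q(z | x, c) and decoder p(x | z, c)."""

    def __init__(self, x_dim: int, z_dim: int, seed: int = 0):
        rng = random.Random(seed)
        c_dim = len(ACCENTS)

        def init(rows, cols):
            # Small random weights; the model is never trained in this sketch.
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.w_mu = init(z_dim, x_dim + c_dim)
        self.w_logvar = init(z_dim, x_dim + c_dim)
        self.w_dec = init(x_dim, z_dim + c_dim)

    def encode(self, x, accent):
        h = x + one_hot(accent)  # concatenate features with the accent condition
        return linear(self.w_mu, h), linear(self.w_logvar, h)

    def decode(self, z, accent):
        return linear(self.w_dec, z + one_hot(accent))  # decoder also sees the accent

    def loss(self, x, accent, rng):
        """Negative ELBO: squared-error reconstruction plus the KL term."""
        mu, logvar = self.encode(x, accent)
        z = reparameterize(mu, logvar, rng)
        x_hat = self.decode(z, accent)
        recon = sum((a - b) ** 2 for a, b in zip(x, x_hat))
        return recon + kl_to_standard_normal(mu, logvar)

    def convert(self, x, source_accent, target_accent):
        """Accent conversion: encode under the source accent, decode under the target."""
        mu, _ = self.encode(x, source_accent)
        return self.decode(mu, target_accent)  # posterior mean gives a deterministic output


model = ToyCVAE(x_dim=4, z_dim=2)
frame = [0.5, -0.2, 0.1, 0.3]  # stand-in for one acoustic feature frame
print(model.loss(frame, "american", random.Random(1)))
print(model.convert(frame, "american", "british"))
```

Because the weights are untrained, the converted output carries no meaningful audio; the sketch only shows where the accent condition enters the encoder and decoder. In a generic CVAE setup, this is what allows the condition to be swapped at synthesis time while the latent z is kept.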

