GANtron: Emotional Speech Synthesis with Generative Adversarial Networks

10/06/2021
by Enrique Hortal, et al.

Speech synthesis is used in a wide variety of industries, yet synthesized speech often sounds flat or robotic. State-of-the-art methods that allow prosody control are cumbersome to use and difficult to tune. To address some of these drawbacks, this work targets a text-to-speech model whose inferred speech can be tuned with the desired emotions. To do so, we combine Generative Adversarial Networks (GANs) with a sequence-to-sequence model that uses an attention mechanism. We evaluate four configurations that differ in their inputs and training strategies, and show that our best model can generate speech files that lie in the same distribution as the initial training dataset. Additionally, we propose a new strategy to boost training convergence by applying a guided attention loss.
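
The abstract does not spell out how the guided attention loss is defined, so the following is only a minimal sketch of the formulation commonly used in attention-based text-to-speech, which penalizes attention mass that falls far from the diagonal of the text-to-frame alignment. The function names, the width parameter `g` and its default of 0.2, and the toy alignments are illustrative assumptions, not details taken from this work.

```python
import math


def guided_attention_weights(n_text, n_frames, g=0.2):
    """Penalty matrix W[n][t] = 1 - exp(-((n/N - t/T)^2) / (2 g^2)).

    Entries are 0 on the diagonal and grow towards 1 away from it.
    The width g is an assumed hyperparameter, not a value from the paper.
    """
    return [
        [
            1.0 - math.exp(-((n / n_text - t / n_frames) ** 2) / (2.0 * g ** 2))
            for t in range(n_frames)
        ]
        for n in range(n_text)
    ]


def guided_attention_loss(attention, g=0.2):
    """Mean of the attention matrix weighted element-wise by the penalty matrix."""
    n_text, n_frames = len(attention), len(attention[0])
    weights = guided_attention_weights(n_text, n_frames, g)
    total = sum(
        attention[n][t] * weights[n][t]
        for n in range(n_text)
        for t in range(n_frames)
    )
    return total / (n_text * n_frames)


# Toy alignments (hypothetical): a perfectly diagonal one and a reversed one.
size = 8
diagonal = [[1.0 if n == t else 0.0 for t in range(size)] for n in range(size)]
reversed_alignment = [
    [1.0 if n + t == size - 1 else 0.0 for t in range(size)] for n in range(size)
]

loss_diagonal = guided_attention_loss(diagonal)
loss_reversed = guided_attention_loss(reversed_alignment)
```

Under these assumptions the diagonal alignment incurs no penalty while the reversed alignment is penalized, which is the property that would push the attention module towards a monotonic alignment early in training.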
