Mel-spectrogram augmentation for sequence to sequence voice conversion

01/06/2020
by Yeongtae Hwang, et al.

Training a sequence-to-sequence voice conversion model requires speech tuples that consist of the same utterance, and such data are often insufficient. This study experimentally investigates the effects of Mel-spectrogram augmentation on a sequence-to-sequence voice conversion model. For Mel-spectrogram augmentation, we adopt the policies proposed in SpecAugment and propose new policies that introduce more data variation. To find the optimal hyperparameters of the augmentation policies for voice conversion, we run experiments based on a new metric, the deformation per deteriorating ratio. We observe the effect of these policies across training sets of various sizes and across combinations of augmentation policies. In the experimental results, the policies based on time-axis warping perform better than the other policies.
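As a rough illustration of the kind of policy SpecAugment describes, the sketch below applies one frequency mask and one time mask to a Mel-spectrogram stored as a NumPy array of shape (n_mels, n_frames). It is a minimal sketch, not the authors' implementation: the function name `mask_mel`, the mask widths, and the fill value of 0 are all assumptions made for illustration. The time-warping policy, the new policies proposed in the paper, and the deformation per deteriorating ratio metric are not specified in this abstract, so they are not shown.

```python
import numpy as np


def mask_mel(mel, max_freq_width=8, max_time_width=20, rng=None):
    """Return a copy of `mel` with one frequency band and one time span set to 0.

    `mel` is assumed to have shape (n_mels, n_frames). The default widths and
    the fill value are illustrative assumptions, not values from the paper.
    """
    rng = np.random.default_rng() if rng is None else rng
    out = mel.copy()
    n_mels, n_frames = out.shape

    # Frequency masking: blank f consecutive Mel channels starting at f0.
    f = min(int(rng.integers(0, max_freq_width + 1)), n_mels)
    f0 = int(rng.integers(0, n_mels - f + 1))
    out[f0:f0 + f, :] = 0.0

    # Time masking: blank t consecutive frames starting at t0.
    t = min(int(rng.integers(0, max_time_width + 1)), n_frames)
    t0 = int(rng.integers(0, n_frames - t + 1))
    out[:, t0:t0 + t] = 0.0

    return out


if __name__ == "__main__":
    mel = np.ones((80, 100))  # placeholder spectrogram, not real speech
    augmented = mask_mel(mel, rng=np.random.default_rng(0))
    print(augmented.shape, int((augmented == 0).sum()), "bins masked")
```

In a voice conversion setting the source and target spectrograms form a pair, so which side such a mask is applied to is a design decision that this sketch leaves open.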

