Two-stage training method for Japanese electrolaryngeal speech enhancement based on sequence-to-sequence voice conversion

10/19/2022
by Ding Ma, et al.

Sequence-to-sequence (seq2seq) voice conversion (VC) models have greater potential than conventional VC models for converting electrolaryngeal (EL) speech to normal speech (EL2SP). However, seq2seq-based EL2SP requires a sufficiently large amount of parallel training data, and its performance degrades significantly when the training data are insufficient. To address this issue, we propose a novel two-stage training strategy that improves seq2seq-based EL2SP when only a small parallel dataset is available. Whereas previous studies rely on high-quality data augmentation, we first train the VC model on a large amount of imperfect synthetic parallel data of EL and normal speech combined with the original dataset. A second training stage is then conducted with the original parallel dataset only. The results show that the proposed method progressively improves the performance of EL2SP based on seq2seq VC.
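
The two-stage schedule described in the abstract can be sketched roughly as follows. This is a minimal illustration of the data schedule only, not the authors' implementation: `two_stage_train`, the `train_epoch` callback, the utterance identifiers, and the epoch counts are all hypothetical, and the sketch assumes that the same model is carried over from stage 1 to stage 2.

```python
import random
from typing import Callable, List, Sequence, Tuple

# Hypothetical pair type: (EL utterance id, normal-speech utterance id).
Pair = Tuple[str, str]


def two_stage_train(
    train_epoch: Callable[[List[Pair]], None],
    original: Sequence[Pair],
    synthetic: Sequence[Pair],
    stage1_epochs: int,
    stage2_epochs: int,
    seed: int = 0,
) -> List[Tuple[int, int]]:
    """Stage 1: train on synthetic + original pairs.
    Stage 2: continue training on the original pairs only.

    `train_epoch` stands in for one pass of seq2seq VC training and is
    assumed to update a single shared model across both stages.
    Returns a log of (stage, number of pairs) for every epoch run.
    """
    rng = random.Random(seed)
    stages = (
        (1, list(synthetic) + list(original), stage1_epochs),
        (2, list(original), stage2_epochs),
    )
    log: List[Tuple[int, int]] = []
    for stage, data, epochs in stages:
        for _ in range(epochs):
            epoch_data = data[:]      # copy so the source lists stay untouched
            rng.shuffle(epoch_data)   # new utterance order each epoch
            train_epoch(epoch_data)
            log.append((stage, len(epoch_data)))
    return log


# Toy usage: record what each epoch would see instead of training a model.
original = [("el_001", "sp_001"), ("el_002", "sp_002")]
synthetic = [(f"syn_el_{i:03d}", f"syn_sp_{i:03d}") for i in range(10)]
seen: List[List[Pair]] = []
log = two_stage_train(seen.append, original, synthetic,
                      stage1_epochs=2, stage2_epochs=1)
```

In this toy run the first two epochs see all twelve pairs and the final epoch sees only the two original pairs, which mirrors the idea of letting the small, clean dataset have the last word after the imperfect synthetic data has been used.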

research
04/06/2019

Taco-VC: A Single Speaker Tacotron based Voice Conversion with Limited Data

This paper introduces Taco-VC, a novel architecture for voice conversion...
research
11/20/2018

Improving Sequence-to-Sequence Acoustic Modeling by Adding Text-Supervision

This paper presents methods of making use of text supervision to impro...
research
01/06/2020

Mel-spectrogram augmentation for sequence to sequence voice conversion

When training the sequence-to-sequence voice conversion model, we need t...
research
04/10/2017

Voice Conversion Using Sequence-to-Sequence Learning of Context Posterior Probabilities

Voice conversion (VC) using sequence-to-sequence learning of context pos...
research
10/15/2021

Towards Identity Preserving Normal to Dysarthric Voice Conversion

We present a voice conversion framework that converts normal speech into...
research
06/02/2021

A Preliminary Study of a Two-Stage Paradigm for Preserving Speaker Identity in Dysarthric Voice Conversion

We propose a new paradigm for maintaining speaker identity in dysarthric...
research
10/13/2016

Dictionary Update for NMF-based Voice Conversion Using an Encoder-Decoder Network

In this paper, we propose a dictionary update method for Nonnegative Mat...
