On Prosody Modeling for ASR+TTS based Voice Conversion

07/20/2021
by Wen-Chin Huang, et al.

In voice conversion (VC), an approach that showed promising results in the Voice Conversion Challenge (VCC) 2020 is to first use an automatic speech recognition (ASR) model to transcribe the source speech into its underlying linguistic content, which a text-to-speech (TTS) system then takes as input to generate the converted speech. This paradigm, referred to as ASR+TTS, overlooks the modeling of prosody, which plays an important role in speech naturalness and conversion similarity. Although some researchers have considered transferring prosodic cues from the source speech, doing so introduces a speaker mismatch between training and conversion. To address this issue, we propose to predict prosody directly from the linguistic representation in a target-speaker-dependent manner, an approach we refer to as target text prediction (TTP). We evaluate both methods, source prosody transfer and TTP, on the VCC2020 benchmark and consider different linguistic representations. The results demonstrate the effectiveness of TTP in both objective and subjective evaluations.
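
The data flow described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: every name below (`toy_asr`, `source_prosody_transfer`, `make_ttp`, `convert`, and the speaker labels) is hypothetical, text stands in for waveforms, and "prosody" is a made-up per-unit number. The only point it shows is that transferred prosody depends on the source speaker, whereas a target-speaker-dependent predictor sees only the linguistic representation.

```python
from dataclasses import dataclass
from typing import Callable, List

Linguistic = List[str]  # toy linguistic units produced by the ASR step
Prosody = List[float]   # one made-up prosody value per linguistic unit


@dataclass
class Utterance:
    speaker: str
    text: str  # stands in for the waveform in this toy example


@dataclass
class Converted:
    speaker: str
    units: Linguistic
    prosody: Prosody


def toy_asr(source: Utterance) -> Linguistic:
    """Stand-in for ASR: keeps linguistic content, drops speaker identity."""
    return source.text.lower().split()


def source_prosody_transfer(source: Utterance, units: Linguistic) -> Prosody:
    """Toy prosody taken from the source speech, so it varies with the source speaker."""
    bias = float(len(source.speaker))  # arbitrary speaker-dependent offset
    return [bias + len(u) for u in units]


def make_ttp(target_speaker: str) -> Callable[[Linguistic], Prosody]:
    """Toy target-speaker-dependent predictor: linguistic units in, prosody out."""
    bias = float(len(target_speaker))  # arbitrary target-dependent offset

    def predict(units: Linguistic) -> Prosody:
        return [bias + len(u) for u in units]

    return predict


def convert(source: Utterance, target_speaker: str, mode: str) -> Converted:
    """ASR, then prosody by transfer or by prediction, then a stand-in for TTS."""
    units = toy_asr(source)
    if mode == "transfer":
        prosody = source_prosody_transfer(source, units)
    elif mode == "ttp":
        prosody = make_ttp(target_speaker)(units)
    else:
        raise ValueError(f"unknown mode: {mode}")
    return Converted(speaker=target_speaker, units=units, prosody=prosody)


# Two hypothetical source speakers saying the same sentence.
a = Utterance(speaker="src_A", text="Hello world")
b = Utterance(speaker="src_BB", text="Hello world")

ttp_a = convert(a, "tgt", "ttp")
ttp_b = convert(b, "tgt", "ttp")
transfer_a = convert(a, "tgt", "transfer")
transfer_b = convert(b, "tgt", "transfer")
```

In this toy setup, `ttp_a.prosody` and `ttp_b.prosody` are identical because the predictor never sees the source speaker, while `transfer_a.prosody` and `transfer_b.prosody` differ. That difference is the speaker mismatch the abstract refers to, reduced to its simplest form.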


research
09/03/2020

Voice Conversion by Cascading Automatic Speech Recognition and Text-to-Speech Synthesis with Prosody Transfer

With the development of automatic speech recognition (ASR) and text-to-s...
research
03/31/2022

HiFi-VC: High Quality ASR-Based Voice Conversion

The goal of voice conversion (VC) is to convert input voice to match the...
research
08/10/2021

StarGAN-VC+ASR: StarGAN-based Non-Parallel Voice Conversion Regularized by Automatic Speech Recognition

Preserving the linguistic content of input speech is essential during vo...
research
05/25/2022

An Investigation on Applying Acoustic Feature Conversion to ASR of Adult and Child Speech

The performance of child speech recognition is generally less satisfacto...
research
05/24/2023

Iteratively Improving Speech Recognition and Voice Conversion

Many existing works on voice conversion (VC) tasks use automatic speech ...
research
07/15/2019

Hierarchical Sequence to Sequence Voice Conversion with Limited Data

We present a voice conversion solution using recurrent sequence to seque...
