Iteratively Improving Speech Recognition and Voice Conversion

05/24/2023
by   Mayank Kumar Singh, et al.
0

Many existing works on voice conversion (VC) tasks use automatic speech recognition (ASR) models for ensuring linguistic consistency between source and converted samples. However, for the low-data resource domains, training a high-quality ASR remains to be a challenging task. In this work, we propose a novel iterative way of improving both the ASR and VC models. We first train an ASR model which is used to ensure content preservation while training a VC model. In the next iteration, the VC model is used as a data augmentation method to further fine-tune the ASR model and generalize it to diverse speakers. By iteratively leveraging the improved ASR model to train VC model and vice-versa, we experimentally show improvement in both the models. Our proposed framework outperforms the ASR and one-shot VC baseline models on English singing and Hindi speech domains in subjective and objective evaluations in low-data resource settings.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
09/15/2022

Non-Parallel Voice Conversion for ASR Augmentation

Automatic speech recognition (ASR) needs to be robust to speaker differe...
research
03/31/2022

HiFi-VC: High Quality ASR-Based Voice Conversion

The goal of voice conversion (VC) is to convert input voice to match the...
research
09/16/2021

PDAugment: Data Augmentation by Pitch and Duration Adjustments for Automatic Lyrics Transcription

Automatic lyrics transcription (ALT), which can be regarded as automatic...
research
07/20/2021

On Prosody Modeling for ASR+TTS based Voice Conversion

In voice conversion (VC), an approach showing promising results in the l...
research
03/24/2022

Disentangleing Content and Fine-grained Prosody Information via Hybrid ASR Bottleneck Features for Voice Conversion

Non-parallel data voice conversion (VC) have achieved considerable break...
research
08/10/2021

StarGAN-VC+ASR: StarGAN-based Non-Parallel Voice Conversion Regularized by Automatic Speech Recognition

Preserving the linguistic content of input speech is essential during vo...
research
06/15/2023

MobileASR: A resource-aware on-device personalisation framework for automatic speech recognition in mobile phones

We describe a comprehensive methodology for developing user-voice person...

Please sign up or login with your details

Forgot password? Click here to reset