DualVC: Dual-mode Voice Conversion using Intra-model Knowledge Distillation and Hybrid Predictive Coding

05/21/2023
by Ziqian Ning, et al.

Voice conversion is an increasingly popular technology, and the growing number of real-time applications requires models with streaming conversion capabilities. Unlike typical (non-streaming) voice conversion, which can leverage the entire utterance as full context, streaming voice conversion faces significant challenges due to missing future information, resulting in degraded intelligibility, speaker similarity, and sound quality. To address this challenge, we propose DualVC, a dual-mode neural voice conversion approach that supports both streaming and non-streaming modes using jointly trained separate network parameters. Furthermore, we propose intra-model knowledge distillation and hybrid predictive coding (HPC) to enhance the performance of streaming conversion. Additionally, we incorporate data augmentation to train a noise-robust autoregressive decoder, improving the model's performance on long-form speech conversion. Experimental results demonstrate that the proposed model outperforms the baseline models in streaming voice conversion and, with a latency of only 252.8 ms, maintains performance comparable to the non-streaming topline system that leverages the complete context.
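
To make the dual-mode idea concrete, the following is a minimal sketch and not the paper's implementation. It assumes a toy moving-average "branch" in place of a neural network, and it assumes a mean-squared-error distillation term; the abstract does not state the actual loss, so both choices, along with every function name here, are hypothetical. The non-streaming branch sees past and future frames, the streaming branch sees only the current and past frames, and the distillation term measures how far the streaming output is from the full-context output.

```python
from typing import List


def moving_average(x: List[float], left: int, right: int) -> List[float]:
    """Average each frame with `left` past and `right` future frames.

    Windows are clipped at the sequence edges. This is a toy stand-in
    for a network layer, used only to illustrate context access.
    """
    out = []
    for t in range(len(x)):
        lo = max(0, t - left)
        hi = min(len(x), t + right + 1)
        window = x[lo:hi]
        out.append(sum(window) / len(window))
    return out


def non_streaming_branch(x: List[float]) -> List[float]:
    # Hypothetical full-context mode: looks at past and future frames.
    return moving_average(x, left=2, right=2)


def streaming_branch(x: List[float]) -> List[float]:
    # Hypothetical streaming mode: causal, no access to future frames.
    return moving_average(x, left=2, right=0)


def distillation_loss(student: List[float], teacher: List[float]) -> float:
    # Assumed MSE between streaming (student) and non-streaming (teacher)
    # outputs; the teacher output is treated as a fixed target.
    return sum((s - t) ** 2 for s, t in zip(student, teacher)) / len(student)


frames = [0.0, 1.0, 2.0, 3.0, 4.0]
teacher = non_streaming_branch(frames)
student = streaming_branch(frames)
loss = distillation_loss(student, teacher)
print(f"distillation loss: {loss:.4f}")
```

In a trained model both branches would be learned jointly and the distillation term would be added to the conversion loss, so that the streaming branch is pulled toward the behaviour of the branch that has access to future frames.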

research
06/15/2022

Streaming non-autoregressive model for any-to-many voice conversion

Voice conversion models have developed for decades, and current mainstre...
research
11/12/2021

AC-VC: Non-parallel Low Latency Phonetic Posteriorgrams Based Voice Conversion

This paper presents AC-VC (Almost Causal Voice Conversion), a phonetic p...
research
08/31/2023

Knowledge Distillation from Non-streaming to Streaming ASR Encoder using Auxiliary Non-streaming Layer

Streaming automatic speech recognition (ASR) models are restricted from ...
research
05/18/2023

Data Augmentation for Diverse Voice Conversion in Noisy Environments

Voice conversion (VC) models have demonstrated impressive few-shot conve...
research
05/14/2020

Streaming keyword spotting on mobile devices

In this work we explore the latency and accuracy of keyword spotting (KW...
research
10/25/2022

Streaming Parrotron for on-device speech-to-speech conversion

We present a fully on-device and streaming Speech-To-Speech (STS) conver...
research
03/07/2022

Enhance Language Identification using Dual-mode Model with Knowledge Distillation

In this paper, we propose to employ a dual-mode framework on the x-vecto...
