Streaming non-autoregressive model for any-to-many voice conversion

06/15/2022
by Ziyi Chen, et al.

Voice conversion models have been developed for decades, and current mainstream research focuses on non-streaming voice conversion. However, streaming voice conversion is better suited to practical application scenarios than non-streaming voice conversion. In this paper, we propose a streaming any-to-many voice conversion system based on a fully non-autoregressive model, which includes a streaming transformer-based acoustic model and a streaming vocoder. The streaming transformer-based acoustic model is composed of a pre-trained encoder from a streaming end-to-end automatic speech recognition model and a decoder built from modified FastSpeech blocks. The streaming vocoder is designed for the streaming task using a pseudo quadrature mirror filter bank and causal convolution. Experimental results show that the proposed method performs strongly in both latency and conversion quality and can run in real time on both CPU and GPU.
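The abstract relies on causal convolution to make the vocoder streamable. The sketch below illustrates why that works, under simplifying assumptions: a single 1-D channel, plain Python lists, and hypothetical names (`causal_conv1d`, `StreamingCausalConv1d`) that do not come from the paper. A causal convolution only looks at current and past samples, so processing audio chunk by chunk while caching the last `(k - 1) * dilation` input samples reproduces the full-sequence output. This is a generic illustration of the technique, not the authors' vocoder.

```python
from typing import List, Sequence


def causal_conv1d(x: Sequence[float], w: Sequence[float], dilation: int = 1) -> List[float]:
    """Full-sequence causal convolution: y[t] = sum_i w[i] * x[t - i*dilation].

    Samples before the start of the signal are treated as zeros (left padding),
    so no output ever depends on a future input sample.
    """
    y = []
    for t in range(len(x)):
        acc = 0.0
        for i, wi in enumerate(w):
            j = t - i * dilation  # tap i looks i*dilation samples into the past
            if j >= 0:
                acc += wi * x[j]
        y.append(acc)
    return y


class StreamingCausalConv1d:
    """Hypothetical chunk-wise version that keeps a cache of past input samples."""

    def __init__(self, w: Sequence[float], dilation: int = 1) -> None:
        self.w = list(w)
        self.dilation = dilation
        # Receptive field in the past: the number of samples each output needs to look back.
        self.ctx_len = (len(self.w) - 1) * dilation
        # Zeros stand in for the left padding at the very start of the stream.
        self.cache = [0.0] * self.ctx_len

    def process(self, chunk: Sequence[float]) -> List[float]:
        buf = self.cache + list(chunk)
        out = []
        # Emit exactly one output per new input sample; the cache supplies the past context.
        for t in range(self.ctx_len, len(buf)):
            acc = 0.0
            for i, wi in enumerate(self.w):
                acc += wi * buf[t - i * self.dilation]
            out.append(acc)
        # Keep only the most recent ctx_len samples for the next chunk.
        self.cache = buf[len(buf) - self.ctx_len:] if self.ctx_len else []
        return out


if __name__ == "__main__":
    signal = [float(v) for v in range(10)]
    kernel = [0.5, 0.25, 0.25]
    full = causal_conv1d(signal, kernel, dilation=2)
    stream = StreamingCausalConv1d(kernel, dilation=2)
    chunked: List[float] = []
    for start in range(0, len(signal), 3):  # feed 3 samples at a time
        chunked.extend(stream.process(signal[start:start + 3]))
    print(full == chunked)
```

Because each output needs only past context, the chunk size affects latency but not the result. This is the property that lets a causal vocoder emit audio as soon as each block of acoustic features arrives.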


Related research
05/21/2023

DualVC: Dual-mode Voice Conversion using Intra-model Knowledge Distillation and Hybrid Predictive Coding

Voice conversion is an increasingly popular technology, and the growing ...
10/25/2022

Streaming Parrotron for on-device speech-to-speech conversion

We present a fully on-device and streaming Speech-To-Speech (STS) conver...
10/06/2022

WakeUpNet: A Mobile-Transformer based Framework for End-to-End Streaming Voice Trigger

End-to-end models have gradually become the main technical stream for vo...
05/14/2020

Streaming keyword spotting on mobile devices

In this work we explore the latency and accuracy of keyword spotting (KW...
06/29/2021

N-Singer: A Non-Autoregressive Korean Singing Voice Synthesis System for Pronunciation Enhancement

Recently, end-to-end Korean singing voice systems have been designed to ...
12/27/2019

MoEVC: A Mixture-of-experts Voice Conversion System with Sparse Gating Mechanism for Accelerating Online Computation

With the recent advancements of deep learning technologies, the performa...
11/12/2021

AC-VC: Non-parallel Low Latency Phonetic Posteriorgrams Based Voice Conversion

This paper presents AC-VC (Almost Causal Voice Conversion), a phonetic p...