Blow: a single-scale hyperconditioned flow for non-parallel raw-audio voice conversion

06/03/2019
by   Joan Serrà, et al.

End-to-end models for raw audio generation are a challenge, especially if they have to work with non-parallel data, which is a desirable setup in many situations. Voice conversion, in which a model has to impersonate a speaker in a recording, is one of those situations. In this paper, we propose Blow, a single-scale normalizing flow using hypernetwork conditioning to perform many-to-many voice conversion between raw audio. Blow is trained end-to-end, with non-parallel data, on a frame-by-frame basis using a single speaker identifier. We show that Blow compares favorably to existing flow-based architectures and other competitive baselines, obtaining equal or better performance in both objective and subjective evaluations. We further assess the impact of its main components with an ablation study, and quantify a number of properties such as the necessary amount of training data or the preference for source or target speakers.
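
The conversion idea behind a speaker-conditioned flow can be illustrated with a toy sketch. It is not the Blow architecture: it assumes the usual flow-based recipe, in which a frame is mapped to a latent code under the source speaker identifier and then inverted under the target identifier, and it replaces the real model with a single element-wise affine transform. The names `make_hypernet`, `flow_params`, `forward`, `inverse`, `convert`, and the constant `FRAME` are all hypothetical. The "hypernetwork" here is just a linear map from a speaker embedding to the scale and shift of the transform, which is the sense in which the flow's parameters depend on the speaker.

```python
import math
import random

FRAME = 4  # toy frame length in samples (hypothetical value)


def make_hypernet(n_speakers, emb_dim=3, seed=0):
    """Random speaker embeddings plus linear maps that emit flow parameters."""
    rng = random.Random(seed)
    emb = [[rng.uniform(-1, 1) for _ in range(emb_dim)] for _ in range(n_speakers)]
    w_scale = [[rng.uniform(-0.5, 0.5) for _ in range(emb_dim)] for _ in range(FRAME)]
    w_shift = [[rng.uniform(-0.5, 0.5) for _ in range(emb_dim)] for _ in range(FRAME)]
    return emb, w_scale, w_shift


def flow_params(hypernet, speaker):
    """Hypernetwork step: speaker embedding -> (log_scale, shift) of the flow."""
    emb, w_scale, w_shift = hypernet
    e = emb[speaker]
    log_scale = [sum(w * v for w, v in zip(row, e)) for row in w_scale]
    shift = [sum(w * v for w, v in zip(row, e)) for row in w_shift]
    return log_scale, shift


def forward(hypernet, frame, speaker):
    """Audio frame -> latent code, with the log-determinant of the Jacobian."""
    log_scale, shift = flow_params(hypernet, speaker)
    z = [x * math.exp(ls) + b for x, ls, b in zip(frame, log_scale, shift)]
    return z, sum(log_scale)


def inverse(hypernet, z, speaker):
    """Latent code -> audio frame, exactly undoing `forward` for that speaker."""
    log_scale, shift = flow_params(hypernet, speaker)
    return [(v - b) * math.exp(-ls) for v, ls, b in zip(z, log_scale, shift)]


def convert(hypernet, frame, source, target):
    """Encode with the source speaker id, decode with the target speaker id."""
    z, _ = forward(hypernet, frame, source)
    return inverse(hypernet, z, target)


if __name__ == "__main__":
    hypernet = make_hypernet(n_speakers=2)
    frame = [0.1, -0.2, 0.3, 0.05]
    print(convert(hypernet, frame, source=0, target=1))
```

Because the transform is invertible, converting a frame to another speaker and back recovers the original frame, and converting with identical source and target identifiers leaves it unchanged. A trained model would learn the embeddings and maps by maximizing likelihood, which is where the log-determinant returned by `forward` would be used.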

research
06/02/2021

NVC-Net: End-to-End Adversarial Voice Conversion

Voice conversion has gained increasing popularity in many applications o...
research
04/10/2019

One-shot Voice Conversion by Separating Speaker and Content Representations with Instance Normalization

Recently, voice conversion (VC) without parallel data has been successfu...
research
09/16/2023

FastGraphTTS: An Ultrafast Syntax-Aware Speech Synthesis Framework

This paper integrates graph-to-sequence into an end-to-end text-to-speec...
research
02/27/2023

A Comparative Analysis Of Latent Regressor Losses For Singing Voice Conversion

Previous research has shown that established techniques for spoken voice...
research
10/22/2020

Towards Low-Resource StarGAN Voice Conversion using Weight Adaptive Instance Normalization

Many-to-many voice conversion with non-parallel training data has seen s...
research
04/15/2021

Towards end-to-end F0 voice conversion based on Dual-GAN with convolutional wavelet kernels

This paper presents an end-to-end framework for the F0 transformation in ...
research
10/08/2020

FastVC: Fast Voice Conversion with non-parallel data

This paper introduces FastVC, an end-to-end model for fast Voice Convers...
