GlowVC: Mel-spectrogram space disentangling model for language-independent text-free voice conversion

07/04/2022
by Magdalena Proszewska, et al.

In this paper, we propose GlowVC, a multilingual, multi-speaker flow-based model for language-independent text-free voice conversion (VC). We build on Glow-TTS, whose architecture enables the use of linguistic features during training without requiring them for VC inference. We consider two versions of our model: GlowVC-conditional and GlowVC-explicit. GlowVC-conditional models the distribution of mel-spectrograms with a speaker-conditioned flow and disentangles the mel-spectrogram space into content- and pitch-relevant dimensions, while GlowVC-explicit models the explicit distribution with an unconditioned flow and disentangles this space into content-, pitch- and speaker-relevant dimensions. We evaluate our models in terms of intelligibility, speaker similarity and naturalness for intra- and cross-lingual conversion in seen and unseen languages. GlowVC models greatly outperform the AutoVC baseline in terms of intelligibility, while achieving equally high speaker similarity in intra-lingual VC and slightly lower speaker similarity in the cross-lingual setting. Moreover, we demonstrate that GlowVC-explicit surpasses both GlowVC-conditional and AutoVC in terms of naturalness.
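
The idea of an invertible flow whose latent space is split into content-, pitch- and speaker-relevant dimensions can be illustrated with a toy sketch. This is not the GlowVC architecture: the elementwise affine flow, the parameter values, the dimension partition and the `convert` helper are all hypothetical, chosen only to show why invertibility lets one group of latent dimensions be replaced while the others are preserved. The abstract does not state how speaker-relevant dimensions are set at conversion time, so copying them from a target frame is an assumption.

```python
import math

# Toy elementwise affine flow (hypothetical parameters, not learned).
# Real flows such as Glow mix dimensions with coupling layers and
# invertible 1x1 convolutions; this sketch keeps them independent.
LOG_SCALE = [0.1, -0.2, 0.3, 0.0, 0.5, -0.4]
SHIFT = [1.0, -1.0, 0.5, 0.0, 2.0, -0.5]

# Hypothetical partition of the latent dimensions (an assumption).
CONTENT, PITCH, SPEAKER = slice(0, 3), slice(3, 4), slice(4, 6)


def forward(x):
    """Map a frame x to latent z; also return log|det dz/dx|."""
    z = [(xi - b) * math.exp(-s) for xi, b, s in zip(x, SHIFT, LOG_SCALE)]
    log_det = -sum(LOG_SCALE)
    return z, log_det


def inverse(z):
    """Exact inverse of forward: map latent z back to a frame x."""
    return [zi * math.exp(s) + b for zi, b, s in zip(z, SHIFT, LOG_SCALE)]


def convert(source_frame, target_frame):
    """Keep the source content and pitch dims, copy the target speaker dims."""
    z_src, _ = forward(source_frame)
    z_tgt, _ = forward(target_frame)
    z_new = list(z_src)
    z_new[SPEAKER] = z_tgt[SPEAKER]
    return inverse(z_new)
```

Because the mapping is exactly invertible, re-encoding the output of `convert` recovers the source latent values in the content and pitch slices and the target latent values in the speaker slice.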


