DGC-vector: A new speaker embedding for zero-shot voice conversion

03/18/2022
by Ruitong Xiao, et al.

A growing number of zero-shot voice conversion algorithms have been proposed recently. Speaker embeddings are a fundamental part of zero-shot voice conversion and are key to improving the speaker similarity of the converted speech. In this paper, we study the impact of speaker embeddings on zero-shot voice conversion performance. To better represent the characteristics of the target speaker and improve speaker similarity, we propose a novel speaker representation method that combines the advantages of the D-vector, global style token (GST) based speaker representation, and auxiliary supervision. Objective and subjective evaluations show that the proposed method achieves decent performance on zero-shot voice conversion and significantly improves speaker similarity over D-vector and GST-based speaker embeddings.


research
12/04/2021

YourTTS: Towards Zero-Shot Multi-Speaker TTS and Zero-Shot Voice Conversion for everyone

YourTTS brings the power of a multilingual approach to the task of zero-...
research
11/06/2021

SIG-VC: A Speaker Information Guided Zero-shot Voice Conversion System for Both Human Beings and Machines

Nowadays, as more and more systems achieve good performance in tradition...
research
03/30/2022

Robust Disentangled Variational Speech Representation Learning for Zero-shot Voice Conversion

Traditional studies on voice conversion (VC) have made progress with par...
research
09/18/2023

Face-Driven Zero-Shot Voice Conversion with Memory-based Face-Voice Alignment

This paper presents a novel task, zero-shot voice conversion based on fa...
research
02/16/2023

ACE-VC: Adaptive and Controllable Voice Conversion using Explicitly Disentangled Self-supervised Speech Representations

In this work, we propose a zero-shot voice conversion method using speec...
research
10/24/2020

GAZEV: GAN-Based Zero-Shot Voice Conversion over Non-parallel Speech Corpus

Non-parallel many-to-many voice conversion is recently attracting huge ...
research
09/09/2022

DeID-VC: Speaker De-identification via Zero-shot Pseudo Voice Conversion

The widespread adoption of speech-based online services raises security ...
