Unsupervised Polyglot Text To Speech

02/06/2019
by Eliya Nachmani, et al.

We present a text-to-speech (TTS) neural network that can produce speech in multiple languages. The proposed network can transfer a voice, presented as a sample in a source language, into one of several target languages. Training requires no matching or parallel data, i.e., no samples of the same speaker in multiple languages, which makes the method much more widely applicable. The conversion is based on learning a polyglot network that has multiple per-language sub-networks and on adding loss terms that preserve the speaker's identity across languages. We evaluate the proposed polyglot neural network on three languages with a total of more than 400 speakers and demonstrate convincing conversion capabilities.

research
07/09/2019

Learning to Speak Fluently in a Foreign Language: Multilingual Speech Synthesis and Cross-Language Voice Cloning

We present a multispeaker, multilingual text-to-speech (TTS) synthesis m...
research
04/22/2021

Building Bilingual and Code-Switched Voice Conversion with Limited Training Data Using Embedding Consistency Loss

Building cross-lingual voice conversion (VC) systems for multiple speake...
research
11/17/2022

Towards Building Text-To-Speech Systems for the Next Billion Users

Deep learning based text-to-speech (TTS) systems have been evolving rapi...
research
02/10/2022

Cross-speaker style transfer for text-to-speech using data augmentation

We address the problem of cross-speaker style transfer for text-to-speec...
research
02/24/2023

Catch You and I Can: Revealing Source Voiceprint Against Voice Conversion

Voice conversion (VC) techniques can be abused by malicious parties to t...
research
12/22/2017

On Using Backpropagation for Speech Texture Generation and Voice Conversion

Inspired by recent work on neural network image generation which rely on...
research
10/06/2020

VoiceGrad: Non-Parallel Any-to-Many Voice Conversion with Annealed Langevin Dynamics

In this paper, we propose a non-parallel any-to-many voice conversion (V...
