Generating Multilingual Voices Using Speaker Space Translation Based on Bilingual Speaker Data

04/10/2020
by Soumi Maiti, et al.

We present progress towards bilingual text-to-speech (TTS) that can make a monolingual voice speak a second language while preserving the speaker's voice quality. We demonstrate that a bilingual speaker embedding space contains a separate distribution for each language, and that a simple transform in speaker space generated by the speaker embedding can be used to control the degree of accent of a synthetic voice in a language. The same transform can be applied even to monolingual speakers. Our experiments used speaker data from an English-Spanish (Mexican) bilingual speaker, with the goal of enabling English speakers to speak Spanish and Spanish speakers to speak English. We found that the simple transform was sufficient to convert a voice from one language to the other with a high degree of naturalness. In one case the transformed voice outperformed a native-language voice in listening tests. Experiments further indicated that the transform preserved many of the characteristics of the original voice. The degree of accent can be controlled, and naturalness is relatively consistent across a range of accent values.
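The abstract does not say what form the "simple transform" takes. As a purely illustrative sketch, one could assume it is a translation along the vector between the two language clusters of the bilingual speaker, scaled by an accent factor. The function names, the `alpha` parameter, and the toy two-dimensional embeddings below are all hypothetical and are not taken from the paper.

```python
from typing import List, Sequence

Vector = List[float]


def mean_vector(vectors: Sequence[Sequence[float]]) -> Vector:
    """Component-wise mean of a set of equal-length embeddings."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def language_offset(source: Sequence[Sequence[float]],
                    target: Sequence[Sequence[float]]) -> Vector:
    """Assumed transform: the difference between the mean embedding of the
    bilingual speaker's target-language and source-language utterances."""
    source_mean = mean_vector(source)
    target_mean = mean_vector(target)
    return [t - s for s, t in zip(source_mean, target_mean)]


def translate_speaker(embedding: Sequence[float],
                      offset: Sequence[float],
                      alpha: float = 1.0) -> Vector:
    """Shift a speaker embedding along the language offset.

    alpha is a hypothetical accent-control knob: 0.0 leaves the embedding
    unchanged, 1.0 applies the full shift, values in between interpolate.
    """
    return [e + alpha * o for e, o in zip(embedding, offset)]


# Toy 2-D utterance embeddings from one bilingual speaker (made-up numbers).
english_utts = [[0.0, 1.0], [1.0, 2.0]]
spanish_utts = [[2.0, 4.0], [3.0, 5.0]]
offset = language_offset(english_utts, spanish_utts)

# Apply the same offset to a different, monolingual English speaker.
monolingual_english = [0.25, 1.0]
for alpha in (0.0, 0.5, 1.0):
    print(alpha, translate_speaker(monolingual_english, offset, alpha))
```

Under this assumption, intermediate values of `alpha` would correspond to intermediate accent strengths, which is one way to read the abstract's claim that the degree of accent can be controlled.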


research
10/16/2020

Towards Natural Bilingual and Code-Switched Speech Synthesis Based on Mix of Monolingual Recordings and Cross-Lingual Voice Conversion

Recent state-of-the-art neural text-to-speech (TTS) synthesis models hav...
research
11/23/2022

Voice-preserving Zero-shot Multiple Accent Conversion

Most people who have tried to learn a foreign language would have experi...
research
12/07/2022

Improve Bilingual TTS Using Dynamic Language and Phonology Embedding

In most cases, bilingual TTS needs to handle three types of input script...
research
11/01/2022

Generating Gender-Ambiguous Text-to-Speech Voices

The gender of a voice assistant or any voice user interface is a central...
research
09/26/2022

Effects of language mismatch in automatic forensic voice comparison using deep learning embeddings

In forensic voice comparison the speaker embedding has become widely pop...
research
04/24/2018

Perceptual Evaluation of the Effectiveness of Voice Disguise by Age Modification

Voice disguise, purposeful modification of one's speaker identity with t...
research
08/27/2020

Estimating Uniqueness of Human Voice Using I-Vector Representation

We study the individuality of human voice with respect to a widely used...
