Generating Gender-Ambiguous Text-to-Speech Voices

11/01/2022
by Konstantinos Markopoulos, et al.

The gender of a voice assistant or any other voice user interface is a central element of its perceived identity. While a female voice is a common choice, there is growing interest in alternatives in which the gender is ambiguous rather than clearly female or male. This work addresses the task of generating gender-ambiguous text-to-speech (TTS) voices that do not correspond to any existing person. This is accomplished by sampling from a latent speaker embedding space that was formed while training a multilingual, multi-speaker TTS system on data from multiple male and female speakers. Several options for the sampling process are investigated, and our experiments evaluate how each sampling choice affects the gender ambiguity and the naturalness of the resulting voices. The proposed method is shown to efficiently generate novel speakers that are superior to a baseline averaged speaker embedding. To our knowledge, this is the first systematic approach that can reliably generate a range of gender-ambiguous voices to meet diverse user requirements.
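The abstract does not spell out the individual sampling options, so the following is only a minimal sketch of the general idea of sampling novel speakers from an embedding space, contrasted with the averaged-embedding baseline. It assumes speaker embeddings are available as plain vectors grouped by speaker gender, and it uses a hypothetical rejection-sampling rule (draw from a diagonal Gaussian fitted to all speakers, keep candidates that sit near the midpoint of the female-to-male axis). The helper names `mean_vector`, `gender_position`, and `sample_ambiguous`, the Gaussian fit, and the `tolerance` criterion are illustrative assumptions, not the authors' procedure.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]


def mean_vector(vectors: Sequence[Sequence[float]]) -> Vector:
    """Element-wise mean of equally sized vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def std_vector(vectors: Sequence[Sequence[float]], mean: Sequence[float]) -> Vector:
    """Element-wise (population) standard deviation around a given mean."""
    n = len(vectors)
    return [
        math.sqrt(sum((v[i] - mean[i]) ** 2 for v in vectors) / n)
        for i in range(len(mean))
    ]


def gender_position(x, female_mean, male_mean) -> float:
    """Hypothetical scalar position of x on the female->male axis.

    0.0 corresponds to the female centroid and 1.0 to the male centroid,
    so values near 0.5 are treated here (by assumption) as "ambiguous".
    """
    axis = [m - f for f, m in zip(female_mean, male_mean)]
    norm_sq = sum(a * a for a in axis)
    if norm_sq == 0.0:
        raise ValueError("female and male centroids coincide")
    return sum((xi - f) * a for xi, f, a in zip(x, female_mean, axis)) / norm_sq


def sample_ambiguous(female, male, n, tolerance=0.1, seed=0, max_tries=100_000):
    """Hypothetical rejection sampler for novel, gender-ambiguous embeddings.

    Draws candidates from a diagonal Gaussian fitted to all speakers and keeps
    those whose gender position lies within `tolerance` of 0.5. Returns fewer
    than n samples if max_tries is exhausted first.
    """
    rng = random.Random(seed)
    pooled = list(female) + list(male)
    mu = mean_vector(pooled)
    sigma = std_vector(pooled, mu)
    f_mean, m_mean = mean_vector(female), mean_vector(male)
    samples: List[Vector] = []
    for _ in range(max_tries):
        if len(samples) == n:
            break
        candidate = [rng.gauss(m, s) for m, s in zip(mu, sigma)]
        if abs(gender_position(candidate, f_mean, m_mean) - 0.5) <= tolerance:
            samples.append(candidate)
    return samples


# Toy 2-D "embeddings"; real speaker embeddings are much higher-dimensional.
female = [[0.0, 1.0], [0.2, 0.8], [-0.2, 1.2]]
male = [[1.0, 0.0], [0.8, 0.2], [1.2, -0.2]]

baseline = mean_vector(female + male)  # single averaged-embedding baseline
novel = sample_ambiguous(female, male, n=5, tolerance=0.1, seed=42)
```

The averaged embedding is a single point, so it can yield only one voice; a sampler of this kind can produce many distinct candidates while still constraining them toward the middle of the gender axis. Judging ambiguity by a linear projection is a simplification made for this sketch; a classifier or listener evaluation would be a more realistic criterion.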

