Mid-attribute speaker generation using optimal-transport-based interpolation of Gaussian mixture models

10/18/2022
by Aya Watanabe, et al.

In this paper, we propose a method for interpolating between the attributes of multiple speakers and diversifying their voice characteristics in “speaker generation,” an emerging task that aims to synthesize the natural-sounding voice of a nonexistent speaker. The conventional TacoSpawn-based speaker generation method represents the distribution of speaker embeddings with Gaussian mixture models (GMMs) conditioned on speaker attributes. Although this method can sample diverse speakers from the attribute-aware GMMs, it remains unclear whether the learned distributions can represent speakers with an intermediate attribute (i.e., a mid-attribute). To address this, we propose an optimal-transport-based method that interpolates the learned GMMs to generate nonexistent speakers with mid-attribute (e.g., gender-neutral) voices. We empirically validate our method by evaluating the naturalness of the synthetic speech and the controllability of two speaker attributes: gender and language fluency. The results show that our method can control the generated speakers' attributes with a continuous scalar value without a statistically significant degradation in speech naturalness.
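The abstract does not spell out the interpolation procedure. One standard way to interpolate GMMs with optimal transport, used here purely as an illustrative assumption and not as the paper's exact method, treats each mixture as a discrete distribution over its Gaussian components. A discrete OT plan between components is computed under the squared 2-Wasserstein cost between Gaussians, and each transported pair is then moved along the closed-form Gaussian W2 geodesic. In the minimal Python sketch below (NumPy and SciPy), `interpolate_gmm` and the other helpers are hypothetical names, and the scalar `t` stands in for the continuous control value: t = 0 reproduces the first GMM and t = 1 the second, possibly with components split across several pairs.

```python
import numpy as np
from scipy.optimize import linprog


def spd_sqrt(mat):
    """Symmetric square root of a symmetric positive semi-definite matrix."""
    vals, vecs = np.linalg.eigh(mat)
    return (vecs * np.sqrt(np.clip(vals, 0.0, None))) @ vecs.T


def gaussian_w2_sq(m0, s0, m1, s1):
    """Squared 2-Wasserstein distance between N(m0, s0) and N(m1, s1)."""
    r0 = spd_sqrt(s0)
    cross = spd_sqrt(r0 @ s1 @ r0)
    return float(np.sum((m0 - m1) ** 2) + np.trace(s0 + s1 - 2.0 * cross))


def gaussian_geodesic(m0, s0, m1, s1, t):
    """Mean and covariance at fraction t along the W2 geodesic between two Gaussians."""
    r0 = spd_sqrt(s0)
    r0_inv = np.linalg.inv(r0)
    # Linear OT map x -> m1 + a @ (x - m0) pushing N(m0, s0) onto N(m1, s1)
    a = r0_inv @ spd_sqrt(r0 @ s1 @ r0) @ r0_inv
    b = (1.0 - t) * np.eye(len(m0)) + t * a
    return (1.0 - t) * m0 + t * m1, b @ s0 @ b.T


def interpolate_gmm(w0, means0, covs0, w1, means1, covs1, t, tol=1e-9):
    """Hypothetical OT-based interpolation of two GMMs at control value t in [0, 1]."""
    k0, k1 = len(w0), len(w1)
    # Pairwise component costs: squared W2 distance between Gaussians
    cost = np.array([[gaussian_w2_sq(means0[i], covs0[i], means1[j], covs1[j])
                      for j in range(k1)] for i in range(k0)])
    # Marginal constraints on the flattened plan: row sums = w0, column sums = w1
    a_eq = np.zeros((k0 + k1, k0 * k1))
    for i in range(k0):
        a_eq[i, i * k1:(i + 1) * k1] = 1.0
    for j in range(k1):
        a_eq[k0 + j, j::k1] = 1.0
    b_eq = np.concatenate([w0, w1])
    res = linprog(cost.ravel(), A_eq=a_eq, b_eq=b_eq, bounds=(0, None), method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    plan = res.x.reshape(k0, k1)
    weights, means, covs = [], [], []
    # Each transported component pair becomes one component of the interpolated GMM
    for i in range(k0):
        for j in range(k1):
            if plan[i, j] > tol:
                m, s = gaussian_geodesic(means0[i], covs0[i], means1[j], covs1[j], t)
                weights.append(plan[i, j])
                means.append(m)
                covs.append(s)
    weights = np.array(weights)
    return weights / weights.sum(), np.array(means), np.array(covs)


if __name__ == "__main__":
    # Toy 2-D GMMs standing in for two attribute-conditioned embedding distributions
    w_a, mu_a = np.array([0.5, 0.5]), np.array([[0.0, 0.0], [2.0, 0.0]])
    w_b, mu_b = np.array([0.5, 0.5]), np.array([[0.0, 3.0], [2.0, 3.0]])
    eye2 = np.array([np.eye(2), np.eye(2)])
    w, mu, cov = interpolate_gmm(w_a, mu_a, eye2, w_b, mu_b, eye2, t=0.5)
    # Draw one "mid-attribute" embedding: pick a component, then sample from it
    rng = np.random.default_rng(0)
    k = rng.choice(len(w), p=w)
    z = rng.multivariate_normal(mu[k], cov[k])
    print(w, mu, z)
```

Matching components through a transport plan, instead of interpolating weights, means, and covariances index by index, avoids depending on the arbitrary ordering of mixture components. In a TacoSpawn-style system, a sampled embedding such as `z` would then be passed to the TTS model; that step is outside this sketch.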
