Adversarial Approximate Inference for Speech to Electroglottograph Conversion

03/28/2019
by Prathosh A. P., et al.

Speech produced by the human vocal apparatus conveys substantial non-semantic information, including the gender of the speaker, voice quality, affective state, and abnormalities in the vocal apparatus. Such information is attributed to the properties of the voice source signal, which is usually estimated from the speech signal. However, most source estimation techniques depend heavily on the validity of their model assumptions and are sensitive to noise. A popular alternative is to obtain the source information indirectly through the Electroglottographic (EGG) signal, which measures the electrical admittance around the vocal folds using dedicated hardware. In this paper, we address the problem of estimating the EGG signal directly from the speech signal, without any such hardware. Sampling from the intractable conditional distribution of the EGG signal given the speech signal is accomplished through optimization of an evidence lower bound. This bound is constructed via minimization of the KL-divergence between the true and the approximated posteriors of a latent variable, learned using a deep neural auto-encoder that reconstructs the EGG signal and serves as an informative prior. We demonstrate the efficacy of the method in generating the EGG signal by conducting several experiments on datasets comprising multiple speakers, voice qualities, noise settings and speech pathologies. The proposed method is evaluated on many benchmark metrics and is found to agree with the gold standards while outperforming state-of-the-art algorithms on a few tasks such as epoch extraction.
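
The relationship between the evidence lower bound and the KL-divergence mentioned in the abstract can be sketched with the standard variational identity. The notation here is an assumption made for illustration and is not taken from the paper: x denotes the speech signal, y the EGG signal, z the latent variable, and q an approximate posterior. The paper's exact conditioning and parameterization may differ.

```latex
% Assumed notation: x = speech, y = EGG, z = latent variable, q = approximate posterior.
\begin{align}
\log p(y \mid x)
  &= \mathcal{L}(q)
   + \mathrm{KL}\big( q(z \mid x, y) \,\|\, p(z \mid x, y) \big), \\
\mathcal{L}(q)
  &= \mathbb{E}_{q(z \mid x, y)}\big[ \log p(y \mid z, x) \big]
   - \mathrm{KL}\big( q(z \mid x, y) \,\|\, p(z \mid x) \big).
\end{align}
```

Because the KL term in the first line is non-negative, \mathcal{L}(q) is a lower bound on \log p(y \mid x). The left-hand side does not depend on q, so maximizing the bound over q is equivalent to minimizing the KL-divergence between the approximate and the true posterior.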


