Voice Conversion for Whispered Speech Synthesis

12/11/2019
by   Marius Cotescu, et al.
0

We present an approach to synthesize whisper by applying a handcrafted signal processing recipe and Voice Conversion (VC) techniques to convert normally phonated speech to whispered speech. We investigate using Gaussian Mixture Models (GMM) and Deep Neural Networks (DNN) to model the mapping between acoustic features of normal speech and those of whispered speech. We evaluate naturalness and speaker similarity of the converted whisper on an internal corpus and on the publicly available wTIMIT corpus. We show that applying VC techniques is significantly better than using rule-based signal processing methods and it achieves results that are indistinguishable from copy-synthesis of natural whisper recordings. We investigate the ability of the DNN model to generalize on unseen speakers, when trained with data from multiple speakers. We show that excluding the target speaker from the training set has little or no impact on the perceived naturalness and speaker similarity of the converted whisper. The proposed DNN method is used in the newly released Whisper Mode of Amazon Alexa.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
09/09/2019

DNN-based cross-lingual voice conversion using Bottleneck Features

Cross-lingual voice conversion (CLVC) is a quite challenging task since ...
research
12/31/2019

Statistical Models in Forensic Voice Comparison

This chapter describes a number of signal-processing and statistical-mod...
research
05/07/2020

Cotatron: Transcription-Guided Speech Encoder for Any-to-Many Voice Conversion without Parallel Data

We propose Cotatron, a transcription-guided speech encoder for speaker-i...
research
08/07/2020

Multi-speaker Text-to-speech Synthesis Using Deep Gaussian Processes

Multi-speaker speech synthesis is a technique for modeling multiple spea...
research
10/18/2022

Mid-attribute speaker generation using optimal-transport-based interpolation of Gaussian mixture models

In this paper, we propose a method for intermediating multiple speakers'...
research
11/24/1998

Speech Synthesis with Neural Networks

Text-to-speech conversion has traditionally been performed either by con...
research
03/28/2019

Adversarial Approximate Inference for Speech to Electroglottograph Conversion

Speech produced by human vocal apparatus conveys substantial non-semanti...

Please sign up or login with your details

Forgot password? Click here to reset