On Using Backpropagation for Speech Texture Generation and Voice Conversion

12/22/2017
by Jan Chorowski, et al.

Inspired by recent work on neural network image generation that relies on backpropagation towards the network inputs, we present a proof-of-concept system for speech texture synthesis and voice conversion based on two mechanisms: approximate inversion of the representation learned by a speech recognition neural network, and matching of neuron activation statistics between source and target utterances. As in image texture synthesis and neural style transfer, the system works by optimizing a cost function with respect to the input waveform samples. To this end, we use a differentiable mel-filterbank feature extraction pipeline and train a convolutional CTC speech recognition network. Our system can extract speaker characteristics from very limited amounts of target speaker data, as little as a few seconds, and can be used to generate realistic speech babble or to reconstruct an utterance in a different voice.
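The core idea of optimizing a cost function with respect to the input samples can be illustrated with a toy sketch. This is not the authors' system: it assumes a single layer of fixed random 1-D convolution filters with ReLU in place of the mel-filterbank pipeline and the trained CTC network, uses time-averaged channel correlations (a Gram matrix) as the matched activation statistic, and computes the gradient by hand. All names here (`features`, `gram`, `loss_and_grad`, `synthesize`) are hypothetical.

```python
import math
import random


def features(x, filters):
    """ReLU activations of a 1-D convolution: one row per filter (toy stand-in network)."""
    width = len(filters[0])
    steps = len(x) - width + 1
    return [[max(0.0, sum(w[j] * x[t + j] for j in range(width)))
             for t in range(steps)] for w in filters]


def gram(acts):
    """Time-averaged correlations between filter channels."""
    n = len(acts[0])
    return [[sum(a * b for a, b in zip(row_k, row_l)) / n
             for row_l in acts] for row_k in acts]


def loss_and_grad(x, filters, target_gram):
    """Squared Gram-matrix mismatch and its gradient with respect to the input samples."""
    acts = features(x, filters)
    g = gram(acts)
    chans = range(len(filters))
    diff = [[g[k][l] - target_gram[k][l] for l in chans] for k in chans]
    loss = sum(d * d for row in diff for d in row)
    n, width = len(acts[0]), len(filters[0])
    grad = [0.0] * len(x)
    for k in chans:
        for t in range(n):
            if acts[k][t] <= 0.0:  # an inactive ReLU passes no gradient
                continue
            # dL/da[k][t] = (4/n) * sum_l diff[k][l] * a[l][t], using Gram symmetry
            d_act = 4.0 / n * sum(diff[k][l] * acts[l][t] for l in chans)
            for j in range(width):  # backpropagate through the convolution
                grad[t + j] += d_act * filters[k][j]
    return loss, grad


def synthesize(target, filters, steps=30, seed=0):
    """Start from noise and descend on the input until its statistics approach the target's."""
    rng = random.Random(seed)
    target_gram = gram(features(target, filters))
    x = [rng.gauss(0.0, 0.1) for _ in target]
    loss, grad = loss_and_grad(x, filters, target_gram)
    losses = [loss]
    for _ in range(steps):
        lr = 1.0
        while lr > 1e-8:  # backtracking: accept only steps that lower the loss
            cand = [xi - lr * gi for xi, gi in zip(x, grad)]
            cand_loss, cand_grad = loss_and_grad(cand, filters, target_gram)
            if cand_loss < loss:
                x, loss, grad = cand, cand_loss, cand_grad
                break
            lr *= 0.5
        losses.append(loss)
    return x, losses


rng = random.Random(1)
filters = [[rng.uniform(-1.0, 1.0) for _ in range(8)] for _ in range(4)]
# Toy "target utterance": a sum of two sinusoids.
target = [math.sin(0.3 * t) + 0.5 * math.sin(1.1 * t) for t in range(120)]
signal, losses = synthesize(target, filters)
print("initial loss:", losses[0], "final loss:", losses[-1])
```

Because only time-averaged statistics are matched, the optimized signal is not pushed to reproduce the target sample by sample, which is the sense in which such a procedure yields a "texture". A practical version would rely on an automatic differentiation framework and a trained network rather than hand-written gradients.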

research
02/19/2018

Voice Impersonation using Generative Adversarial Networks

Voice impersonation is not the same as voice transformation, although th...
research
10/08/2019

MelGAN-VC: Voice Conversion and Audio Style Transfer on arbitrarily long samples using Spectrograms

Traditional voice conversion methods rely on parallel recordings of mult...
research
12/20/2019

Learning Singing From Speech

We propose an algorithm that is capable of synthesizing high quality tar...
research
05/22/2020

NAUTILUS: a Versatile Voice Cloning System

We introduce a novel speech synthesis system, called NAUTILUS, that can ...
research
05/15/2020

ConVoice: Real-Time Zero-Shot Voice Style Transfer with Convolutional Network

We propose a neural network for zero-shot voice conversion (VC) without ...
research
10/06/2020

VoiceGrad: Non-Parallel Any-to-Many Voice Conversion with Annealed Langevin Dynamics

In this paper, we propose a non-parallel any-to-many voice conversion (V...
research
02/06/2019

Unsupervised Polyglot Text To Speech

We present a TTS neural network that is able to produce speech in multip...
