DeepA: A Deep Neural Analyzer For Speech And Singing Vocoding

10/13/2021
by Sergey Nikonorov, et al.

Conventional vocoders are commonly used as analysis tools to provide interpretable features for downstream tasks such as speech synthesis and voice conversion. Because they are built on signal processing principles under certain assumptions about the signal, they do not generalize easily across audio types, for example from speech to singing. In this paper, we propose a deep neural analyzer, denoted as DeepA: a neural vocoder that extracts F0 and timbre/aperiodicity encodings from the input speech, emulating the parameters defined in conventional vocoders. The resulting parameters are therefore more interpretable than other latent neural representations. At the same time, because the deep neural analyzer is learnable, it is expected to be more accurate for signal reconstruction and manipulation, and to generalize from speech to singing. The proposed neural analyzer is built on a variational autoencoder (VAE) architecture. We show that DeepA improves F0 estimation over the conventional vocoder (WORLD). To the best of our knowledge, this is the first study dedicated to the development of a neural framework for extracting learnable vocoder-like parameters.
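To make the idea concrete, below is a minimal, hypothetical sketch (in PyTorch) of a VAE-style analyzer in the spirit of the abstract: an encoder maps spectral frames to an interpretable per-frame F0 value plus a variational timbre/aperiodicity latent, and a decoder reconstructs the frames from those parameters. All module names, layer sizes, and the loss weighting here are illustrative assumptions, not details taken from the paper.

```python
# Hypothetical sketch of a VAE-based neural analyzer (not the paper's implementation).
import torch
import torch.nn as nn
import torch.nn.functional as F


class DeepAnalyzerSketch(nn.Module):
    def __init__(self, n_mels: int = 80, latent_dim: int = 16):
        super().__init__()
        # Shared frame-wise encoder over mel-spectrogram features.
        self.encoder = nn.Sequential(
            nn.Linear(n_mels, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
        )
        # Interpretable F0 head: one value per frame (e.g. log-Hz).
        self.f0_head = nn.Linear(256, 1)
        # Variational timbre/aperiodicity latent (mean and log-variance).
        self.mu_head = nn.Linear(256, latent_dim)
        self.logvar_head = nn.Linear(256, latent_dim)
        # Decoder reconstructs each frame from (F0, latent).
        self.decoder = nn.Sequential(
            nn.Linear(1 + latent_dim, 256), nn.ReLU(),
            nn.Linear(256, n_mels),
        )

    def forward(self, mel):
        h = self.encoder(mel)
        f0 = self.f0_head(h)
        mu, logvar = self.mu_head(h), self.logvar_head(h)
        # Reparameterization trick: sample the latent while keeping gradients.
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        recon = self.decoder(torch.cat([f0, z], dim=-1))
        return recon, f0, mu, logvar


def vae_loss(recon, mel, mu, logvar, beta: float = 0.01):
    # Reconstruction term plus KL divergence to a standard normal prior.
    rec = F.mse_loss(recon, mel)
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return rec + beta * kl


if __name__ == "__main__":
    model = DeepAnalyzerSketch()
    mel = torch.randn(4, 200, 80)  # (batch, frames, mel bins)
    recon, f0, mu, logvar = model(mel)
    print(vae_loss(recon, mel, mu, logvar).item())
```

In such a design, keeping F0 as an explicit, non-latent output is what makes the learned representation vocoder-like and interpretable, while the variational latent absorbs timbre and aperiodicity; the actual DeepA architecture and training objective may differ.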

Related research

02/12/2021  Guided Variational Autoencoder for Speech Enhancement With a Supervised Classifier
Recently, variational autoencoders have been successfully used to learn ...

01/24/2022  A Bayesian Permutation training deep representation learning method for speech enhancement with variational autoencoder
Recently, variational autoencoder (VAE), a deep representation learning ...

04/06/2023  DSVAE: Interpretable Disentangled Representation for Synthetic Speech Detection
Tools to generate high quality synthetic speech signal that is perceptua...

05/02/2019  Investigation of F0 conditioning and Fully Convolutional Networks in Variational Autoencoder based Voice Conversion
In this work, we investigate the effectiveness of two techniques for imp...

03/12/2021  Learning spectro-temporal representations of complex sounds with parameterized neural networks
Deep Learning models have become potential candidates for auditory neuro...

05/11/2022  A deep representation learning speech enhancement method using β-VAE
In previous work, we proposed a variational autoencoder-based (VAE) Baye...

03/30/2022  Hybrid Handcrafted and Learnable Audio Representation for Analysis of Speech Under Cognitive and Physical Load
As a neurophysiological response to threat or adverse conditions, stress...
