A Deep Generative Model of Speech Complex Spectrograms

03/08/2019
by Aditya Arie Nugraha, et al.

This paper proposes an approach to jointly modeling the short-time Fourier transform (STFT) magnitude and phase spectrograms with a deep generative model. We assume that the magnitude follows a Gaussian distribution and the phase follows a von Mises distribution. To improve the consistency of the phase values in the time-frequency domain, we also apply the von Mises distribution to the phase derivatives, i.e., the group delay and the instantaneous frequency. Based on these assumptions, we explore and compare several combinations of loss functions for training our models. Built upon the variational autoencoder framework, our model consists of three convolutional neural networks acting as an encoder, a magnitude decoder, and a phase decoder. In addition to conditioning on the latent variables, we propose to condition the phase estimation on the estimated magnitude. Evaluated on a time-domain speech reconstruction task, our models generated speech with high perceptual quality and high intelligibility.
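The von Mises assumptions on the phase and its derivatives can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a fixed concentration parameter, so the negative log-likelihood reduces to a negative cosine of the phase difference up to a scale and a constant, and it approximates the derivatives with finite differences. The function names `von_mises_loss` and `phase_losses` and the `(freq_bins, frames)` array layout are hypothetical choices made for this example.

```python
import numpy as np


def von_mises_loss(est, ref):
    """Mean of -cos(ref - est).

    With a fixed concentration, the von Mises negative log-likelihood
    equals this quantity up to a positive scale and an additive constant.
    """
    return float(np.mean(-np.cos(ref - est)))


def phase_losses(phase_est, phase_ref):
    """Loss terms for phase arrays shaped (freq_bins, frames), in radians.

    Assumption: derivatives are approximated by first-order differences.
    """
    # Group delay: negative derivative of phase along the frequency axis.
    gd_est = -np.diff(phase_est, axis=0)
    gd_ref = -np.diff(phase_ref, axis=0)
    # Instantaneous frequency: derivative of phase along the time axis.
    if_est = np.diff(phase_est, axis=1)
    if_ref = np.diff(phase_ref, axis=1)
    return {
        "phase": von_mises_loss(phase_est, phase_ref),
        "group_delay": von_mises_loss(gd_est, gd_ref),
        "inst_freq": von_mises_loss(if_est, if_ref),
    }
```

Because the cosine is periodic, these terms need no phase unwrapping. A constant phase offset is penalized by the phase term but leaves the two derivative terms unchanged, which is one way to see why the derivative terms target time-frequency consistency rather than absolute phase.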

