Phase reconstruction from amplitude spectrograms based on von-Mises-distribution deep neural network

07/10/2018
by   Shinnosuke Takamichi, et al.

This paper presents a deep neural network (DNN)-based method for reconstructing phase from amplitude spectrograms. In audio signal and speech processing, the amplitude spectrogram is often the representation that is processed, and the corresponding phase spectrogram is then reconstructed from it with the Griffin-Lim method. However, the Griffin-Lim method causes unnatural artifacts in synthetic speech. To address this problem, we introduce the von-Mises-distribution DNN for phase reconstruction. The DNN is a generative model based on the von Mises distribution, which can model the distribution of a periodic variable such as phase, and the model parameters of the DNN are estimated on the basis of the maximum likelihood criterion. Furthermore, we propose a group-delay loss for DNN training that brings the predicted group delay close to the natural group delay. The experimental results demonstrate that 1) the trained DNN predicts group delay more accurately than it predicts the phases themselves, and 2) our phase reconstruction methods achieve better speech quality than the conventional Griffin-Lim method.
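
The two training criteria named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, and everything in it is an assumption for illustration: the function names, the fixed concentration parameter `kappa`, and the approximation of group delay as the negative finite difference of phase between adjacent frequency bins. The paper's exact formulation may differ.

```python
import numpy as np


def von_mises_nll(phase_true, phase_pred, kappa=1.0):
    """Mean negative log-likelihood of phase_true under a von Mises
    distribution with mean phase_pred and a fixed concentration kappa
    (assumed constant here; a model could also predict it)."""
    # Log of the normalizer 2*pi*I0(kappa), with I0 the modified Bessel
    # function of the first kind, order zero.
    log_norm = np.log(2.0 * np.pi * np.i0(kappa))
    # The cosine makes the loss invariant to 2*pi phase wrapping.
    return float(np.mean(log_norm - kappa * np.cos(phase_true - phase_pred)))


def group_delay(phase):
    """Approximate group delay as the negative finite difference of
    phase along the frequency axis (assumed to be the last axis)."""
    return -np.diff(phase, axis=-1)


def group_delay_loss(phase_true, phase_pred):
    """Cosine loss between natural and predicted group delay
    (hypothetical form, mirroring the von Mises cosine term)."""
    gd_error = group_delay(phase_true) - group_delay(phase_pred)
    return float(np.mean(-np.cos(gd_error)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy "phase spectrogram": 4 frames x 8 frequency bins.
    natural = rng.uniform(-np.pi, np.pi, size=(4, 8))
    predicted = natural + 0.3  # constant phase offset in every bin
    print("von Mises NLL:", von_mises_nll(natural, predicted))
    print("group-delay loss:", group_delay_loss(natural, predicted))
```

In this sketch a constant phase offset across all bins leaves the group-delay loss at its minimum of -1 while still raising the von Mises loss, which shows how the two terms penalize different kinds of error.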


research
02/14/2020

Phase reconstruction based on recurrent phase unwrapping with deep neural networks

Phase reconstruction, which estimates phase from a given amplitude spect...
research
03/10/2019

Deep Griffin-Lim Iteration

This paper presents a novel phase reconstruction method (only from a giv...
research
03/20/2023

Machine Learning Automated Approach for Enormous Synchrotron X-Ray Diffraction Data Interpretation

Manual analysis of XRD data is usually laborious and time consuming. The...
research
11/06/2017

Minimum-Phase HRTF Modeling of Pinna Spectral Notches using Group Delay Decomposition

Accurate reconstruction of HRTFs is important in the design and developm...
research
11/01/2017

Complex-valued image denosing based on group-wise complex-domain sparsity

Phase imaging and wavefront reconstruction from noisy observations of co...
research
04/12/2017

Sampling-based speech parameter generation using moment-matching networks

This paper presents sampling-based speech parameter generation using mom...
research
11/29/2022

Neural Speech Phase Prediction based on Parallel Estimation Architecture and Anti-Wrapping Losses

This paper presents a novel speech phase prediction model which predicts...
