Training a Neural Speech Waveform Model using Spectral Losses of Short-Time Fourier Transform and Continuous Wavelet Transform

03/29/2019
by   Shinji Takaki, et al.

Recently, we proposed short-time Fourier transform (STFT)-based loss functions for training a neural speech waveform model. In this paper, we generalize that framework and propose a training scheme for such models based on spectral amplitude and phase losses obtained by STFT, continuous wavelet transform (CWT), or both. Since CWT can have time and frequency resolutions different from those of STFT, and can use resolutions closer to human auditory scales, the proposed loss functions could provide complementary information on speech signals. Experimental results showed that a high-quality model can be trained using the proposed CWT spectral loss, and that this model is as good as one trained using the STFT-based loss.
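
To make the idea of spectral amplitude and phase losses concrete, the following is a minimal NumPy sketch, not the authors' implementation. The abstract does not give the loss formulas, so the specific choices here are assumptions: a squared error on log amplitudes, a phase loss of the form 1 - cos(phase difference), a Hann-windowed STFT, and a Morlet wavelet for the CWT. All function names and parameter values (`stft`, `cwt_morlet`, `frame_len`, `hop`, `w0`, the scale range) are hypothetical.

```python
import numpy as np

def stft(x, frame_len=256, hop=64):
    """Complex STFT with a Hann window; returns an array of shape (frames, bins)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop : i * hop + frame_len] for i in range(n_frames)])
    return np.fft.rfft(frames * window, axis=-1)

def cwt_morlet(x, scales, w0=6.0):
    """Complex CWT with a Morlet wavelet (assumed choice); returns (scales, samples)."""
    rows = []
    for s in scales:
        t = np.arange(-4 * s, 4 * s + 1)  # support of about +/- 4 standard deviations
        wavelet = np.exp(1j * w0 * t / s) * np.exp(-0.5 * (t / s) ** 2) / np.sqrt(s)
        rows.append(np.convolve(x, wavelet, mode="same"))
    return np.stack(rows)

def amplitude_loss(ref, gen, eps=1e-8):
    """Mean squared error between log amplitude spectra (assumed form)."""
    return np.mean((np.log(np.abs(ref) + eps) - np.log(np.abs(gen) + eps)) ** 2)

def phase_loss(ref, gen):
    """Mean of 1 - cos(phase difference); zero when the phases agree (assumed form)."""
    return np.mean(1.0 - np.cos(np.angle(ref) - np.angle(gen)))

# Toy signals: a clean sine as the "natural" waveform, a noisy copy as the "generated" one.
rng = np.random.default_rng(0)
time = np.arange(2048) / 16000.0
natural = np.sin(2 * np.pi * 220 * time)
generated = natural + 0.1 * rng.standard_normal(len(natural))

scales = np.geomspace(4, 64, 8)  # log-spaced scales, loosely mimicking an auditory axis
total = 0.0
for transform in (stft, lambda x: cwt_morlet(x, scales)):
    ref, gen = transform(natural), transform(generated)
    total += amplitude_loss(ref, gen) + phase_loss(ref, gen)
print("combined STFT + CWT spectral loss:", total)
```

Because both transforms return complex coefficients, the same two loss functions apply to either one, which is what allows the STFT and CWT terms to be used alone or summed. In actual training the losses would have to be computed in a framework with automatic differentiation so that gradients reach the waveform model; NumPy is used here only to show the arithmetic.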


