STFT spectral loss for training a neural speech waveform model

10/29/2018
by Shinji Takaki, et al.

This paper proposes a new loss using short-time Fourier transform (STFT) spectra with the aim of training a high-performance neural speech waveform model that directly predicts raw continuous speech waveform samples. Both the amplitude spectra and the phase spectra obtained from generated speech waveforms are used to calculate the proposed loss. We also show mathematically that training the waveform model with the proposed loss can be interpreted as maximum likelihood training under the assumption that the amplitude and phase spectra of generated speech waveforms follow Gaussian and von Mises distributions, respectively. Furthermore, this paper presents a simple network architecture for the speech waveform model, composed of uni-directional long short-term memories (LSTMs) and an auto-regressive structure. Experimental results showed that the proposed neural model synthesized high-quality speech waveforms.
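A minimal NumPy sketch of the idea follows. It is not the authors' implementation. It assumes a Hann-windowed STFT with a 512-sample frame and a 128-sample hop, a squared-error amplitude term scaled by a variance sigma², and a phase term kappa·(1 − cos Δθ). These settings and the names `stft`, `stft_spectral_loss`, and `gaussian_vonmises_nll` are hypothetical, and the paper's exact weighting may differ. The second function computes the negative log-likelihood of the reference spectra under a Gaussian (amplitude) and a von Mises (phase) distribution centered on the generated spectra. It differs from the loss only by a constant that does not depend on the generated waveform, which is the sense in which minimizing the loss is maximum likelihood training.

```python
import numpy as np


def stft(x, frame_len=512, hop=128):
    """Hann-windowed STFT of a 1-D signal; returns a (frames, bins) complex array.

    Frame length and hop are illustrative assumptions, not the paper's settings.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=-1)


def _spectra(y_gen, y_ref, frame_len, hop):
    """Amplitude and phase spectra of the generated and reference waveforms."""
    s_gen = stft(y_gen, frame_len, hop)
    s_ref = stft(y_ref, frame_len, hop)
    return np.abs(s_gen), np.abs(s_ref), np.angle(s_gen), np.angle(s_ref)


def stft_spectral_loss(y_gen, y_ref, frame_len=512, hop=128, sigma=1.0, kappa=1.0):
    """Hypothetical amplitude + phase STFT loss (a sketch, not the paper's code)."""
    a_gen, a_ref, p_gen, p_ref = _spectra(y_gen, y_ref, frame_len, hop)
    # Amplitude term: squared error, i.e. a Gaussian NLL without its constant.
    amp_loss = np.sum((a_gen - a_ref) ** 2) / (2.0 * sigma ** 2)
    # Phase term: 1 - cos of the phase difference, invariant to 2*pi wrapping.
    phase_loss = kappa * np.sum(1.0 - np.cos(p_gen - p_ref))
    return amp_loss + phase_loss


def gaussian_vonmises_nll(y_gen, y_ref, frame_len=512, hop=128, sigma=1.0, kappa=1.0):
    """NLL of the reference spectra under Gaussian (amplitude) and von Mises
    (phase) distributions whose locations are the generated spectra."""
    a_gen, a_ref, p_gen, p_ref = _spectra(y_gen, y_ref, frame_len, hop)
    n = a_ref.size  # number of time-frequency bins
    amp_nll = (np.sum((a_ref - a_gen) ** 2) / (2.0 * sigma ** 2)
               + 0.5 * n * np.log(2.0 * np.pi * sigma ** 2))
    # von Mises density: exp(kappa * cos(x - mu)) / (2 * pi * I0(kappa))
    phase_nll = (-kappa * np.sum(np.cos(p_ref - p_gen))
                 + n * np.log(2.0 * np.pi * np.i0(kappa)))
    return amp_nll + phase_nll


# Usage: a noisy copy of a random "reference" waveform.
rng = np.random.default_rng(0)
y_ref = rng.standard_normal(4096)
y_gen = y_ref + 0.1 * rng.standard_normal(4096)
print(stft_spectral_loss(y_gen, y_ref))
# The gap to the NLL is a constant that does not depend on y_gen.
print(stft_spectral_loss(y_gen, y_ref) - gaussian_vonmises_nll(y_gen, y_ref))
```

Because the phase term depends only on cos(θ̂ − θ), it is unaffected by 2π wrapping of the phase, which is why a von Mises distribution rather than a Gaussian is the natural likelihood for phase.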

03/29/2019

Training a Neural Speech Waveform Model using Spectral Losses of Short-Time Fourier Transform and Continuous Wavelet Transform

Recently, we proposed short-time Fourier transform (STFT)-based loss fun...

08/17/2023

Long-frame-shift Neural Speech Phase Prediction with Spectral Continuity Enhancement and Interpolation Error Compensation

Speech phase prediction, which is a significant research focus in the fi...

05/31/2020

Maximum Voiced Frequency Estimation: Exploiting Amplitude and Phase Spectra

Maximum Voiced Frequency (MVF) is used in various speech models as the s...

09/26/2022

Electron energy loss spectroscopy database synthesis and automation of core-loss edge recognition by deep-learning neural networks

The ionization edges encoded in the electron energy loss spectroscopy (E...

01/24/2018

Waveform Modeling and Generation Using Hierarchical Recurrent Neural Networks for Speech Bandwidth Extension

This paper presents a waveform modeling and generation method using hier...

03/26/2022

A Neural Vocoder Based Packet Loss Concealment Algorithm

The packet loss problem seriously affects the quality of service in Voic...
