Knowledge-and-Data-Driven Amplitude Spectrum Prediction for Hierarchical Neural Vocoders

04/16/2020
by Yang Ai, et al.

In our previous work, we proposed a neural vocoder called HiNet, which recovers speech waveforms by hierarchically predicting amplitude and phase spectra from input acoustic features. In HiNet, the amplitude spectrum predictor (ASP) predicts log amplitude spectra (LAS) from the input acoustic features. This paper proposes a novel knowledge-and-data-driven ASP (KDD-ASP) to improve on the conventional ASP. First, the acoustic features (i.e., F0 and mel-cepstra) pass through a knowledge-driven LAS recovery module to obtain approximate LAS (ALAS). This module combines the short-time Fourier transform (STFT) with source-filter theory: the source part is built from the input F0, and the filter part from the input mel-cepstra. Then, the recovered ALAS are processed by a data-driven LAS refinement module, which consists of multiple trainable convolutional layers, to produce the final LAS. Experimental results on a text-to-speech (TTS) task show that the HiNet vocoder using KDD-ASP achieves higher synthetic speech quality than both the HiNet vocoder using the conventional ASP and the WaveRNN vocoder.
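The knowledge-driven recovery step can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The excitation model (a unit-power pulse train for voiced frames, a flat spectrum for unvoiced frames), the unit-energy Hanning window, the 16 kHz sampling rate, the all-pass constant alpha = 0.42, and every function name below are assumptions chosen for illustration. Following source-filter theory, the ALAS of one frame is taken as log|E(w)| + log|H(w)|: a source log amplitude spectrum E obtained by applying an STFT frame to an F0-driven excitation, plus a filter log amplitude spectrum computed from the mel-cepstrum as log|H(w)| = sum_m c_m cos(m * beta(w)), where beta(w) is the all-pass-warped frequency.

```python
import numpy as np


def warped_frequency(omega, alpha):
    """Map linear frequency omega (rad) to the all-pass-warped (mel-like) frequency."""
    return omega + 2.0 * np.arctan(alpha * np.sin(omega) / (1.0 - alpha * np.cos(omega)))


def filter_log_amplitude(mcep, alpha, n_fft):
    """Filter part: log|H(w)| = sum_m c_m * cos(m * beta(w)) on the n_fft // 2 + 1 rfft bins."""
    omega = np.linspace(0.0, np.pi, n_fft // 2 + 1)
    beta = warped_frequency(omega, alpha)
    orders = np.arange(len(mcep))
    return np.cos(np.outer(beta, orders)) @ np.asarray(mcep, dtype=float)


def source_log_amplitude(f0, fs, n_fft, eps=1e-5):
    """Source part derived from F0 (hypothetical excitation model, not the paper's)."""
    n_bins = n_fft // 2 + 1
    if f0 <= 0.0:
        # Unvoiced frame: assume a flat, white-noise-like excitation with unit
        # RMS amplitude, i.e. log amplitude 0 in every bin.
        return np.zeros(n_bins)
    # Voiced frame: unit-power pulse train with a period of fs / f0 samples.
    period = fs / f0
    positions = np.round(np.arange(0.0, n_fft, period)).astype(int)
    positions = positions[positions < n_fft]
    excitation = np.zeros(n_fft)
    excitation[positions] = np.sqrt(period)
    # Window scaled to unit energy, so unit-variance white noise would have
    # unit expected power per bin (consistent with the unvoiced branch).
    window = np.hanning(n_fft)
    window /= np.sqrt(np.sum(window ** 2))
    spectrum = np.abs(np.fft.rfft(excitation * window))
    return np.log(spectrum + eps)


def approximate_las(f0, mcep, fs=16000, n_fft=1024, alpha=0.42):
    """ALAS of one frame: source log amplitude plus filter log amplitude."""
    return source_log_amplitude(f0, fs, n_fft) + filter_log_amplitude(mcep, alpha, n_fft)


def recover_alas(f0_seq, mcep_seq, **kwargs):
    """Stack per-frame ALAS into a (num_frames, n_fft // 2 + 1) matrix."""
    return np.stack([approximate_las(f0, c, **kwargs) for f0, c in zip(f0_seq, mcep_seq)])


if __name__ == "__main__":
    f0_seq = np.array([200.0, 0.0, 180.0])  # Hz; 0 marks an unvoiced frame
    mcep_seq = np.random.default_rng(0).normal(scale=0.1, size=(3, 25))
    alas = recover_alas(f0_seq, mcep_seq)
    print(alas.shape)  # (3, 513)
```

Adding the two parts in the log domain is the log-spectrum form of multiplying the excitation spectrum by the vocal-tract response. In KDD-ASP, a frame-by-bin ALAS matrix like this one would then be passed to the trainable convolutional refinement module, which corrects whatever this hand-designed approximation gets wrong.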

research
06/23/2019

A Neural Vocoder with Hierarchical Generation of Amplitude and Phase Spectra for Statistical Parametric Speech Synthesis

This paper presents a neural vocoder named HiNet which reconstructs spee...
research
03/18/2019

CRAFT: A multifunction online platform for speech prosody visualisation

There are many research tools which are also used for teaching the acous...
research
11/08/2020

Denoising-and-Dereverberation Hierarchical Neural Vocoder for Robust Waveform Generation

This paper presents a denoising and dereverberation hierarchical neural ...
research
05/13/2023

APNet: An All-Frame-Level Neural Vocoder Incorporating Direct Prediction of Amplitude and Phase Spectra

This paper presents a novel neural vocoder named APNet which reconstruct...
research
06/29/2021

FastPitchFormant: Source-filter based Decomposed Modeling for Speech Synthesis

Methods for modeling and controlling prosody with acoustic features have...
research
11/29/2022

Neural Speech Phase Prediction based on Parallel Estimation Architecture and Anti-Wrapping Losses

This paper presents a novel speech phase prediction model which predicts...
research
07/05/2022

Regularized Predictive Models for Beef Eating Quality of Individual Meals

Faced with changing markets and evolving consumer demands, beef industri...