A Cyclical Approach to Synthetic and Natural Speech Mismatch Refinement of Neural Post-filter for Low-cost Text-to-speech System

07/13/2022
by   Yi-Chiao Wu, et al.
0

Neural-based text-to-speech (TTS) systems achieve very high-fidelity speech generation because of the rapid neural network developments. However, the huge labeled corpus and high computation cost requirements limit the possibility of developing a high-fidelity TTS system by small companies or individuals. On the other hand, a neural vocoder, which has been widely adopted for the speech generation in neural-based TTS systems, can be trained with a relatively small unlabeled corpus. Therefore, in this paper, we explore a general framework to develop a neural post-filter (NPF) for low-cost TTS systems using neural vocoders. A cyclical approach is proposed to tackle the acoustic and temporal mismatches (AM and TM) of developing an NPF. Both objective and subjective evaluations have been conducted to demonstrate the AM and TM problems and the effectiveness of the proposed framework.

READ FULL TEXT

page 1

page 6

research
05/18/2020

A Cyclical Post-filtering Approach to Mismatch Refinement of Neural Vocoder for Text-to-speech Systems

Recently, the effectiveness of text-to-speech (TTS) systems combined wit...
research
04/12/2022

VoiceFixer: A Unified Framework for High-Fidelity Speech Restoration

Speech restoration aims to remove distortions in speech signals. Prior m...
research
08/11/2020

Bunched LPCNet : Vocoder for Low-cost Neural Text-To-Speech Systems

LPCNet is an efficient vocoder that combines linear prediction and deep ...
research
11/03/2020

StyleMelGAN: An Efficient High-Fidelity Adversarial Vocoder with Temporal Adaptive Normalization

In recent years, neural vocoders have surpassed classical speech generat...
research
06/21/2021

Glow-WaveGAN: Learning Speech Representations from GAN-based Variational Auto-Encoder For High Fidelity Flow-based Speech Synthesis

Current two-stage TTS framework typically integrates an acoustic model w...
research
05/18/2023

FastFit: Towards Real-Time Iterative Neural Vocoder by Replacing U-Net Encoder With Multiple STFTs

This paper presents FastFit, a novel neural vocoder architecture that re...
research
04/29/2020

Data-Assisted Model-Based Anomaly Detection for High-Fidelity Simulators of Power Systems

The main objective of this article is to develop scalable anomaly detect...

Please sign up or login with your details

Forgot password? Click here to reset