Speaking-Rate-Controllable HiFi-GAN Using Feature Interpolation

04/22/2022
by Detai Xin, et al.

This paper presents a speaking-rate-controllable HiFi-GAN neural vocoder. The original HiFi-GAN is a high-fidelity, computationally efficient neural vocoder with a small footprint. We incorporate a speaking-rate control function into HiFi-GAN to improve the accessibility of synthetic speech. The proposed method inserts a differentiable interpolation layer into the HiFi-GAN architecture. It implements two warping methods, signal resampling and image scaling, which are applied to either the input mel-spectrograms or the hidden features of the neural vocoder. We also design and open-source a Japanese speech corpus containing three speaking rates to evaluate the proposed speaking-rate control method. Objective and subjective evaluations demonstrate that 1) the proposed method outperforms a baseline time-scale modification algorithm in speech naturalness, 2) warping mel-spectrograms by image scaling performs best among all proposed variants, and 3) the speaking-rate control can be incorporated into HiFi-GAN without losing computational efficiency.
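To make the idea of warping a mel-spectrogram along the time axis concrete, the following is a minimal sketch in plain Python. It is not the authors' implementation: the function name `warp_time_axis`, the endpoint-aligned linear interpolation, and the convention that a rate above 1 shortens the utterance are all assumptions made for illustration. Scaling an image along a single axis with linear weights reduces to this kind of per-frame interpolation; the interpolation layer described in the abstract may use a different kernel or operate on hidden features instead.

```python
def warp_time_axis(mel, rate):
    """Linearly resample a mel-spectrogram along the time axis.

    mel  : list of frames, each a list of mel-bin values (T x n_mels)
    rate : assumed speaking-rate factor; > 1 yields fewer frames
           (faster speech), < 1 yields more frames (slower speech)
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    n_in = len(mel)
    if n_in == 0:
        return []
    n_out = max(1, round(n_in / rate))
    if n_out == 1 or n_in == 1:
        # Nothing to interpolate between: repeat the first frame.
        return [list(mel[0]) for _ in range(n_out)]
    # Map each output frame to a fractional input position,
    # keeping the first and last frames aligned.
    step = (n_in - 1) / (n_out - 1)
    out = []
    for j in range(n_out):
        pos = j * step
        lo = min(int(pos), n_in - 2)   # left neighbour index
        frac = pos - lo                # weight of the right neighbour
        out.append([(1 - frac) * a + frac * b
                    for a, b in zip(mel[lo], mel[lo + 1])])
    return out


# Toy 3-frame, 2-bin "spectrogram" slowed down by a factor of two.
toy = [[0.0, 10.0], [2.0, 12.0], [4.0, 14.0]]
slow = warp_time_axis(toy, 0.5)
```

Because every output value is a weighted sum of two input frames, the operation is differentiable with respect to the input, which is the property that allows such a layer to sit inside a neural vocoder.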

