mdctGAN: Taming transformer-based GAN for speech super-resolution with Modified DCT spectra

05/18/2023
by   Chenhao Shuai, et al.

Speech super-resolution (SSR) aims to recover high-resolution (HR) speech from its corresponding low-resolution (LR) counterpart. Recent SSR methods focus more on reconstructing the magnitude spectrogram and ignore the importance of phase reconstruction, which limits recovery quality. To address this issue, we propose mdctGAN, a novel SSR framework based on the modified discrete cosine transform (MDCT). Through adversarial learning in the MDCT domain, our method reconstructs HR speech in a phase-aware manner without vocoders or additional post-processing. Furthermore, by learning frequency-consistent features with a self-attentive mechanism, mdctGAN guarantees high-quality speech reconstruction. On the VCTK corpus, experimental results show that our model produces natural auditory quality with high MOS and PESQ scores. It also achieves state-of-the-art log-spectral distance (LSD) performance at a 48 kHz target resolution from various input rates. Code is available at https://github.com/neoncloud/mdctGAN
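To illustrate why working in the MDCT domain can be phase-aware without a separate phase estimate, here is a minimal NumPy sketch of a windowed MDCT and its inverse. It is not taken from the authors' repository: the function names (`mdct`, `imdct`, `mdct_basis`, `sine_window`), the sine window, and the frame size are all assumptions chosen for illustration. The point it shows is that the MDCT coefficients are purely real, yet overlap-adding the inverse frames recovers the waveform exactly, so a model that predicts these coefficients implicitly predicts phase as well.

```python
import numpy as np

def mdct_basis(N):
    """Cosine basis of shape (N, 2N) for a frame of 2N samples."""
    n = np.arange(2 * N)[None, :]
    k = np.arange(N)[:, None]
    return np.cos(np.pi / N * (n + 0.5 + N / 2) * (k + 0.5))

def sine_window(N):
    """Sine window of length 2N; satisfies w[n]^2 + w[n+N]^2 = 1."""
    return np.sin(np.pi * (np.arange(2 * N) + 0.5) / (2 * N))

def mdct(x, N):
    """Windowed MDCT with hop N. Returns a real array (n_frames, N)."""
    w, C = sine_window(N), mdct_basis(N)
    pad = (-len(x)) % N
    # Pad N zeros on each side so every input sample lies in two frames.
    xp = np.concatenate([np.zeros(N), x, np.zeros(N + pad)])
    n_frames = len(xp) // N - 1
    frames = np.stack([xp[i * N:(i + 2) * N] for i in range(n_frames)])
    return (frames * w) @ C.T

def imdct(X, N, length):
    """Inverse MDCT with windowed overlap-add; returns `length` samples."""
    w, C = sine_window(N), mdct_basis(N)
    # The 2/N scaling pairs with the analysis/synthesis sine windows.
    frames = (2.0 / N) * (X @ C) * w
    out = np.zeros((X.shape[0] + 1) * N)
    for i, f in enumerate(frames):
        out[i * N:(i + 2) * N] += f  # time-domain aliasing cancels here
    return out[N:N + length]

# Usage: round-trip a random signal (illustrative frame size N = 64).
rng = np.random.default_rng(0)
signal = rng.standard_normal(1000)
coeffs = mdct(signal, 64)
restored = imdct(coeffs, 64, len(signal))
```

In this sketch `coeffs` plays the role that a real-valued spectrogram-like input or output would play for a network, in contrast to an STFT, where magnitude and phase are separate quantities.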

research
04/06/2020

Lossless Image Compression through Super-Resolution

We introduce a simple and efficient lossless image compression algorithm...
research
12/29/2018

Brain MRI super-resolution using 3D generative adversarial networks

In this work we propose an adversarial learning approach to generate hig...
research
12/03/2019

High-quality Speech Synthesis Using Super-resolution Mel-Spectrogram

In speech synthesis and speech enhancement systems, melspectrograms need...
research
03/28/2022

Neural Vocoder is All You Need for Speech Super-resolution

Speech super-resolution (SR) is a task to increase speech sampling rate ...
research
10/30/2016

Super-resolution estimation of cyclic arrival rates

Exploiting the fact that most arrival processes exhibit cyclic behaviour...
research
11/22/2022

AERO: Audio Super Resolution in the Spectral Domain

We present AERO, an audio super-resolution model that processes speech an...
research
09/28/2021

VoiceFixer: Toward General Speech Restoration with Neural Vocoder

Speech restoration aims to remove distortions in speech signals. Prior m...
