Speech bandwidth extension with WaveNet

07/05/2019
by Archit Gupta, et al.

Large-scale mobile communication systems tend to contain legacy transmission channels with narrowband bottlenecks, resulting in characteristic "telephone-quality" audio. Although higher-quality codecs exist, the scale and heterogeneity of these networks make it difficult in practice to transmit audio at higher sample rates with modern high-quality codecs. This paper proposes an approach in which a communication node instead extends the bandwidth of a band-limited incoming speech signal that may have been passed through a low-rate codec. To this end, we propose a WaveNet-based model that is conditioned on a log-mel spectrogram representation of a bandwidth-constrained 8 kHz speech signal, including audio with artifacts from GSM full-rate (GSM-FR) compression, and that reconstructs the higher-resolution signal. In our experimental MUSHRA evaluation, we show that a model trained to upsample audio passed through the 8 kHz GSM-FR codec to 24 kHz speech reconstructs audio only slightly lower in quality than that of the Adaptive Multi-Rate Wideband (AMR-WB) codec at 16 kHz, and closes around half the gap in perceptual quality between the original encoded signal and the original speech sampled at 24 kHz. We further show that when the same model is given 8 kHz audio that has not been compressed, it reconstructs audio of slightly better quality than 16 kHz AMR-WB in the same MUSHRA evaluation.
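
As a rough illustration of the conditioning features mentioned in the abstract, the following is a minimal sketch of computing a log-mel spectrogram from an 8 kHz narrowband signal. The frame length, hop size, number of mel bands, the HTK-style mel formula, and all function names are assumptions chosen for illustration; the abstract does not specify them, and this is not the authors' feature pipeline.

```python
import numpy as np


def hz_to_mel(f):
    # HTK-style mel scale (an assumption; other mel variants exist).
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=np.float64) / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=np.float64) / 2595.0) - 1.0)


def mel_filterbank(sr, n_fft, n_mels):
    # Triangular filters with edges equally spaced on the mel scale
    # between 0 Hz and the Nyquist frequency.
    hz_pts = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        lo, ctr, hi = hz_pts[i], hz_pts[i + 1], hz_pts[i + 2]
        rising = (fft_freqs - lo) / (ctr - lo)
        falling = (hi - fft_freqs) / (hi - ctr)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb


def log_mel_spectrogram(x, sr=8000, n_fft=400, hop=100, n_mels=40, eps=1e-6):
    # Hypothetical defaults: 50 ms Hann frames, 12.5 ms hop, 40 mel bands.
    x = np.asarray(x, dtype=np.float64)
    if len(x) < n_fft:
        x = np.pad(x, (0, n_fft - len(x)))  # zero-pad very short inputs
    n_frames = 1 + (len(x) - n_fft) // hop
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = x[idx] * np.hanning(n_fft)
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2  # (n_frames, n_fft//2 + 1)
    mel = power @ mel_filterbank(sr, n_fft, n_mels).T  # (n_frames, n_mels)
    return np.log(mel + eps)


# Usage: one second of a synthetic 440 Hz tone sampled at 8 kHz.
sr = 8000
t = np.arange(sr) / sr
features = log_mel_spectrogram(np.sin(2 * np.pi * 440.0 * t), sr=sr)
print(features.shape)  # (77, 40): frames x mel bands
```

In a bandwidth-extension setup of the kind described, features like these would be computed from the narrowband input and supplied to the generative model as conditioning, while the model's output would be the 24 kHz waveform. How the features are aligned to the output sample rate is not covered in the abstract and is left out of this sketch.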

research
09/13/2023

AudioSR: Versatile Audio Super-resolution at Scale

Audio super-resolution is a fundamental task that predicts high-frequenc...
research
10/14/2019

Low Bit-Rate Speech Coding with VQ-VAE and a WaveNet Decoder

In order to efficiently transmit and store speech signals, speech codecs...
research
03/30/2022

Forensic Analysis and Localization of Multiply Compressed MP3 Audio Using Transformers

Audio signals are often stored and transmitted in compressed formats. Am...
research
10/25/2022

EBEN: Extreme bandwidth extension network applied to speech signals captured with noise-resilient microphones

In this paper, we present Extreme Bandwidth Extension Network (EBEN), a ...
research
08/02/2023

From Discrete Tokens to High-Fidelity Audio Using Multi-Band Diffusion

Deep generative models can generate high-fidelity audio conditioned on v...
research
03/14/2023

Native Multi-Band Audio Coding within Hyper-Autoencoded Reconstruction Propagation Networks

Spectral sub-bands do not portray the same perceptual relevance. In audi...
research
03/17/2023

Configurable EBEN: Extreme Bandwidth Extension Network to enhance body-conducted speech capture

This paper presents a configurable version of Extreme Bandwidth Extensio...
