Enhancing Low-Quality Voice Recordings Using Disentangled Channel Factor and Neural Waveform Model

11/10/2020
by Haoyu Li, et al.

High-quality speech corpora are an essential foundation for most speech applications. However, such data are expensive and scarce because they must be collected in professional recording environments. In this work, we propose an encoder-decoder neural network that automatically converts low-quality recordings into professional high-quality recordings. To address channel variability, we first use an encoder network trained adversarially to filter out the channel characteristics of the input audio. Next, we disentangle the channel factor from a reference recording. Conditioned on this factor, an auto-regressive decoder then predicts the Mel spectrogram of the target environment. Finally, a neural vocoder synthesizes the speech waveform. Experimental results show that the proposed system can generate a professional high-quality speech waveform when high-quality audio is used as the reference. It also improves speech enhancement performance over several state-of-the-art baseline systems.

research
09/16/2021

DDS: A new device-degraded speech dataset for speech enhancement

A large and growing amount of speech content in real-life scenarios is b...
research
11/10/2019

Transformation of low-quality device-recorded speech to high-quality speech using improved SEGAN model

Nowadays vast amounts of speech data are recorded from low-quality recor...
research
05/03/2023

Analysing the Impact of Audio Quality on the Use of Naturalistic Long-Form Recordings for Infant-Directed Speech Research

Modelling of early language acquisition aims to understand how infants b...
research
09/19/2019

WEnets: A Convolutional Framework for Evaluating Audio Waveforms

We describe a new convolutional framework for waveform evaluation, WEnet...
research
04/29/2023

Adversarial Representation Learning for Robust Privacy Preservation in Audio

Sound event detection systems are widely used in various applications su...
research
06/27/2022

Wideband Audio Waveform Evaluation Networks: Efficient, Accurate Estimation of Speech Qualities

Wideband Audio Waveform Evaluation Networks (WAWEnets) are convolutional...
research
09/14/2023

SpatialCodec: Neural Spatial Speech Coding

In this work, we address the challenge of encoding speech captured by a ...
