A Two-Stage Training Framework for Joint Speech Compression and Enhancement

09/08/2023
by Jiayi Huang, et al.

This paper considers the joint compression and enhancement problem for speech signals in the presence of noise. Recently, the SoundStream codec, which relies on end-to-end joint training of an encoder-decoder pair and a residual vector quantizer with a combination of adversarial and reconstruction losses, has shown very promising performance, especially in subjective perceptual quality. In this work, we provide a theoretical result showing that, to simultaneously achieve low distortion and high perceptual quality in the presence of noise, there exists an optimal two-stage optimization procedure for the joint compression and enhancement problem. This procedure first optimizes an encoder-decoder pair using only a distortion loss, and then fixes the encoder and optimizes a perceptual decoder using a perception loss. Based on this result, we construct a two-stage training framework for joint compression and enhancement of noisy speech signals. Unlike existing training methods, which are heuristic, the proposed two-stage training method has a theoretical foundation. Finally, experimental results for various noise and bit-rate conditions are provided. The results demonstrate that a codec trained with the proposed framework can outperform SoundStream and other representative codecs in terms of both objective and subjective evaluation metrics. Code is available at https://github.com/jscscloris/SEStream.
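To make the two-stage schedule concrete, the following is a minimal toy sketch, not the authors' implementation. It assumes a scalar Gaussian "clean" signal, additive Gaussian noise, a one-parameter linear encoder and decoder, and no quantizer. As a stand-in for the perception loss, it uses the squared 2-Wasserstein distance between Gaussians fitted to the decoder output and to the clean signal. All names (`train_stage1`, `train_stage2`, `a`, `b`, `g`) are hypothetical.

```python
import random
import statistics

random.seed(0)

# Hypothetical toy data: clean signal x and noisy observation y = x + n.
clean = [random.gauss(0.0, 1.0) for _ in range(2000)]
noisy = [x + random.gauss(0.0, 0.5) for x in clean]


def mse(estimate, target):
    """Mean squared error, used here as the distortion measure."""
    return statistics.fmean((e - t) ** 2 for e, t in zip(estimate, target))


def train_stage1(noisy, clean, steps=200, lr=0.05):
    """Stage 1: fit encoder gain a and decoder gain b with distortion (MSE) only."""
    a, b = 0.5, 0.5
    n = len(noisy)
    for _ in range(steps):
        # Analytic gradients of mean((b * a * y - x)^2) w.r.t. a and b.
        grad_a = sum(2 * (b * a * y - x) * b * y for x, y in zip(clean, noisy)) / n
        grad_b = sum(2 * (b * a * y - x) * a * y for x, y in zip(clean, noisy)) / n
        a -= lr * grad_a
        b -= lr * grad_b
    return a, b


def train_stage2(latent, clean, g_init, steps=200, lr=0.1):
    """Stage 2: encoder frozen; fit a new decoder gain g with a perception loss.

    Perception loss (an assumption of this sketch): squared 2-Wasserstein
    distance between Gaussians fitted to the output and to the clean signal,
    i.e. (mean difference)^2 + (std difference)^2. Assumes g stays positive.
    """
    m_z, s_z = statistics.fmean(latent), statistics.pstdev(latent)
    m_x, s_x = statistics.fmean(clean), statistics.pstdev(clean)
    g = g_init
    for _ in range(steps):
        grad = 2 * (g * m_z - m_x) * m_z + 2 * (g * s_z - s_x) * s_z
        g -= lr * grad
    return g


a, b = train_stage1(noisy, clean)
latent = [a * y for y in noisy]            # output of the now-frozen encoder
g = train_stage2(latent, clean, g_init=b)  # only the decoder gain is updated

x_mse = [b * z for z in latent]            # distortion-oriented reconstruction
x_perc = [g * z for z in latent]           # perception-oriented reconstruction

print("MSE  (stage 1 / stage 2):", mse(x_mse, clean), mse(x_perc, clean))
print("std  (clean / stage 1 / stage 2):",
      statistics.pstdev(clean), statistics.pstdev(x_mse), statistics.pstdev(x_perc))
```

In this toy setting the stage-1 decoder should give the lower MSE, while the stage-2 decoder should produce outputs whose spread is closer to that of the clean signal. A real codec would replace the scalar gains with neural networks, add a quantizer, and use a learned perception loss.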


