Cascaded Cross-Module Residual Learning towards Lightweight End-to-End Speech Coding

06/18/2019
by Kai Zhen, et al.

Speech codecs learn compact representations of speech signals to facilitate data transmission. Many recent deep neural network (DNN) based end-to-end speech codecs achieve low bitrates and high perceptual quality at the cost of model complexity. We propose a cross-module residual learning (CMRL) pipeline as a module carrier, in which each module reconstructs the residual left by its preceding modules. CMRL differs from other DNN-based speech codecs in that, rather than modeling the speech compression problem with a single large neural network, it optimizes a series of less complex modules in a two-phase training scheme. The proposed method shows better objective performance than AMR-WB and than the state-of-the-art DNN-based speech codec with a similar network architecture. As an end-to-end model, it takes raw PCM signals as input, but it is also compatible with linear predictive coding (LPC), showing better subjective quality at high bitrates than AMR-WB and OPUS. The gain is achieved with only 0.9 million trainable parameters, a significantly less complex architecture than the other DNN-based codecs in the literature.
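The cascading idea in the abstract, where each module codes the residual left by the modules before it, can be illustrated with a toy sketch. This is not the authors' implementation: the "modules" here are plain uniform scalar quantizers standing in for trained neural codec modules, the two-phase training scheme is not modeled, and the names `make_quantizer_module` and `cascaded_residual_coding` and the step sizes are hypothetical choices made for illustration.

```python
import numpy as np


def make_quantizer_module(step):
    """Hypothetical stand-in for a trained codec module:
    a uniform scalar quantizer with the given step size."""
    def module(signal):
        return np.round(signal / step) * step
    return module


def cascaded_residual_coding(x, modules):
    """Run the modules in sequence; each one codes whatever the
    preceding modules failed to reconstruct."""
    residual = x.copy()
    reconstruction = np.zeros_like(x)
    errors = []
    for module in modules:
        decoded = module(residual)               # code the current residual
        reconstruction = reconstruction + decoded
        residual = x - reconstruction            # what is still missing
        errors.append(float(np.mean(residual ** 2)))
    return reconstruction, errors


# Toy input: random samples in [-1, 1) standing in for a PCM frame.
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=512)

# Assumed step sizes; later modules refine at a finer resolution.
modules = [make_quantizer_module(s) for s in (0.5, 0.1, 0.02)]
x_hat, errors = cascaded_residual_coding(x, modules)

for i, err in enumerate(errors, start=1):
    print(f"after module {i}: MSE = {err:.2e}")
```

In this sketch the summed module outputs form the final reconstruction, and the mean squared error shrinks as each module is added, which is the property the cascade relies on. In the paper's setting each module would be a small learned network rather than a fixed quantizer.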

research
03/27/2021

Scalable and Efficient Neural Speech Coding

This work presents a scalable and efficient neural waveform codec (NWC) ...
research
10/25/2017

End-to-End Optimized Speech Coding with Deep Neural Networks

Modern compression algorithms are often the result of laborious domain-s...
research
12/25/2021

Neural Network Module Decomposition and Recomposition

We propose a modularization method that decomposes a deep neural network...
research
03/03/2022

Deep Learning-Based Joint Control of Acoustic Echo Cancellation, Beamforming and Postfiltering

We introduce a novel method for controlling the functionality of a hands...
research
05/12/2021

SauvolaNet: Learning Adaptive Sauvola Network for Degraded Document Binarization

Inspired by the classic Sauvola local image thresholding approach, we sy...
research
11/04/2022

Neural Feature Predictor and Discriminative Residual Coding for Low-Bitrate Speech Coding

Low and ultra-low-bitrate neural speech coding achieves unprecedented co...
research
03/07/2019

Learning deep neural networks in blind deblurring framework

Recently, end-to-end learning methods based on deep neural network (DNN)...