Scalable and Efficient Neural Speech Coding

03/27/2021
by Kai Zhen, et al.

This work presents a scalable and efficient neural waveform codec (NWC) for speech compression. We formulate speech coding as an autoencoding task, in which a convolutional neural network (CNN) performs encoding and decoding in its feedforward routine. The proposed CNN autoencoder also defines quantization and entropy coding as trainable modules, so coding artifacts and bitrate control are handled during optimization. We achieve efficiency by introducing compact architectural components into our fully convolutional model, such as gated residual networks and depthwise separable convolution. Furthermore, the proposed models adopt a scalable architecture, cross-module residual learning (CMRL), to cover a wide range of bitrates. To this end, we use the residual coding concept to concatenate multiple NWC autoencoding modules, where each NWC module codes the reconstruction residual left by its preceding modules. CMRL can also scale down to lower bitrates, for which it employs a linear predictive coding (LPC) module as its first autoencoder. We redefine LPC quantization as a trainable module to improve the bit-allocation tradeoff between LPC and the NWC modules that follow it. Compared with other neural speech coders that use autoregressive decoders, our decoder has a significantly smaller architecture: with only 0.12 million parameters, it is more than 100 times smaller than a WaveNet decoder. Compared with the LPCNet-based speech codec, which leverages the speech production model to reduce network complexity at low bitrates, ours can scale up to higher bitrates and achieve transparent quality. Our lightweight neural speech coding model achieves subjective scores comparable to AMR-WB in the low bitrate range and provides transparent coding quality at 32 kbps.
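To illustrate the residual-coding cascade described in the abstract, below is a minimal PyTorch-style sketch of CMRL using depthwise separable convolutions in each NWC module. All names, channel counts, and kernel sizes here are illustrative assumptions rather than the paper's actual configuration, and the trainable quantization and entropy-coding stages are omitted for brevity.

```python
# Minimal sketch of cross-module residual learning (CMRL):
# each autoencoder module codes the residual left by its predecessors.
# Dimensions and module layouts are illustrative, not from the paper.
import torch
import torch.nn as nn

class DepthwiseSeparableConv1d(nn.Module):
    """Depthwise conv followed by a 1x1 pointwise conv; uses far fewer
    parameters than a full convolution with the same receptive field."""
    def __init__(self, in_ch, out_ch, kernel_size):
        super().__init__()
        self.depthwise = nn.Conv1d(in_ch, in_ch, kernel_size,
                                   padding=kernel_size // 2, groups=in_ch)
        self.pointwise = nn.Conv1d(in_ch, out_ch, 1)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))

class NWCModule(nn.Module):
    """One neural waveform codec module: encode -> decode.
    The trainable quantization/entropy-coding step would sit between
    the encoder and decoder; it is omitted in this sketch."""
    def __init__(self, ch=32):
        super().__init__()
        self.encoder = nn.Sequential(
            DepthwiseSeparableConv1d(1, ch, 9), nn.Tanh(),
            DepthwiseSeparableConv1d(ch, 1, 9))
        self.decoder = nn.Sequential(
            DepthwiseSeparableConv1d(1, ch, 9), nn.Tanh(),
            DepthwiseSeparableConv1d(ch, 1, 9))

    def forward(self, x):
        code = self.encoder(x)   # quantization would go here
        return self.decoder(code)

def cmrl_forward(modules, x):
    """Cascade: module i encodes the residual x minus the sum of all
    preceding modules' reconstructions, then adds its own output."""
    recon = torch.zeros_like(x)
    for m in modules:
        recon = recon + m(x - recon)   # residual coding step
    return recon

# Usage: two cascaded modules on a 512-sample frame.
modules = nn.ModuleList(NWCModule() for _ in range(2))
x = torch.randn(1, 1, 512)
x_hat = cmrl_forward(modules, x)
```

The point of the cascade is that each module only sees what its predecessors failed to reconstruct, so appending modules refines the output at the cost of additional bits, which is what makes the bitrate scalable.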

Related research

06/18/2019
Cascaded Cross-Module Residual Learning towards Lightweight End-to-End Speech Coding
Speech codecs learn compact representations of speech signals to facilit...

11/04/2022
Neural Feature Predictor and Discriminative Residual Coding for Low-Bitrate Speech Coding
Low and ultra-low-bitrate neural speech coding achieves unprecedented co...

02/13/2020
Efficient And Scalable Neural Residual Waveform Coding With Collaborative Quantization
Scalability and efficiency are desired in neural speech codecs, which su...

07/01/2020
End-to-End JPEG Decoding and Artifacts Suppression Using Heterogeneous Residual Convolutional Neural Network
Existing deep learning models separate JPEG artifacts suppression from t...

07/25/2023
CQNV: A combination of coarsely quantized bitstream and neural vocoder for low rate speech coding
Recently, speech codecs based on neural networks have proven to perform ...

10/25/2017
End-to-End Optimized Speech Coding with Deep Neural Networks
Modern compression algorithms are often the result of laborious domain-s...

01/12/2020
Deep Optimized Multiple Description Image Coding via Scalar Quantization Learning
In this paper, we introduce a deep multiple description coding (MDC) fra...
