DeepAI AI Chat
Log In Sign Up

Practical cognitive speech compression

by   Reza Lotfidereshgi, et al.

This paper presents a new neural speech compression method that is practical in the sense that it operates at low bitrate, introduces a low latency, is compatible in computational complexity with current mobile devices, and provides a subjective quality that is comparable to that of standard mobile-telephony codecs. Other recently proposed neural vocoders also have the ability to operate at low bitrate. However, they do not produce the same level of subjective quality as standard codecs. On the other hand, standard codecs rely on objective and short-term metrics such as the segmental signal-to-noise ratio that correlate only weakly with perception. Furthermore, standard codecs are less efficient than unsupervised neural networks at capturing speech attributes, especially long-term ones. The proposed method combines a cognitive-coding encoder that extracts an interpretable unsupervised hierarchical representation with a multi stage decoder that has a GAN-based architecture. We observe that this method is very robust to the quantization of representation features. An AB test was conducted on a subset of the Harvard sentences that are commonly used to evaluate standard mobile-telephony codecs. The results show that the proposed method outperforms the standard AMR-WB codec in terms of delay, bitrate and subjective quality.


page 1

page 2

page 3

page 4


DNSMOS: A Non-Intrusive Perceptual Objective Speech Quality metric to evaluate Noise Suppressors

Human subjective evaluation is the gold standard to evaluate speech qual...

Fast, Compact, and High Quality LSTM-RNN Based Statistical Parametric Speech Synthesizers for Mobile Devices

Acoustic models based on long short-term memory recurrent neural network...

NESC: Robust Neural End-2-End Speech Coding with GANs

Neural networks have proven to be a formidable tool to tackle the proble...

PostGAN: A GAN-Based Post-Processor to Enhance the Quality of Coded Speech

The quality of speech coded by transform coding is affected by various a...

Neural Feature Predictor and Discriminative Residual Coding for Low-Bitrate Speech Coding

Low and ultra-low-bitrate neural speech coding achieves unprecedented co...

Speaking-Rate-Controllable HiFi-GAN Using Feature Interpolation

This paper presents a speaking-rate-controllable HiFi-GAN neural vocoder...

Cognitive Coding of Speech

We propose an approach for cognitive coding of speech by unsupervised ex...