
End-to-End Optimized Speech Coding with Deep Neural Networks

by Srihari Kankanahalli, et al.

Modern compression algorithms are often the result of laborious domain-specific research; industry standards such as MP3, JPEG, and AMR-WB took years to develop and were largely hand-designed. We present a deep neural network model that optimizes all the steps of a wideband speech coding pipeline (compression, quantization, entropy coding, and decompression) end-to-end, directly from raw speech data. No manual feature engineering is necessary, and the model trains in hours. In testing, our DNN-based coder performs on par with the AMR-WB standard at a variety of bitrates (9 kbps up to 24 kbps). It also runs in real time on a 3.8 GHz Intel CPU.
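To make the quantization and entropy coding stages of such a pipeline concrete, here is a minimal sketch in plain Python. It is not the paper's model: the learned compression and decompression networks are left out, the uniform quantizer stands in for whatever quantizer the authors train, and the helper names (`quantize`, `dequantize`, `entropy_bits`, `estimated_kbps`) and the `symbols_per_second` parameter are hypothetical. It only illustrates how the empirical entropy of the quantized symbols bounds the bitrate an ideal entropy coder could reach.

```python
import math
from collections import Counter


def quantize(values, levels, lo=-1.0, hi=1.0):
    """Map each value to the index of the nearest of `levels` evenly spaced bins in [lo, hi]."""
    step = (hi - lo) / (levels - 1)
    # Clamp so out-of-range inputs fall into the first or last bin.
    return [min(levels - 1, max(0, round((v - lo) / step))) for v in values]


def dequantize(indices, levels, lo=-1.0, hi=1.0):
    """Recover the bin centers from the bin indices (the lossy inverse of quantize)."""
    step = (hi - lo) / (levels - 1)
    return [lo + i * step for i in indices]


def entropy_bits(symbols):
    """Empirical Shannon entropy in bits per symbol."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def estimated_kbps(symbols, symbols_per_second):
    """Bitrate an ideal entropy coder would need for this symbol stream, in kbps."""
    return entropy_bits(symbols) * symbols_per_second / 1000.0
```

For example, a stream that uses two symbols equally often carries 1 bit per symbol, so at an assumed 8000 symbols per second `estimated_kbps` gives 8 kbps. Using fewer quantization levels, or a more skewed symbol distribution, lowers this estimate at the cost of reconstruction accuracy.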



