Deep Neural Networks and End-to-End Learning for Audio Compression

05/25/2021
by Daniela N. Rim, et al.

Recent achievements in end-to-end deep learning have encouraged the use of unified deep network models for tasks involving highly structured data. Building such models for audio compression has been challenging because compression requires discrete representations, which are not easy to train with end-to-end backpropagation. In this paper, we present an end-to-end deep learning approach that combines recurrent neural networks (RNNs) with the training strategy of variational autoencoders (VAEs), using a binary representation of the latent space. We apply a reparametrization trick for the Bernoulli distribution to the discrete representations, which allows smooth backpropagation. In addition, our approach allows the encoder and decoder to be separated, which is necessary for compression tasks. To the best of our knowledge, this is the first end-to-end learning approach for a single audio compression model with RNNs, and our model achieves a signal-to-distortion ratio (SDR) of 20.54.
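The abstract mentions a reparametrization trick for the Bernoulli distribution but does not say which one is used. As a rough illustration only, the sketch below assumes one common choice, the binary Concrete (Gumbel-sigmoid) relaxation, which is not confirmed to be the paper's method. The names `relaxed_bernoulli_sample` and `binarize`, the temperature value, and the example logits are all hypothetical.

```python
import math
import random


def relaxed_bernoulli_sample(logit, temperature, rng):
    """Draw a soft sample in [0, 1] from a relaxed Bernoulli(sigmoid(logit)).

    Assumption: binary Concrete relaxation. The randomness comes from a
    parameter-free logistic noise term, so the output is a smooth function
    of `logit` (which is what lets gradients flow in an autodiff framework).
    """
    # Clamp the uniform draw away from 0 and 1 so the logs stay finite.
    u = min(max(rng.random(), 1e-6), 1.0 - 1e-6)
    logistic_noise = math.log(u) - math.log1p(-u)
    return 1.0 / (1.0 + math.exp(-(logit + logistic_noise) / temperature))


def binarize(soft_sample):
    """Hard-threshold a soft sample into a bit for the compressed code."""
    return 1 if soft_sample > 0.5 else 0


rng = random.Random(0)
logits = [-4.0, 0.0, 4.0]  # hypothetical encoder outputs
soft = [relaxed_bernoulli_sample(l, 0.5, rng) for l in logits]
bits = [binarize(s) for s in soft]
print(soft, bits)
```

Under this relaxation, thresholding the soft sample at 0.5 is equivalent to checking whether the logit plus the logistic noise is positive, so the resulting bit is distributed as Bernoulli(sigmoid(logit)). Lower temperatures push the soft samples closer to 0 or 1 at the cost of noisier gradients.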


