
Low Bit-Rate Speech Coding with VQ-VAE and a WaveNet Decoder

by Cristina Garbacea, et al.

In order to efficiently transmit and store speech signals, speech codecs create a minimally redundant representation of the input signal which is then decoded at the receiver with the best possible perceptual quality. In this work we demonstrate that a neural network architecture based on VQ-VAE with a WaveNet decoder can be used to perform very low bit-rate speech coding with high reconstruction quality. A prosody-transparent and speaker-independent model trained on the LibriSpeech corpus coding audio at 1.6 kbps exhibits perceptual quality which is around halfway between the MELP codec at 2.4 kbps and AMR-WB codec at 23.05 kbps. In addition, when training on high-quality recorded speech with the test speaker included in the training set, a model coding speech at 1.6 kbps produces output of similar perceptual quality to that generated by AMR-WB at 23.05 kbps.
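To make a figure such as 1.6 kbps concrete, the bit rate of a vector-quantized latent stream can be estimated from how many discrete codes are emitted per second and how many bits each code needs. The sketch below is illustrative only: the function name `vq_bitrate_bps` and the example values (200 codes per second, a 256-entry codebook) are assumptions chosen to land on 1.6 kbps, not a configuration stated in the abstract.

```python
import math


def vq_bitrate_bps(codes_per_second: float, codebook_size: int, num_codebooks: int = 1) -> float:
    """Bit rate of a stream of discrete VQ codes, in bits per second.

    Each code index needs log2(codebook_size) bits when coded at a fixed
    length with no entropy coding.
    """
    bits_per_code = math.log2(codebook_size)
    return codes_per_second * bits_per_code * num_codebooks


# Hypothetical configuration (an assumption, not taken from the abstract):
# 200 latent codes per second, each drawn from a 256-entry codebook.
rate = vq_bitrate_bps(codes_per_second=200, codebook_size=256)
print(rate)  # 1600.0 bits per second, i.e. 1.6 kbps

# Ratio between the AMR-WB rate quoted in the abstract (23.05 kbps)
# and the 1.6 kbps rate of the proposed codec.
ratio = 23050 / rate
print(round(ratio, 1))  # 14.4
```

Under these assumed values, the 1.6 kbps stream uses roughly one fourteenth of the bits of AMR-WB at 23.05 kbps, which is the comparison the abstract draws in perceptual terms.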




Wavenet based low rate speech coding

Traditional parametric coding of speech facilitates low rate but provide...

Speech bandwidth extension with WaveNet

Large-scale mobile communication systems tend to contain legacy transmis...

Low Bit-Rate Wideband Speech Coding: A Deep Generative Model based Approach

Traditional low bit-rate speech coding approach only handles narrowband ...

NESC: Robust Neural End-2-End Speech Coding with GANs

Neural networks have proven to be a formidable tool to tackle the proble...

A novel method of speech information hiding based on 3D-Magic Matrix

Redundant information of low-bit-rate speech is extremely small, thus it...

A Streamwise GAN Vocoder for Wideband Speech Coding at Very Low Bit Rate

Recently, GAN vocoders have seen rapid progress in speech synthesis, sta...

Improved Prosody from Learned F0 Codebook Representations for VQ-VAE Speech Waveform Reconstruction

Vector Quantized Variational AutoEncoders (VQ-VAE) are a powerful repres...