
Low Bit-Rate Speech Coding with VQ-VAE and a WaveNet Decoder

10/14/2019
by Cristina Garbacea, et al.

To transmit and store speech signals efficiently, speech codecs create a minimally redundant representation of the input signal, which is then decoded at the receiver with the best possible perceptual quality. In this work we demonstrate that a neural network architecture based on VQ-VAE with a WaveNet decoder can be used to perform very low bit-rate speech coding with high reconstruction quality. A prosody-transparent and speaker-independent model trained on the LibriSpeech corpus and coding audio at 1.6 kbps exhibits perceptual quality around halfway between that of the MELP codec at 2.4 kbps and the AMR-WB codec at 23.05 kbps. In addition, when trained on high-quality recorded speech with the test speaker included in the training set, a model coding speech at 1.6 kbps produces output of similar perceptual quality to that generated by AMR-WB at 23.05 kbps.
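To make the bit rates in the abstract concrete, the following is a minimal sketch of how the rate of a vector-quantized code stream can be computed, and how 1.6 kbps compares with the MELP and AMR-WB rates quoted above. The frame rate, hop size, and codebook size are hypothetical values chosen only so that the arithmetic lands on 1.6 kbps; they are not taken from the paper, and the function name `vq_bitrate_bps` is illustrative.

```python
import math


def vq_bitrate_bps(frames_per_second: float, codebook_size: int,
                   num_codebooks: int = 1) -> float:
    """Bits per second needed to send one codebook index per codebook per frame."""
    bits_per_index = math.log2(codebook_size)
    return frames_per_second * num_codebooks * bits_per_index


# Hypothetical configuration (an assumption, not the paper's settings).
SAMPLE_RATE_HZ = 16_000   # assumed input sample rate
HOP_SAMPLES = 80          # assumed encoder downsampling factor -> 200 frames/s
CODEBOOK_SIZE = 256       # assumed codebook size -> 8 bits per index

frames_per_second = SAMPLE_RATE_HZ / HOP_SAMPLES
bitrate = vq_bitrate_bps(frames_per_second, CODEBOOK_SIZE)

# Rates quoted in the abstract, in kbps.
VQVAE_KBPS = 1.6
MELP_KBPS = 2.4
AMR_WB_KBPS = 23.05

ratio_vs_melp = MELP_KBPS / VQVAE_KBPS
ratio_vs_amr_wb = AMR_WB_KBPS / VQVAE_KBPS

print(f"assumed VQ stream: {bitrate / 1000:.1f} kbps")
print(f"MELP uses {ratio_vs_melp:.2f}x the bits of the 1.6 kbps model")
print(f"AMR-WB uses {ratio_vs_amr_wb:.2f}x the bits of the 1.6 kbps model")
```

Under these assumed settings the stream costs 200 frames/s × 8 bits = 1600 bits/s. Any other combination with the same product (for example, more frames per second with a smaller codebook) would give the same rate, so the sketch says nothing about which trade-off the authors actually chose.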

12/01/2017

Wavenet based low rate speech coding

Traditional parametric coding of speech facilitates low rate but provide...
07/05/2019

Speech bandwidth extension with WaveNet

Large-scale mobile communication systems tend to contain legacy transmis...
02/04/2021

Low Bit-Rate Wideband Speech Coding: A Deep Generative Model based Approach

Traditional low bit-rate speech coding approach only handles narrowband ...
07/07/2022

NESC: Robust Neural End-2-End Speech Coding with GANs

Neural networks have proven to be a formidable tool to tackle the proble...
09/09/2018

A novel method of speech information hiding based on 3D-Magic Matrix

Redundant information of low-bit-rate speech is extremely small, thus it...
08/09/2021

A Streamwise GAN Vocoder for Wideband Speech Coding at Very Low Bit Rate

Recently, GAN vocoders have seen rapid progress in speech synthesis, sta...
05/16/2020

Improved Prosody from Learned F0 Codebook Representations for VQ-VAE Speech Waveform Reconstruction

Vector Quantized Variational AutoEncoders (VQ-VAE) are a powerful repres...