Composition of Deep and Spiking Neural Networks for Very Low Bit Rate Speech Coding

04/15/2016
by Milos Cernak, et al.

Most current very low bit rate (VLBR) speech coding systems use hidden Markov model (HMM) based speech recognition/synthesis techniques. This allows information (such as phonemes) to be transmitted segment by segment, which decreases the bit rate. However, an encoder based on phoneme recognition may create bursts of segmental errors. Segmental errors are further propagated to the optional coding of suprasegmental (such as syllable) information. Together with voicing detection errors in pitch parametrization, HMM-based speech coding creates speech discontinuities and unnatural speech sound artefacts. In this paper, we propose a novel VLBR speech coding framework based on neural networks (NNs) for end-to-end speech analysis and synthesis without HMMs. The speech coding framework relies on a phonological (sub-phonetic) representation of speech, and it is designed as a composition of deep and spiking NNs: a bank of phonological analysers at the transmitter and a phonological synthesizer at the receiver, both realised as deep NNs, and a spiking NN as an incremental and robust encoder of syllable boundaries for coding of continuous fundamental frequency (F0). Combinations of phonological features define many more sound patterns than the phonetic features used by HMM-based speech coders, and the finer analysis/synthesis code contributes to smoother encoded speech. Listeners significantly prefer the NN-based approach due to fewer discontinuities and speech artefacts in the encoded speech. Only a single forward pass is required for speech encoding and decoding. The proposed VLBR speech coding operates at a bit rate of approximately 360 bits/s.
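To give a feel for what a rate of about 360 bits/s means in practice, the following is a minimal arithmetic sketch. Only the 360 bits/s figure comes from the abstract. The 10 ms frame shift is an assumption chosen for illustration (the abstract does not state the coder's frame rate), and the 64 kbit/s reference is the standard G.711 telephony rate, used here purely as a familiar point of comparison. The function names are hypothetical.

```python
# Back-of-the-envelope arithmetic for a very low bit rate speech coder.
# Only VLBR_BPS is taken from the abstract; everything else is an
# illustrative assumption, not a description of the paper's design.

VLBR_BPS = 360.0                 # approximate rate stated in the abstract
G711_BPS = 64000.0               # G.711 PCM telephony rate, for comparison only
ASSUMED_FRAME_SHIFT_MS = 10.0    # assumption: not given in the abstract


def bits_per_frame(bit_rate_bps: float, frame_shift_ms: float) -> float:
    """Average number of bits available per analysis frame."""
    frames_per_second = 1000.0 / frame_shift_ms
    return bit_rate_bps / frames_per_second


def bytes_for_duration(bit_rate_bps: float, seconds: float) -> float:
    """Payload size in bytes for a given duration of coded speech."""
    return bit_rate_bps * seconds / 8.0


if __name__ == "__main__":
    budget = bits_per_frame(VLBR_BPS, ASSUMED_FRAME_SHIFT_MS)
    minute = bytes_for_duration(VLBR_BPS, 60.0)
    ratio = G711_BPS / VLBR_BPS
    print(f"average bits per assumed 10 ms frame: {budget:.2f}")
    print(f"bytes per minute of speech: {minute:.0f}")
    print(f"rate reduction relative to 64 kbit/s PCM: {ratio:.1f}x")
```

Under the assumed frame shift, the average budget is only a few bits per frame, which suggests why such coders transmit compact segment-level or feature-level information rather than per-frame waveform or spectral detail.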

research
07/07/2022

NESC: Robust Neural End-2-End Speech Coding with GANs

Neural networks have proven to be a formidable tool to tackle the proble...
research
05/12/2019

Deep Vocoder: Low Bit Rate Compression of Speech with Deep Autoencoder

Inspired by the success of deep neural networks (DNNs) in speech process...
research
05/09/2022

Deep Learning Enabled Semantic Communications with Speech Recognition and Synthesis

In this paper, we develop a deep learning based semantic communication s...
research
08/09/2021

A Streamwise GAN Vocoder for Wideband Speech Coding at Very Low Bit Rate

Recently, GAN vocoders have seen rapid progress in speech synthesis, sta...
research
08/27/2022

Minimal Feature Analysis for Isolated Digit Recognition for varying encoding rates in noisy environments

This research work is about recent development made in speech recognitio...
research
09/12/2019

Neural Population Coding for Effective Temporal Classification

Neural encoding plays an important role in faithfully describing the tem...
