Cognitive Coding of Speech

by Reza Lotfidereshgi, et al.

We propose an approach for cognitive coding of speech by unsupervised extraction of contextual representations in two hierarchical levels of abstraction. Speech attributes such as phoneme identity that last one hundred milliseconds or less are captured in the lower level of abstraction, while speech attributes such as speaker identity and emotion that persist up to one second are captured in the higher level of abstraction. This decomposition is achieved by a two-stage neural network, with a lower and an upper stage operating at different time scales. Both stages are trained to predict the content of the signal in their respective latent spaces. A top-down pathway between stages further improves the predictive capability of the network. With an application in speech compression in mind, we investigate the effect of dimensionality reduction and low bitrate quantization on the extracted representations. The performance measured on the LibriSpeech and EmoV-DB datasets reaches, and for some speech attributes even exceeds, that of state-of-the-art approaches.
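To make the compression setting concrete, the following is a minimal sketch of how a bitrate could be estimated when two streams of representations are quantized at different rates. It is not the authors' implementation: the uniform scalar quantizer, the function names, the value range [-1, 1], and all rates, dimensions, and bit depths are assumptions chosen for illustration only.

```python
def bitrate_bps(vectors_per_second: float, dims: int, bits_per_dim: int) -> float:
    """Bits per second needed to send `dims` values per vector at `bits_per_dim` bits each."""
    return vectors_per_second * dims * bits_per_dim


def quantize(value: float, bits: int, lo: float = -1.0, hi: float = 1.0) -> int:
    """Map a value to one of 2**bits integer levels; values outside [lo, hi] are clipped."""
    levels = 2 ** bits
    clipped = min(max(value, lo), hi)
    index = int((clipped - lo) / (hi - lo) * levels)
    return min(index, levels - 1)  # keep value == hi inside the top cell


def dequantize(index: int, bits: int, lo: float = -1.0, hi: float = 1.0) -> float:
    """Return the centre of the quantization cell with the given index."""
    step = (hi - lo) / 2 ** bits
    return lo + (index + 0.5) * step


# Hypothetical settings (assumptions, not values from the paper):
# a lower stage emitting 100 vectors per second and an upper stage emitting 1.
lower_rate = bitrate_bps(100, dims=32, bits_per_dim=2)
upper_rate = bitrate_bps(1, dims=32, bits_per_dim=2)
total_rate = lower_rate + upper_rate

# Round trip of a single coefficient through a 3-bit quantizer.
original = 0.3
restored = dequantize(quantize(original, bits=3), bits=3)
print(total_rate, original, restored)
```

Under these assumed settings, the slowly varying upper-level stream adds only a small fraction to the total bitrate, which illustrates why separating attributes by time scale is attractive for compression.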

