A Convolutional Deep Markov Model for Unsupervised Speech Representation Learning

06/03/2020
by   Sameer Khurana, et al.
0

Probabilistic Latent Variable Models (LVMs) provide an alternative to self-supervised learning approaches for linguistic representation learning from speech. LVMs admit an intuitive probabilistic interpretation where the latent structure shapes the information extracted from the signal. Even though LVMs have recently seen a renewed interest due to the introduction of Variational Autoencoders (VAEs), their use for speech representation learning remains largely unexplored. In this work, we propose Convolutional Deep Markov Model (ConvDMM), a Gaussian state-space model with non-linear emission and transition functions modelled by deep neural networks. This unsupervised model is trained using black box variational inference. A deep convolutional neural network is used as an inference network for structured variational approximation. When trained on a large scale speech dataset (LibriSpeech), ConvDMM produces features that significantly outperform multiple self-supervised feature extracting methods on linear phone classification and recognition on the Wall Street Journal dataset. Furthermore, we found that ConvDMM complements self-supervised methods like Wav2Vec and PASE, improving on the results achieved with any of the methods alone. Lastly, we find that ConvDMM features enable learning better phone recognizers than any other features in an extreme low-resource regime with few labeled training examples.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
03/01/2022

A Brief Overview of Unsupervised Neural Speech Representation Learning

Unsupervised representation learning for speech processing has matured g...
research
01/25/2019

Revisiting Self-Supervised Visual Representation Learning

Unsupervised visual representation learning remains a largely unsolved p...
research
05/05/2023

A vector quantized masked autoencoder for audiovisual speech emotion recognition

While fully-supervised models have been shown to be effective for audiov...
research
09/05/2023

Non-Parametric Representation Learning with Kernels

Unsupervised and self-supervised representation learning has become popu...
research
06/18/2020

Self-supervised Learning for Speech Enhancement

Supervised learning for single-channel speech enhancement requires caref...
research
12/14/2020

Towards unsupervised phone and word segmentation using self-supervised vector-quantized neural networks

We investigate segmenting and clustering speech into low-bitrate phone-l...
research
07/28/2021

Self-Supervised Hybrid Inference in State-Space Models

We perform approximate inference in state-space models that allow for no...

Please sign up or login with your details

Forgot password? Click here to reset