An Unsupervised Autoregressive Model for Speech Representation Learning

04/05/2019
by Yu-An Chung, et al.

This paper proposes a novel unsupervised autoregressive neural model for learning generic speech representations. In contrast to other speech representation learning methods that aim to remove noise or speaker variabilities, ours is designed to preserve information for a wide range of downstream tasks. In addition, the proposed model does not require any phonetic or word boundary labels, allowing the model to benefit from large quantities of unlabeled data. Speech representations learned by our model significantly improve performance on both phone classification and speaker verification over the surface features and other supervised and unsupervised approaches. Further analysis shows that different levels of speech information are captured by our model at different layers. In particular, the lower layers tend to be more discriminative for speakers, while the upper layers provide more phonetic content.
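
The abstract does not spell out the training objective, so the following is only a minimal sketch of what an autoregressive objective on unlabeled speech frames can look like. It assumes, purely for illustration, that the model's output at time t is scored against the frame `shift` steps later using a mean absolute error. The function names, the `shift` parameter, and the copy-the-current-frame baseline are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence

Frame = Sequence[float]


def shifted_l1_loss(predictions: Sequence[Frame],
                    frames: Sequence[Frame],
                    shift: int = 1) -> float:
    """Mean absolute error between predictions[t] and frames[t + shift].

    Assumption for illustration: the prediction made at time t is compared
    with the feature frame `shift` steps in the future. No labels are used;
    the targets are the input frames themselves.
    """
    if shift < 1:
        raise ValueError("shift must be >= 1")
    if len(predictions) != len(frames):
        raise ValueError("predictions and frames must have the same length")
    n_pairs = len(frames) - shift
    if n_pairs <= 0:
        raise ValueError("sequence too short for this shift")

    total = 0.0
    count = 0
    for t in range(n_pairs):
        target = frames[t + shift]
        for p, x in zip(predictions[t], target):
            total += abs(p - x)
            count += 1
    return total / count


def copy_current_frame_baseline(frames: Sequence[Frame]) -> List[List[float]]:
    """Hypothetical stand-in for a model: predict that the future frame
    equals the current one. A trained network would replace this."""
    return [list(f) for f in frames]


# Toy sequence of three 2-dimensional "frames" (made-up numbers).
frames = [[0.0, 1.0], [1.0, 1.0], [2.0, 3.0]]
preds = copy_current_frame_baseline(frames)
loss = shifted_l1_loss(preds, frames, shift=1)
print(loss)
```

In this sketch the supervision signal comes entirely from the unlabeled frames, which is the property the abstract emphasizes: no phonetic or word boundary labels are needed to compute the loss.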

research
10/23/2019

Generative Pre-Training for Speech with Autoregressive Predictive Coding

Learning meaningful and general representations from unannotated speech ...
research
10/12/2021

Large-scale Self-Supervised Speech Representation Learning for Automatic Speaker Verification

The speech representations learned from large-scale unlabeled data have ...
research
10/25/2019

Mockingjay: Unsupervised Speech Representation Learning with Deep Bidirectional Transformer Encoders

We present Mockingjay as a new speech representation learning approach, ...
research
07/25/2020

Nonlinear ISA with Auxiliary Variables for Learning Speech Representations

This paper extends recent work on nonlinear Independent Component Analys...
research
08/05/2021

Applying the Information Bottleneck Principle to Prosodic Representation Learning

This paper describes a novel design of a neural network-based speech gen...
research
08/08/2020

Exploring the Use of an Unsupervised Autoregressive Model as a Shared Encoder for Text-Dependent Speaker Verification

In this paper, we propose a novel way of addressing text-dependent autom...
research
09/17/2022

Representation Learning Strategies to Model Pathological Speech: Effect of Multiple Spectral Resolutions

This paper considers a representation learning strategy to model speech ...
