Autoregressive Co-Training for Learning Discrete Speech Representations

03/29/2022
by Sung-Lin Yeh et al.

While several self-supervised approaches for learning discrete speech representations have been proposed, it is unclear how these seemingly similar approaches relate to one another. In this paper, we consider a generative model with discrete latent variables that learns discrete representations for speech. The learning objective of the generative model is formulated as information-theoretic co-training. Beyond its generality, the objective can be optimized with several approaches, subsuming HuBERT-like training and vector quantization for learning discrete representations. Empirically, we find that the proposed approach learns discrete representations that are more highly correlated with phonetic units than those learned by HuBERT-like training or vector quantization.
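To make the vector-quantization baseline mentioned above concrete, the sketch below shows the standard nearest-neighbor quantization step that maps continuous speech features to discrete code indices. This is a generic illustration, not the paper's method; the function name, shapes, and codebook here are hypothetical.

```python
# Illustrative sketch only: nearest-neighbor vector quantization, one common
# way to discretize continuous speech features. Shapes are hypothetical.
import numpy as np

def quantize(features, codebook):
    """Assign each feature frame to its nearest codebook entry.

    features: (T, D) array of continuous frame representations.
    codebook: (K, D) array of code vectors.
    Returns a (T,) array of discrete code indices.
    """
    # Squared Euclidean distance from every frame to every code vector.
    dists = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1)

rng = np.random.default_rng(0)
feats = rng.normal(size=(5, 8))   # 5 frames of 8-dim features
codes = rng.normal(size=(4, 8))   # a codebook of 4 discrete units
ids = quantize(feats, codes)      # one code index per frame, e.g. shape (5,)
```

In a trained system the codebook would be learned jointly with the encoder (e.g. with a straight-through estimator); here it is random purely for illustration.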


Related research:

06/04/2023 · An Information-Theoretic Analysis of Self-supervised Discrete Representations of Speech
10/08/2022 · CoBERT: Self-Supervised Speech Representation Learning Through Code Representation Learning
08/31/2023 · RepCodec: A Speech Representation Codec for Speech Tokenization
03/03/2020 · VQ-DRAW: A Sequential Discrete VAE
04/11/2020 · Depthwise Discrete Representation Learning
03/11/2023 · Regularized Vector Quantization for Tokenized Image Synthesis
06/25/2021 · Preliminary study on using vector quantization latent spaces for TTS/VC systems with consistent performance
