Non-Autoregressive Predictive Coding for Learning Speech Representations from Local Dependencies

11/01/2020
by   Alexander H. Liu, et al.
0

Self-supervised speech representations have been shown to be effective in a variety of speech applications. However, existing representation learning methods generally rely on the autoregressive model and/or observed global dependencies while generating the representation. In this work, we propose Non-Autoregressive Predictive Coding (NPC), a self-supervised method, to learn a speech representation in a non-autoregressive manner by relying only on local dependencies of speech. NPC has a conceptually simple objective and can be implemented easily with the introduced Masked Convolution Blocks. NPC offers a significant speedup for inference since it is parallelizable in time and has a fixed inference time for each time step regardless of the input sequence length. We discuss and verify the effectiveness of NPC by theoretically and empirically comparing it with other methods. We show that the NPC representation is comparable to other methods in speech experiments on phonetic and speaker classification while being more efficient.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
10/23/2019

Generative Pre-Training for Speech with Autoregressive Predictive Coding

Learning meaningful and general representations from unannotated speech ...
research
05/17/2020

Vector-Quantized Autoregressive Predictive Coding

Autoregressive Predictive Coding (APC), as a self-supervised objective, ...
research
07/03/2023

RobustL2S: Speaker-Specific Lip-to-Speech Synthesis exploiting Self-Supervised Representations

Significant progress has been made in speaker dependent Lip-to-Speech sy...
research
04/11/2020

Improved Speech Representations with Multi-Target Autoregressive Predictive Coding

Training objectives based on predictive coding have recently been shown ...
research
07/08/2020

Analysis of Predictive Coding Models for Phonemic Representation Learning in Small Datasets

Neural network models using predictive coding are interesting from the v...
research
07/13/2021

Speech Representation Learning Combining Conformer CPC with Deep Cluster for the ZeroSpeech Challenge 2021

We present a system for the Zero Resource Speech Challenge 2021, which c...
research
05/25/2022

TranSpeech: Speech-to-Speech Translation With Bilateral Perturbation

Direct speech-to-speech translation (S2ST) systems leverage recent progr...

Please sign up or login with your details

Forgot password? Click here to reset