Unsupervised speech representation learning using WaveNet autoencoders

01/25/2019
by Jan Chorowski et al.

We consider the task of unsupervised extraction of meaningful latent representations of speech by applying autoencoding neural networks to speech waveforms. The goal is to learn a representation able to capture high level semantic content from the signal, e.g. phoneme identities, while being invariant to confounding low level details in the signal such as the underlying pitch contour or background noise. The behavior of autoencoder models depends on the kind of constraint that is applied to the latent representation. We compare three variants: a simple dimensionality reduction bottleneck, a Gaussian Variational Autoencoder (VAE), and a discrete Vector Quantized VAE (VQ-VAE). We analyze the quality of learned representations in terms of speaker independence, the ability to predict phonetic content, and the ability to accurately reconstruct individual spectrogram frames. Moreover, for discrete encodings extracted using the VQ-VAE, we measure the ease of mapping them to phonemes. We introduce a regularization scheme that forces the representations to focus on the phonetic content of the utterance and report performance comparable with the top entries in the ZeroSpeech 2017 unsupervised acoustic unit discovery task.
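To make the comparison concrete, the following is a minimal sketch of the discrete VQ-VAE bottleneck the abstract refers to, written in PyTorch. The codebook size (512), latent dimensionality (64), and commitment weight (0.25) are illustrative assumptions rather than the paper's configuration, and the WaveNet encoder and decoder that would surround this layer are omitted.

```python
import torch
import torch.nn as nn


class VectorQuantizer(nn.Module):
    """Discrete VQ-VAE bottleneck: snap each continuous latent vector
    to its nearest entry in a learned codebook."""

    def __init__(self, num_codes=512, code_dim=64, beta=0.25):
        super().__init__()
        self.codebook = nn.Embedding(num_codes, code_dim)
        self.codebook.weight.data.uniform_(-1.0 / num_codes, 1.0 / num_codes)
        self.beta = beta  # commitment-loss weight (0.25 is a common default)

    def forward(self, z_e):
        # z_e: (batch, time, code_dim) continuous encoder outputs.
        # Squared distance from every latent frame to every codebook entry.
        dists = (z_e.unsqueeze(-2) - self.codebook.weight).pow(2).sum(-1)
        indices = dists.argmin(dim=-1)    # (batch, time) discrete code IDs
        z_q = self.codebook(indices)      # quantized latents

        # Codebook loss moves code vectors toward encoder outputs; the
        # commitment loss keeps encoder outputs near their assigned codes.
        vq_loss = ((z_q - z_e.detach()) ** 2).mean() \
            + self.beta * ((z_q.detach() - z_e) ** 2).mean()

        # Straight-through estimator: gradients flow back to the encoder
        # as if quantization were the identity function.
        z_q = z_e + (z_q - z_e).detach()
        return z_q, indices, vq_loss


# Example: quantize a batch of 8 utterances, 100 latent frames each.
vq = VectorQuantizer()
z_q, indices, vq_loss = vq(torch.randn(8, 100, 64))
```

The discrete code IDs returned here are what an analysis like the one described above would map to phonemes; the straight-through gradient copy is the standard trick that lets the non-differentiable nearest-neighbor lookup be trained end to end.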
