Improving Unsupervised Sparsespeech Acoustic Models with Categorical Reparameterization

05/29/2020
by Benjamin Milde, et al.

The Sparsespeech model is an unsupervised acoustic model that can generate discrete pseudo-labels for untranscribed speech. We extend the Sparsespeech model to allow for sampling over a random discrete variable, yielding pseudo-posteriorgrams. The degree of sparsity in this posteriorgram can be fully controlled after the model has been trained. We use the Gumbel-Softmax trick to approximately sample from a discrete distribution in the neural network, which allows us to train the network efficiently with standard backpropagation. The improved model is trained and evaluated on the Libri-Light corpus, a benchmark for ASR with limited or no supervision. The model is trained on 600h and 6000h of English read speech. We evaluate the improved model using the ABX error measure and in a semi-supervised setting with 10h of transcribed speech. We observe a relative improvement of up to 31.4% in ABX error rates across speakers on the test set with the improved Sparsespeech model on 600h of speech data, and further improvements when we scale the model to 6000h.
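
The Gumbel-Softmax trick mentioned in the abstract can be illustrated with a minimal sketch. This is the generic technique in plain Python, not the authors' implementation: the function names, the example logits, and the use of a temperature parameter to make the output sharper or softer are assumptions made here for illustration only.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Numerically stable softmax of logits / temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_gumbel(rng):
    """Draw one Gumbel(0, 1) sample via inverse transform: -log(-log(U))."""
    u = rng.random()
    # Clamp away from 0 and 1 so both logarithms stay finite.
    u = min(max(u, 1e-12), 1.0 - 1e-12)
    return -math.log(-math.log(u))


def gumbel_softmax(logits, temperature, rng):
    """Relaxed (differentiable) sample from the categorical given by logits.

    Gumbel noise is added to each logit, then a temperature-scaled softmax
    is applied. Lower temperatures push the output towards a one-hot vector.
    """
    noisy = [x + sample_gumbel(rng) for x in logits]
    return softmax(noisy, temperature)


# Hypothetical logits for one frame over four pseudo-label classes.
logits = [2.0, 1.0, 0.1, -1.0]

# Same seed for both calls, so both use identical Gumbel noise and differ
# only in temperature.
soft = gumbel_softmax(logits, temperature=5.0, rng=random.Random(0))
sharp = gumbel_softmax(logits, temperature=0.1, rng=random.Random(0))

print("soft :", [round(p, 3) for p in soft])
print("sharp:", [round(p, 3) for p in sharp])
```

With identical noise, the low-temperature output concentrates more probability mass on a single class, which is the sense in which a temperature can trade off between a dense and a sparse posteriorgram-like vector. In a neural network the same computation would be written in an automatic differentiation framework, so that gradients flow through the relaxed sample.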

