Unsupervised word segmentation and lexicon discovery using acoustic word embeddings

03/09/2016
by Herman Kamper, et al.

In settings where only unlabelled speech data is available, speech technology needs to be developed without transcriptions, pronunciation dictionaries, or language modelling text. A similar problem is faced when modelling infant language acquisition. In these cases, categorical linguistic structure needs to be discovered directly from speech audio. We present a novel unsupervised Bayesian model that segments unlabelled speech and clusters the segments into hypothesized word groupings. The result is a complete unsupervised tokenization of the input speech in terms of discovered word types. In our approach, a potential word segment (of arbitrary length) is embedded in a fixed-dimensional acoustic vector space. The model, implemented as a Gibbs sampler, then builds a whole-word acoustic model in this space while jointly performing segmentation. We report word error rates in a small-vocabulary connected digit recognition task by mapping the unsupervised decoded output to ground truth transcriptions. The model achieves around 20% error rate, outperforming a previous HMM-based system by about 10% absolute. Moreover, in contrast to the baseline, our model does not require a pre-specified vocabulary size.
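
To make the idea of embedding an arbitrary-length segment in a fixed-dimensional space concrete, here is a minimal sketch. The abstract does not say which embedding function the model uses. The uniform downsampling below, the function name embed_segment, and the parameter n_samples are all assumptions chosen for illustration, not the authors' implementation.

```python
from typing import List, Sequence


def embed_segment(frames: Sequence[Sequence[float]], n_samples: int = 10) -> List[float]:
    """Map a variable-length segment to a fixed-length vector.

    Hypothetical illustration only: picks `n_samples` evenly spaced frames
    and concatenates them, so the output always has
    n_samples * frame_dim entries regardless of segment duration.
    """
    if not frames:
        raise ValueError("segment must contain at least one frame")
    n_frames = len(frames)
    embedding: List[float] = []
    for k in range(n_samples):
        if n_samples == 1:
            idx = 0
        else:
            # Evenly spaced positions in [0, n_frames - 1];
            # segments shorter than n_samples repeat some frames.
            idx = round(k * (n_frames - 1) / (n_samples - 1))
        embedding.extend(frames[idx])
    return embedding


# Two segments of different duration (4 and 37 frames, 2 features each)
# land in the same 20-dimensional space.
short_segment = [[float(t), 0.5] for t in range(4)]
long_segment = [[float(t), 0.5] for t in range(37)]
assert len(embed_segment(short_segment)) == len(embed_segment(long_segment)) == 20
```

Because every candidate segment maps to a vector of the same size, segments of different durations can be compared and clustered directly, which is what allows a whole-word acoustic model to be built in that space.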


research
06/22/2016

A segmental framework for fully-unsupervised large-vocabulary speech recognition

Zero-resource speech technology is a growing research area that aims to ...
research
11/01/2018

Truly unsupervised acoustic word embeddings using weak top-down constraints in encoder-decoder models

We investigate unsupervised models that can map a variable-duration spee...
research
03/23/2017

An embedded segmental K-means model for unsupervised segmentation and clustering of speech

Unsupervised segmentation and clustering of unlabelled speech are core p...
research
01/03/2017

Unsupervised neural and Bayesian models for zero-resource speech processing

In settings where only unlabelled speech data is available, zero-resourc...
research
07/01/2020

Whole-Word Segmental Speech Recognition with Acoustic Word Embeddings

Segmental models are sequence prediction models in which scores of hypot...
research
12/03/2020

A Correspondence Variational Autoencoder for Unsupervised Acoustic Word Embeddings

We propose a new unsupervised model for mapping a variable-duration spee...
