An embedded segmental K-means model for unsupervised segmentation and clustering of speech

03/23/2017
by Herman Kamper, et al.

Unsupervised segmentation and clustering of unlabelled speech are core problems in zero-resource speech processing. Most approaches lie at methodological extremes: some use probabilistic Bayesian models with convergence guarantees, while others opt for more efficient heuristic techniques. Despite competitive performance in previous work, the full Bayesian approach is difficult to scale to large speech corpora. We introduce an approximation to a recent Bayesian model that still has a clear objective function but improves efficiency by using hard clustering and segmentation rather than full Bayesian inference. Like its Bayesian counterpart, this embedded segmental K-means model (ES-KMeans) represents arbitrary-length word segments as fixed-dimensional acoustic word embeddings. We first compare ES-KMeans to previous approaches on common English and Xitsonga data sets (5 and 2.5 hours of speech): ES-KMeans outperforms a leading heuristic method in word segmentation, giving similar scores to the Bayesian model while being 5 times faster with fewer hyperparameters. However, its clusters are less pure than those of the other models. We then show that ES-KMeans scales to larger corpora by applying it to the 5 languages of the Zero Resource Speech Challenge 2017 (up to 45 hours), where it performs competitively compared to the challenge baseline.
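The alternation between hard segmentation and hard clustering described above can be illustrated with a minimal sketch. It is not the authors' implementation: the uniform-downsampling embedding, the duration-weighted squared-distance cost, the unweighted mean update, and every name here (`embed`, `segment`, `es_kmeans`, `max_len`, `n_samples`) are assumptions made for illustration.

```python
import numpy as np

def embed(frames, n_samples=4):
    """Map a (T, D) segment to a fixed (n_samples * D,) vector.
    Uniform downsampling is an assumed embedding choice."""
    idx = np.linspace(0, len(frames) - 1, n_samples).round().astype(int)
    return frames[idx].ravel()

def segment(utt, means, max_len):
    """Dynamic programme over one utterance with the cluster means held fixed.
    Returns a list of (start, end, cluster) triples covering the utterance."""
    T = len(utt)
    best = np.full(T + 1, np.inf)      # best[t]: lowest cost of segmenting utt[:t]
    best[0] = 0.0
    back = np.zeros(T + 1, dtype=int)  # start of the last segment ending at t
    label = np.zeros(T + 1, dtype=int) # cluster of the last segment ending at t
    for t in range(1, T + 1):
        for s in range(max(0, t - max_len), t):
            d = ((means - embed(utt[s:t])) ** 2).sum(axis=1)
            k = int(d.argmin())        # hard assignment to the nearest mean
            cost = best[s] + (t - s) * d[k]  # duration weighting (assumption)
            if cost < best[t]:
                best[t], back[t], label[t] = cost, s, k
    segs, t = [], T
    while t > 0:                       # trace the best path backwards
        segs.append((int(back[t]), t, int(label[t])))
        t = int(back[t])
    return segs[::-1]

def es_kmeans(utts, n_clusters, max_len=6, n_iter=5, seed=0):
    """Alternate between re-segmenting all utterances and updating the means."""
    rng = np.random.default_rng(seed)
    # Initialise means from embeddings of fixed-length chunks (assumption).
    chunks = np.array([embed(u[s:s + max_len])
                       for u in utts for s in range(0, len(u), max_len)])
    means = chunks[rng.choice(len(chunks), n_clusters, replace=False)].copy()
    for _ in range(n_iter):
        all_segs = [segment(u, means, max_len) for u in utts]
        for k in range(n_clusters):
            members = [embed(u[s:e]) for u, segs in zip(utts, all_segs)
                       for s, e, lab in segs if lab == k]
            if members:                # leave empty clusters unchanged
                means[k] = np.mean(members, axis=0)
    return means, all_segs
```

Calling `es_kmeans` on a list of `(T, D)` feature arrays returns the cluster means and, for each utterance, its hypothesised segments with cluster labels. Because every step is a hard assignment, no sampling is involved, which is the kind of saving over full Bayesian inference that the abstract refers to.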

research
01/03/2017

Unsupervised neural and Bayesian models for zero-resource speech processing

In settings where only unlabelled speech data is available, zero-resourc...
research
03/09/2016

Unsupervised word segmentation and lexicon discovery using acoustic word embeddings

In settings where only unlabelled speech data is available, speech techn...
research
06/22/2022

DP-Parse: Finding Word Boundaries from Raw Speech with an Instance Lexicon

Finding word boundaries in continuous speech is challenging as there is ...
research
10/12/2020

The Zero Resource Speech Challenge 2020: Discovering discrete subword and word units

We present the Zero Resource Speech Challenge 2020, which aims at learni...
research
09/16/2019

Fast transcription of speech in low-resource languages

We present software that, in only a few hours, transcribes forty hours o...
research
06/08/2021

Unsupervised Word Segmentation from Discrete Speech Units in Low-Resource Settings

When documenting oral-languages, Unsupervised Word Segmentation (UWS) fr...
research
08/03/2020

Unsupervised Discovery of Recurring Speech Patterns Using Probabilistic Adaptive Metrics

Unsupervised spoken term discovery (UTD) aims at finding recurring segme...
