Towards unsupervised phone and word segmentation using self-supervised vector-quantized neural networks

12/14/2020
by Herman Kamper, et al.

We investigate segmenting and clustering speech into low-bitrate phone-like sequences without supervision. We specifically constrain pretrained self-supervised vector-quantized (VQ) neural networks so that blocks of contiguous feature vectors are assigned to the same code, thereby giving a variable-rate segmentation of the speech into discrete units. Two segmentation methods are considered. In the first, features are greedily merged until a prespecified number of segments is reached. The second uses dynamic programming to optimize a squared error with a penalty term to encourage fewer but longer segments. We show that these VQ segmentation methods can be used without alteration across a wide range of tasks: unsupervised phone segmentation, ABX phone discrimination, same-different word discrimination, and as inputs to a symbolic word segmentation algorithm. The penalized method generally performs best. While results are only comparable to the state of the art in some cases, in all tasks a reasonable competing approach is outperformed at a substantially lower bitrate.
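To make the two segmentation ideas concrete, here is a minimal pure-Python sketch. It is not the authors' implementation; it rests on several assumptions the abstract does not state. Each segment is scored by the summed squared distance of its frames to the single best codebook vector, the penalty is a constant added per segment, and the greedy variant removes the boundary whose removal increases the total error least. The names `penalized_segment`, `greedy_segment`, and `penalty` are hypothetical.

```python
def _prefix_costs(features, codebook):
    # prefix[k][t] = summed squared distance of frames 0..t-1 to code k
    prefix = []
    for code in codebook:
        row = [0.0]
        for frame in features:
            dist = sum((a - b) ** 2 for a, b in zip(frame, code))
            row.append(row[-1] + dist)
        prefix.append(row)
    return prefix


def _segment_cost(prefix, start, end):
    # Lowest squared error (and the code achieving it) for frames start..end-1
    return min((row[end] - row[start], k) for k, row in enumerate(prefix))


def penalized_segment(features, codebook, penalty):
    """Dynamic programming: squared error plus an assumed constant
    penalty per segment. Returns a list of (start, end, code)."""
    n_frames = len(features)
    prefix = _prefix_costs(features, codebook)
    best = [0.0] + [float("inf")] * n_frames
    back = [(0, 0)] * (n_frames + 1)
    for end in range(1, n_frames + 1):
        for start in range(end):
            cost, code = _segment_cost(prefix, start, end)
            total = best[start] + cost + penalty
            if total < best[end]:
                best[end] = total
                back[end] = (start, code)
    # Trace the optimal boundaries back from the final frame
    segments = []
    end = n_frames
    while end > 0:
        start, code = back[end]
        segments.append((start, end, code))
        end = start
    segments.reverse()
    return segments


def greedy_segment(features, codebook, n_segments):
    """Greedy merging: start with one segment per frame and repeatedly
    remove the boundary whose removal adds the least squared error."""
    prefix = _prefix_costs(features, codebook)
    bounds = list(range(len(features) + 1))
    while len(bounds) - 1 > n_segments:
        best_i, best_increase = None, float("inf")
        for i in range(1, len(bounds) - 1):
            left, mid, right = bounds[i - 1], bounds[i], bounds[i + 1]
            increase = (_segment_cost(prefix, left, right)[0]
                        - _segment_cost(prefix, left, mid)[0]
                        - _segment_cost(prefix, mid, right)[0])
            if increase < best_increase:
                best_i, best_increase = i, increase
        del bounds[best_i]
    return [(s, e, _segment_cost(prefix, s, e)[1])
            for s, e in zip(bounds, bounds[1:])]


# Toy 1-D "features" with two clear regions and a two-entry codebook
features = [[0.0], [0.1], [0.0], [1.0], [0.9], [1.1]]
codebook = [[0.0], [1.0]]
print(penalized_segment(features, codebook, penalty=0.5))
print(greedy_segment(features, codebook, n_segments=2))
```

In this sketch a larger `penalty` yields fewer, longer segments, while `n_segments` fixes the segment count directly for the greedy variant.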

research
02/24/2022

Word Segmentation on Discovered Phone Units with Dynamic Programming and Self-Supervised Scoring

Recent work on unsupervised speech segmentation has used self-supervised...
research
10/05/2021

Unsupervised Speech Segmentation and Variable Rate Representation Learning using Segmental Contrastive Predictive Coding

Typically, unsupervised segmentation of speech into the phone and word-l...
research
06/15/2020

Catplayinginthesnow: Impact of Prior Segmentation on a Model of Visually Grounded Speech

We investigate the effect of introducing phone, syllable, or word bounda...
research
06/30/2023

What do self-supervised speech models know about words?

Many self-supervised speech models (S3Ms) have been introduced over the ...
research
06/03/2020

A Convolutional Deep Markov Model for Unsupervised Speech Representation Learning

Probabilistic Latent Variable Models (LVMs) provide an alternative to se...
research
10/12/2020

Perceptimatic: A human speech perception benchmark for unsupervised subword modelling

In this paper, we present a data set and methods to compare speech proce...
research
06/14/2021

HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units

Self-supervised approaches for speech representation learning are challe...
