Unsupervised Word Segmentation using K Nearest Neighbors

04/27/2022
by Tzeviya Sylvia Fuchs, et al.

In this paper, we propose an unsupervised kNN-based approach for word segmentation in speech utterances. Our method relies on self-supervised pre-trained speech representations and compares each audio segment of a given utterance to its K nearest neighbors within the training set. Our main assumption is that a segment containing more than one word occurs less often than a segment containing a single word. Our method does not require phoneme discovery and operates directly on pre-trained audio representations. This is in contrast to current methods that use a two-stage approach: first detecting the phonemes in the utterance, then detecting word boundaries according to statistics calculated on phoneme patterns. Experiments on two datasets demonstrate improved results over previous single-stage methods and results competitive with state-of-the-art two-stage methods.
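
The nearest-neighbor comparison described above can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `knn_mean_distance`, the use of Euclidean distance, the mean over the K nearest distances as a score, and the 2-dimensional toy vectors are all assumptions made for illustration. The idea is only that a segment embedding lying close to many training-set embeddings (a frequently occurring pattern) receives a lower score than one with no close neighbors.

```python
import heapq
import math
from typing import Sequence

Vector = Sequence[float]


def knn_mean_distance(query: Vector, bank: Sequence[Vector], k: int) -> float:
    """Mean Euclidean distance from `query` to its k nearest vectors in `bank`.

    Hypothetical scoring function: a low value means the query resembles
    many stored segments, i.e. it is a frequently occurring pattern.
    """
    if not 1 <= k <= len(bank):
        raise ValueError("k must be between 1 and len(bank)")
    distances = (math.dist(query, item) for item in bank)
    nearest = heapq.nsmallest(k, distances)
    return sum(nearest) / k


# Toy "training set" of segment embeddings (assumed values, not real features).
bank = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0)]

# A segment near a dense cluster versus a segment far from most stored ones.
common = knn_mean_distance((0.05, 0.05), bank, k=3)
rare = knn_mean_distance((3.0, 3.0), bank, k=3)
print(common < rare)
```

In this toy setting the segment near the cluster scores lower than the isolated one, which mirrors the stated assumption that single-word segments recur more often than multi-word segments. How candidate segments are proposed and how scores are turned into boundaries is not specified in the abstract and is left out here.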

research
04/11/2022

Speech Sequence Embeddings using Nearest Neighbors Contrastive Learning

We introduce a simple neural encoder architecture that can be trained us...
research
02/24/2022

Word Segmentation on Discovered Phone Units with Dynamic Programming and Self-Supervised Scoring

Recent work on unsupervised speech segmentation has used self-supervised...
research
03/30/2023

Unsupervised Word Segmentation Using Temporal Gradient Pseudo-Labels

Unsupervised word segmentation in audio utterances is challenging as, in...
research
04/11/2022

The PartialSpoof Database and Countermeasures for the Detection of Short Generated Audio Segments Embedded in a Speech Utterance

Automatic speaker verification is susceptible to various manipulations a...
research
11/04/2018

Towards Unsupervised Speech-to-Text Translation

We present a framework for building speech-to-text translation (ST) syst...
research
08/31/2023

Supervised Contrastive Learning with Nearest Neighbor Search for Speech Emotion Recognition

Speech Emotion Recognition (SER) is a challenging task due to limited da...
research
06/22/2022

DP-Parse: Finding Word Boundaries from Raw Speech with an Instance Lexicon

Finding word boundaries in continuous speech is challenging as there is ...