DP-Parse: Finding Word Boundaries from Raw Speech with an Instance Lexicon

06/22/2022
by Robin Algayres, et al.

Finding word boundaries in continuous speech is challenging because speech has little or no equivalent of the 'space' delimiter between written words. Popular Bayesian non-parametric models for text segmentation use a Dirichlet process to jointly segment sentences and build a lexicon of word types. We introduce DP-Parse, which uses similar principles but relies only on an instance lexicon of word tokens, avoiding the clustering errors that arise with a lexicon of word types. On the Zero Resource Speech Benchmark 2017, our model sets a new state of the art for speech segmentation in 5 languages. Its performance improves monotonically with better input representations, reaching even higher scores when fed weakly supervised inputs. Despite lacking a type lexicon, DP-Parse can be pipelined with a language model, and the resulting system learns semantic and syntactic representations, as assessed by a new spoken word embedding benchmark.
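The abstract contrasts DP-Parse with Dirichlet-process text segmenters that keep a lexicon of word types. As a rough illustration of that baseline idea only (this is not DP-Parse, which operates on speech and stores word tokens), the Python sketch below scores a candidate word with the Chinese-restaurant-process predictive probability P(w) = (n_w + alpha * P0(w)) / (n + alpha) and finds the most probable segmentation of an unspaced string with dynamic programming. The base distribution base_prob, the parameters alpha and max_len, and the toy lexicon are hypothetical choices. The full models also update the lexicon while segmenting (for example by sampling), whereas this sketch decodes once against fixed counts.

```python
import math
from collections import Counter


def base_prob(word, n_chars=26, p_stop=0.5):
    """Hypothetical base distribution P0 over strings: each character is
    uniform over n_chars symbols, and word length is geometric with
    stopping probability p_stop."""
    length = len(word)
    return (1.0 / n_chars) ** length * (1 - p_stop) ** (length - 1) * p_stop


def dp_word_prob(word, counts, total, alpha, p0=base_prob):
    """Predictive probability of `word` under a Dirichlet process DP(alpha, P0)
    given current type counts (Chinese restaurant process form):
        P(w) = (n_w + alpha * P0(w)) / (n + alpha)
    """
    return (counts.get(word, 0) + alpha * p0(word)) / (total + alpha)


def best_segmentation(text, counts, alpha=1.0, max_len=10):
    """Viterbi search for the most probable split of `text` into words,
    scoring each word independently against a fixed type lexicon `counts`."""
    total = sum(counts.values())
    n = len(text)
    best = [0.0] + [-math.inf] * n  # best[i]: best log-prob of text[:i]
    back = [0] * (n + 1)            # back[i]: start index of the last word
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            word = text[start:end]
            score = best[start] + math.log(dp_word_prob(word, counts, total, alpha))
            if score > best[end]:
                best[end], back[end] = score, start
    # Follow back-pointers from the end of the string to recover the words.
    words, end = [], n
    while end > 0:
        start = back[end]
        words.append(text[start:end])
        end = start
    return words[::-1]


lexicon = Counter({"the": 5, "dog": 3, "cat": 2})
print(best_segmentation("thedogthecat", lexicon))  # ['the', 'dog', 'the', 'cat']
```

Because every unseen word pays the cost of being generated from P0 character by character, frequent types such as "the" and "dog" are strongly preferred; raising alpha moves probability mass toward the base distribution and makes new words cheaper.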
