Self-Supervised Contrastive Learning for Unsupervised Phoneme Segmentation

07/27/2020
by Felix Kreuk, et al.

We propose a self-supervised representation learning model for unsupervised phoneme boundary detection. The model is a convolutional neural network that operates directly on the raw waveform. It is optimized to identify spectral changes in the signal using the Noise-Contrastive Estimation principle. At test time, a peak detection algorithm is applied to the model outputs to produce the final boundaries. The proposed model is therefore trained in a fully unsupervised manner, with no manual annotations in the form of either target boundaries or phonetic transcriptions. We compare the proposed approach to several unsupervised baselines on both the TIMIT and Buckeye corpora. Results suggest that our approach surpasses the baseline models and reaches state-of-the-art performance on both data sets. Furthermore, we experimented with expanding the training set with additional examples from the Librispeech corpus. We evaluated the resulting model on distributions and languages that were not seen during the training phase (English, Hebrew and German) and showed that utilizing additional untranscribed data is beneficial for model performance.
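The test-time step described in the abstract (peak detection over the model outputs) can be illustrated with a minimal sketch. The abstract does not specify how the outputs are scored or which peak detector is used, so everything below is an assumption made for illustration: the per-frame score is taken to be one minus the cosine similarity of adjacent frame representations, peaks are simple local maxima above a fixed threshold, and the names `dissimilarity_scores`, `detect_peaks`, `boundaries_from_frames` and the `threshold` parameter are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def dissimilarity_scores(frames):
    """Assumed scoring: score[t] = 1 - cos(frames[t], frames[t + 1]).

    A high score means adjacent frame representations differ strongly,
    which this sketch treats as evidence of a boundary.
    """
    return [
        1.0 - cosine_similarity(frames[t], frames[t + 1])
        for t in range(len(frames) - 1)
    ]


def detect_peaks(scores, threshold=0.5):
    """Return indices of interior local maxima whose value exceeds `threshold`.

    This is a deliberately simple stand-in for a peak detection algorithm;
    the first and last positions are never reported as peaks.
    """
    peaks = []
    for t in range(1, len(scores) - 1):
        is_local_max = scores[t] > scores[t - 1] and scores[t] >= scores[t + 1]
        if is_local_max and scores[t] > threshold:
            peaks.append(t)
    return peaks


def boundaries_from_frames(frames, threshold=0.5):
    """Map peaks in the dissimilarity curve to frame indices.

    A peak at score index t sits between frames t and t + 1, so the
    boundary is reported as the index of the frame that starts the
    new segment (t + 1).
    """
    scores = dissimilarity_scores(frames)
    return [t + 1 for t in detect_peaks(scores, threshold)]


if __name__ == "__main__":
    # Toy representations: three frames of one "segment", then three of another.
    toy_frames = [[1, 0], [1, 0], [1, 0], [0, 1], [0, 1], [0, 1]]
    print(boundaries_from_frames(toy_frames))
```

In a real system the frame vectors would come from the trained encoder, and converting a frame index to a time in seconds would require the encoder's frame rate, which the abstract does not state.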


