Self-Expressing Autoencoders for Unsupervised Spoken Term Discovery

07/26/2020
by   Saurabhchand Bhati, et al.
0

Unsupervised spoken term discovery consists of two tasks: finding the acoustic segment boundaries and labeling acoustically similar segments with the same labels. We perform segmentation based on the assumption that the frame feature vectors are more similar within a segment than across the segments. Therefore, for strong segmentation performance, it is crucial that the features represent the phonetic properties of a frame more than other factors of variability. We achieve this via a self-expressing autoencoder framework. It consists of a single encoder and two decoders with shared weights. The encoder projects the input features into a latent representation. One of the decoders tries to reconstruct the input from these latent representations and the other from the self-expressed version of them. We use the obtained features to segment and cluster the speech data. We evaluate the performance of the proposed method in the Zero Resource 2020 challenge unit discovery task. The proposed system consistently outperforms the baseline, demonstrating the usefulness of the method in learning representations.

READ FULL TEXT
research
11/01/2018

Truly unsupervised acoustic word embeddings using weak top-down constraints in encoder-decoder models

We investigate unsupervised models that can map a variable-duration spee...
research
10/27/2022

Self-supervised language learning from raw audio: Lessons from the Zero Resource Speech Challenge

Recent progress in self-supervised or unsupervised machine learning has ...
research
12/14/2020

A comparison of self-supervised speech representations as input features for unsupervised acoustic word embeddings

Many speech processing tasks involve measuring the acoustic similarity b...
research
11/28/2020

Unsupervised Spoken Term Discovery Based on Re-clustering of Hypothesized Speech Segments with Siamese and Triplet Networks

Spoken term discovery from untranscribed speech audio could be achieved ...
research
08/07/2018

Segmental Audio Word2Vec: Representing Utterances as Sequences of Vectors with Applications in Spoken Term Detection

While Word2Vec represents words (in text) as vectors carrying semantic i...
research
08/03/2020

Unsupervised Discovery of Recurring Speech Patterns Using Probabilistic Adaptive Metrics

Unsupervised spoken term discovery (UTD) aims at finding recurring segme...
research
08/21/2023

Implicit Self-supervised Language Representation for Spoken Language Diarization

In a code-switched (CS) scenario, the use of spoken language diarization...

Please sign up or login with your details

Forgot password? Click here to reset