Speech Representation Learning Combining Conformer CPC with Deep Cluster for the ZeroSpeech Challenge 2021

07/13/2021
by Takashi Maekaku, et al.

We present a system for the Zero Resource Speech Challenge 2021 that combines Contrastive Predictive Coding (CPC) with deep clustering. In the deep clustering step, we first obtain pseudo-labels by clustering the outputs of a CPC network with k-means. We then train an additional autoregressive model to classify these pseudo-labels in a supervised manner. A phoneme-discriminative representation is obtained by running a second round of clustering on the outputs of the final layer of the autoregressive model. We also show that replacing a Transformer layer with a Conformer layer yields a further gain on a lexical metric. Experimental results show a relative improvement of 35% on a syntactic metric compared to the CPC-small baseline trained on the LibriSpeech 460h data. We achieve the top result in this challenge on the syntactic metric.

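The two-round deep clustering procedure described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' released code: the cluster count, feature dimension, and the single-layer LSTM classifier are placeholders standing in for the frozen CPC features and the Transformer/Conformer-based autoregressive model used in the paper.

```python
# Minimal sketch of the two-round deep-cluster pipeline (illustrative only).
# N_CLUSTERS, FEAT_DIM, and the LSTM classifier are assumptions, not the paper's setup.
import torch
import torch.nn as nn
from sklearn.cluster import KMeans

N_CLUSTERS = 50   # number of pseudo-label classes (assumption)
FEAT_DIM = 256    # dimensionality of the CPC frame features (assumption)
HIDDEN_DIM = 512


def pseudo_labels(features, n_clusters=N_CLUSTERS):
    """Round 1: cluster frozen CPC frame features (N x T x D) with k-means."""
    flat = features.reshape(-1, features.shape[-1])
    km = KMeans(n_clusters=n_clusters, n_init=10).fit(flat)
    return km.labels_.reshape(features.shape[:2]), km


class ARClassifier(nn.Module):
    """Autoregressive model trained to predict the frame-level pseudo-labels."""

    def __init__(self, feat_dim=FEAT_DIM, hidden=HIDDEN_DIM, n_classes=N_CLUSTERS):
        super().__init__()
        self.rnn = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, x):
        h, _ = self.rnn(x)          # h: (N, T, hidden) -- the representation we keep
        return self.head(h), h


def train_and_recluster(features, labels, epochs=5):
    """Supervised training on pseudo-labels, then round-2 k-means on hidden states."""
    model = ARClassifier()
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    ce = nn.CrossEntropyLoss()
    x = torch.as_tensor(features, dtype=torch.float32)
    y = torch.as_tensor(labels, dtype=torch.long)
    for _ in range(epochs):
        opt.zero_grad()
        logits, _ = model(x)
        loss = ce(logits.reshape(-1, N_CLUSTERS), y.reshape(-1))
        loss.backward()
        opt.step()
    with torch.no_grad():
        _, hidden = model(x)
    # Round 2: clustering the final-layer outputs yields the phoneme-like units.
    labels2, km2 = pseudo_labels(hidden.numpy(), N_CLUSTERS)
    return model, labels2, km2
```

In the paper, the autoregressive classifier is built from Transformer layers (with a Conformer layer swapped in for the lexical-metric gain) rather than the single LSTM used here; the sketch only shows the pseudo-label / re-cluster loop.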