Unsupervised Acoustic Unit Discovery by Leveraging a Language-Independent Subword Discriminative Feature Representation

04/02/2021
by Siyuan Feng, et al.

This paper tackles acoustic unit discovery (AUD), i.e., automatically discovering phone-like acoustic units from unlabeled speech data. Past studies usually proposed single-step approaches. We propose a two-stage approach: the first stage learns a subword-discriminative feature representation, and the second stage clusters the learned representation to obtain phone-like clusters, which serve as the discovered acoustic units. In the first stage, a recently proposed unsupervised subword modeling method is improved by replacing a monolingual out-of-domain (OOD) ASR system with a multilingual one, which yields a more language-independent subword-discriminative representation. In the second stage, segment-level k-means is adopted, and two methods for representing variable-length speech segments as fixed-dimension feature vectors are compared. Experiments on a very low-resource Mboshi language corpus show that our approach outperforms state-of-the-art AUD methods in terms of both normalized mutual information (NMI) and F-score. The multilingual ASR improved upon the monolingual ASR both in providing OOD phone labels and in estimating phone boundaries. A comparison of our systems with and without knowledge of the ground-truth phone boundaries showed a 16% NMI performance gap, suggesting that the current approach can significantly benefit from improved phone boundary estimation.
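
The second stage described above (turning variable-length segments into fixed-dimension vectors, clustering them with segment-level k-means, and scoring the clusters against reference phones with NMI) can be illustrated with a minimal NumPy sketch. Everything here is an assumption for illustration, not the authors' implementation: the random "features" stand in for the stage-one subword-discriminative representation, segment boundaries are taken as known, `mean_pool` and `resample` are two hypothetical ways of producing fixed-dimension vectors (the abstract does not name the two methods compared), the farthest-first initialization is a swapped-in choice, and NMI is normalized by the arithmetic mean of the two entropies, which may differ from the normalization used in the paper's evaluation.

```python
import numpy as np

def mean_pool(frames):
    """Hypothetical method 1: average a (T, D) segment into one (D,) vector."""
    return frames.mean(axis=0)

def resample(frames, n=3):
    """Hypothetical method 2: pick n evenly spaced frames and concatenate them into (n*D,)."""
    idx = np.linspace(0, len(frames) - 1, n).round().astype(int)
    return frames[idx].reshape(-1)

def farthest_first_init(x, k):
    """Deterministic init (an assumption): start at x[0], then add the point farthest from all chosen centers."""
    centers = [x[0]]
    for _ in range(1, k):
        d = np.min([((x - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(x[d.argmax()])
    return np.array(centers)

def kmeans(x, k, n_iter=50):
    """Plain Lloyd's k-means over segment vectors; returns one cluster (unit) label per segment."""
    centers = farthest_first_init(x, k)
    for _ in range(n_iter):
        dists = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = dists.argmin(axis=1)
        # Keep the old center if a cluster becomes empty.
        new_centers = np.array([x[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                                for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

def nmi(ref, hyp):
    """NMI between reference phone labels and discovered unit labels (arithmetic-mean normalization)."""
    ref_ids = np.unique(ref, return_inverse=True)[1]
    hyp_ids = np.unique(hyp, return_inverse=True)[1]
    joint = np.zeros((ref_ids.max() + 1, hyp_ids.max() + 1))
    np.add.at(joint, (ref_ids, hyp_ids), 1)
    joint /= joint.sum()
    p_ref, p_hyp = joint.sum(axis=1), joint.sum(axis=0)
    nz = joint > 0
    mi = (joint[nz] * np.log(joint[nz] / np.outer(p_ref, p_hyp)[nz])).sum()
    h_ref = -(p_ref * np.log(p_ref)).sum()
    h_hyp = -(p_hyp * np.log(p_hyp)).sum()
    denom = 0.5 * (h_ref + h_hyp)
    return mi / denom if denom > 0 else 1.0

# Toy data (hypothetical): 60 segments of 3-11 frames drawn around 3 well-separated "phone" means.
rng = np.random.default_rng(0)
true_phones, segments = [], []
for i in range(60):
    phone = i % 3
    length = int(rng.integers(3, 12))
    segments.append(rng.normal(loc=10.0 * phone, scale=0.1, size=(length, 4)))
    true_phones.append(phone)

feats = np.stack([mean_pool(s) for s in segments])  # fixed-dimension segment vectors, shape (60, 4)
units = kmeans(feats, k=3)                           # discovered acoustic unit per segment
print(round(nmi(np.array(true_phones), units), 3))
```

On this deliberately easy toy data the printed NMI should be close to 1.0; on real speech the segment boundaries would have to be estimated, which is where the abstract locates the remaining performance gap. Swapping `mean_pool` for `resample` changes only how each segment is summarized, not the clustering step.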

research
07/29/2020

Exploiting Cross-Lingual Knowledge in Unsupervised Acoustic Modeling for Low-Resource Languages

(Short version of Abstract) This thesis describes an investigation on un...
research
10/13/2015

A language model based approach towards large scale and lightweight language identification systems

Multilingual spoken dialogue systems have gained prominence in the recen...
research
02/26/2020

Universal Phone Recognition with a Multilingual Allophone System

Multilingual models can improve language processing, particularly for lo...
research
11/13/2017

Phonemic and Graphemic Multilingual CTC Based Speech Recognition

Training automatic speech recognition (ASR) systems requires large amoun...
research
11/03/2020

Unsupervised Pattern Discovery from Thematic Speech Archives Based on Multilingual Bottleneck Features

The present study tackles the problem of automatically discovering spoke...
research
08/23/2019

VOP Detection for Read and Conversation Speech using CWT Coefficients and Phone Boundaries

In this paper, we propose a novel approach for accurate detection of the...
research
07/25/2020

Unsupervised Subword Modeling Using Autoregressive Pretraining and Cross-Lingual Phone-Aware Modeling

This study addresses unsupervised subword modeling, i.e., learning featu...
