Improved ASR for Under-Resourced Languages Through Multi-Task Learning with Acoustic Landmarks

05/15/2018
by Di He, et al.

Furui first demonstrated that the identities of both the consonant and the vowel can be perceived from the consonant-vowel (C-V) transition; later, Stevens proposed that acoustic landmarks are the primary cues for speech perception, and that steady-state regions are secondary or supplemental. Acoustic landmarks are perceptually salient, even in a language one does not speak, and it has been demonstrated that non-speakers of a language can identify features such as the primary articulator of a landmark. These factors suggest a strategy for developing language-independent automatic speech recognition: landmarks can potentially be learned once from a suitably labeled corpus and rapidly applied to many other languages. This paper proposes enhancing the cross-lingual portability of a deep neural network (DNN) by using landmarks as the secondary task in multi-task learning (MTL). The network is trained on a well-resourced source language (English) with both phone and landmark labels, then adapted to an under-resourced target language (Iban) with only word labels. Landmark-tasked MTL reduces the source-language phone error rate by 2.9% and the target-language word error rate by 1.9% or more, depending on the amount of target-language training data. These results suggest that landmark-tasked MTL causes the DNN to learn hidden-node features that are useful for cross-lingual adaptation.
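To make the multi-task setup concrete, here is a minimal sketch in plain Python (standard library only) of a shared hidden layer feeding a phone head and a landmark head, with a combined loss and a head-swap step for the target language. It is an illustration under stated assumptions, not the paper's implementation: the single hidden layer, the layer sizes, the auxiliary weight `aux_weight=0.3`, and the helper names (`MultiTaskNet`, `mtl_loss`, `adapt_to_target`) are hypothetical, backpropagation is omitted, and replacing the output layer is one common transfer recipe rather than necessarily the adaptation procedure used for Iban.

```python
import copy
import math
import random

# Hypothetical sketch: sizes, names, and aux_weight are illustrative assumptions.

def softmax(z):
    # Numerically stable softmax over a list of logits.
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def matvec(W, x):
    # Dense layer without bias: one dot product per output unit.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def relu(v):
    return [max(0.0, a) for a in v]

def cross_entropy(probs, target):
    # Negative log-likelihood of the reference class index.
    return -math.log(probs[target])

def random_matrix(rng, rows, cols):
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

class MultiTaskNet:
    """Shared hidden layer feeding several task-specific softmax heads."""

    def __init__(self, n_in, n_hidden, head_sizes, seed=0):
        rng = random.Random(seed)
        self.shared = random_matrix(rng, n_hidden, n_in)
        self.heads = {name: random_matrix(rng, size, n_hidden)
                      for name, size in head_sizes.items()}

    def hidden(self, x):
        return relu(matvec(self.shared, x))

    def predict(self, x, task):
        return softmax(matvec(self.heads[task], self.hidden(x)))

def mtl_loss(net, x, phone_label, landmark_label, aux_weight=0.3):
    # Primary phone loss plus a down-weighted landmark loss. Both heads read
    # the same hidden features, so both tasks would shape the shared layer
    # during training.
    h = net.hidden(x)
    loss_phone = cross_entropy(softmax(matvec(net.heads["phone"], h)), phone_label)
    loss_landmark = cross_entropy(softmax(matvec(net.heads["landmark"], h)), landmark_label)
    return loss_phone + aux_weight * loss_landmark

def adapt_to_target(net, n_target_units, seed=1):
    # Keep a copy of the source-trained shared layer, drop the source-language
    # heads, and attach a freshly initialized head for the target language.
    rng = random.Random(seed)
    target = copy.deepcopy(net)
    n_hidden = len(net.shared)
    target.heads = {"target": random_matrix(rng, n_target_units, n_hidden)}
    return target
```

A possible usage, again with hypothetical sizes (13 input features per frame, 40 phone classes, 5 landmark classes, 30 target-language units):

```python
net = MultiTaskNet(n_in=13, n_hidden=32, head_sizes={"phone": 40, "landmark": 5})
frame = [0.1] * 13
print(mtl_loss(net, frame, phone_label=3, landmark_label=1))
iban_net = adapt_to_target(net, n_target_units=30)
print(len(iban_net.predict(frame, "target")))  # 30
```

The design choice being illustrated is that the landmark head only adds a term to the objective; at adaptation time it is discarded, and what carries over to the target language is the shared hidden representation.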

Related research
10/01/2021

Self-supervised Secondary Landmark Detection via 3D Representation Learning

Recent technological developments have spurred great advances in the com...
12/13/2016

Performance Improvements of Probabilistic Transcript-adapted ASR with Recurrent Neural Network and Language-specific Constraints

Mismatched transcriptions have been proposed as a means to acquire probab...
11/10/2016

Landmark-based consonant voicing detection on multilingual corpora

This paper tests the hypothesis that distinctive feature classifiers anc...
10/27/2017

Acoustic Landmarks Contain More Information About the Phone String than Other Frames

Most mainstream Automatic Speech Recognition (ASR) systems consider all ...
11/05/2018

When CTC Training Meets Acoustic Landmarks

Connectionist temporal classification (CTC) training criterion provides ...
06/15/2022

Exploiting Cross-domain And Cross-Lingual Ultrasound Tongue Imaging Features For Elderly And Dysarthric Speech Recognition

Articulatory features are inherently invariant to acoustic signal distor...