Landmark-based consonant voicing detection on multilingual corpora

11/10/2016
by   Xiang Kong, et al.
0

This paper tests the hypothesis that distinctive feature classifiers anchored at phonetic landmarks can be transferred cross-lingually without loss of accuracy. Three consonant voicing classifiers were developed: (1) manually selected acoustic features anchored at a phonetic landmark, (2) MFCCs (either averaged across the segment or anchored at the landmark), and(3) acoustic features computed using a convolutional neural network (CNN). All detectors are trained on English data (TIMIT),and tested on English, Turkish, and Spanish (performance measured using F1 and accuracy). Experiments demonstrate that manual features outperform all MFCC classifiers, while CNNfeatures outperform both. MFCC-based classifiers suffer an F1reduction of 16 generalized from English to other languages. Manual features suffer only a 5 F1 reduction,and CNN features actually perform better in Turkish and Span-ish than in the training language, demonstrating that features capable of representing long-term spectral dynamics (CNN and landmark-based features) are able to generalize cross-lingually with little or no loss of accuracy

READ FULL TEXT

page 1

page 2

page 3

page 4

research
11/12/2015

Facial Landmark Detection with Tweaked Convolutional Neural Networks

We present a novel convolutional neural network (CNN) design for facial ...
research
05/15/2018

Improved ASR for Under-Resourced Languages Through Multi-Task Learning with Acoustic Landmarks

Furui first demonstrated that the identity of both consonant and vowel c...
research
05/08/2020

A Detailed Look At CNN-based Approaches In Facial Landmark Detection

Facial landmark detection has been studied over decades. Numerous neural...
research
08/14/2020

Landmark detection in Cardiac Magnetic Resonance Imaging Using A Convolutional Neural Network

Purpose: To develop a convolutional neural network (CNN) solution for ro...
research
08/18/2016

Multilingual Modal Sense Classification using a Convolutional Neural Network

Modal sense classification (MSC) is a special WSD task that depends on t...
research
12/09/2021

Modelling Lips-State Detection Using CNN for Non-Verbal Communications

Vision-based deep learning models can be promising for speech-and-hearin...
research
07/10/2020

Miss the Point: Targeted Adversarial Attack on Multiple Landmark Detection

Recent methods in multiple landmark detection based on deep convolutiona...

Please sign up or login with your details

Forgot password? Click here to reset