Robust Phonetic Segmentation Using Spectral Transition Measure for Non-Standard Recording Environments

04/29/2020
by   Bhavik Vachhani, et al.

Phone-level localization of mis-articulation is a key requirement for an automatic articulation error assessment system. A robust phone segmentation technique is essential for real-time assessment of phone-level mis-articulations in speech recorded on mobile phones or tablets. This is a non-standard recording set-up with little control over recording quality. We propose a novel post-processing technique to aid Spectral Transition Measure (STM)-based phone segmentation under noisy conditions such as environmental noise and clipping, both common in mobile phone recordings. We compare the performance of our approach with phone segmentation using traditional MFCC and PLPCC speech features under Gaussian noise and clipping. The proposed approach was validated on the TIMIT corpus and a Hindi speech corpus, and was used to compute phone boundaries for a set of speech recorded simultaneously on three devices (a laptop, a stationary tablet, and a handheld mobile phone) to simulate different audio qualities in a real-time non-standard recording environment. F-ratio was the metric used to measure the accuracy of phone boundary marking. Experimental results show an improvement of 7 baseline approach. Similar results were seen for the set of three recordings collected in-house.
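To make the idea of STM-based segmentation concrete, the following is a minimal sketch rather than the authors' method. It assumes the commonly used definition of the spectral transition measure, in which each coefficient trajectory is fitted with a regression slope over a short window and the STM of a frame is the mean squared slope, and it treats local STM maxima as candidate phone boundaries. The function names, the window length, and the relative threshold are illustrative assumptions; the abstract specifies none of them, and the proposed post-processing step is not reproduced here.

```python
import numpy as np


def spectral_transition_measure(features, half_window=2):
    """Mean squared regression slope per frame (assumed STM definition).

    features: array of shape (num_frames, num_coeffs), for example one
    vector of cepstral coefficients per frame.
    """
    features = np.asarray(features, dtype=float)
    num_frames = features.shape[0]
    offsets = np.arange(-half_window, half_window + 1)
    # Repeat the edge frames so every frame has a full regression window.
    padded = np.pad(features, ((half_window, half_window), (0, 0)), mode="edge")
    slopes = np.zeros_like(features)
    for n in offsets:
        start = half_window + n
        slopes += n * padded[start:start + num_frames]
    slopes /= np.sum(offsets ** 2)
    return np.mean(slopes ** 2, axis=1)


def pick_boundaries(stm, relative_threshold=0.1):
    """Return frame indices of local STM maxima above a relative floor.

    The threshold is a hypothetical parameter chosen for illustration.
    """
    stm = np.asarray(stm, dtype=float)
    if stm.size < 3 or stm.max() <= 0:
        return []
    floor = relative_threshold * stm.max()
    return [
        m for m in range(1, len(stm) - 1)
        if stm[m] > stm[m - 1] and stm[m] >= stm[m + 1] and stm[m] >= floor
    ]


# Synthetic example: 13 coefficients that jump from 0 to 1 at frame 50.
frames = np.vstack([np.zeros((50, 13)), np.ones((50, 13))])
stm = spectral_transition_measure(frames)
boundaries = pick_boundaries(stm)
```

On this synthetic input the STM is zero everywhere except in the few frames around the step, so a single boundary is detected next to it. Real recordings with additive noise or clipping produce spurious peaks, which is the situation a post-processing stage of the kind the abstract proposes would need to handle.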


