An Optimized Signal Processing Pipeline for Syllable Detection and Speech Rate Estimation

03/07/2021
by Kamini Sabu, et al.

Syllable detection is an important speech analysis task with applications in speech rate estimation, word segmentation, and automatic prosody detection. Drawing on the well-understood acoustic correlates of speech articulation, it is typically implemented as local peak picking on a frequency-weighted energy contour that represents vowel sonority. While several of the analysis parameters are set from known speech signal properties, the frequency-weighting coefficients and the peak-picking threshold are usually chosen heuristically, which raises the possibility of data-based optimization. In this work, we optimize these parameters by directly minimizing naturally arising task-specific objective functions. The resulting non-convex cost function is minimized with a population-based search algorithm, achieving performance that exceeds previously published results on the same corpus while using a relatively small amount of labeled data. Further, optimizing the system parameters on a different corpus is shown to produce an explainable change in the optimal values.
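The pipeline the abstract describes (a frequency-weighted energy contour followed by thresholded local peak picking) can be illustrated with a minimal sketch. This is not the authors' implementation: the equal-width band split, the 20 ms frame and 10 ms hop, the relative threshold, and all function names (`weighted_energy_contour`, `pick_peaks`, `speech_rate`, `toy_signal`) are assumptions made for illustration only.

```python
import numpy as np


def weighted_energy_contour(x, sr, band_weights, frame_s=0.02, hop_s=0.01):
    """Per-frame energy, summed over equal-width bands with one weight per band.

    Assumption: bands are equal-width slices of the FFT power spectrum;
    the paper's actual filterbank may differ.
    """
    frame = int(round(frame_s * sr))
    hop = int(round(hop_s * sr))
    n_frames = 1 + (len(x) - frame) // hop
    window = np.hanning(frame)
    contour = np.empty(n_frames)
    for i in range(n_frames):
        seg = x[i * hop:i * hop + frame] * window
        power = np.abs(np.fft.rfft(seg)) ** 2
        bands = np.array_split(power, len(band_weights))
        contour[i] = sum(w * b.sum() for w, b in zip(band_weights, bands))
    return contour


def pick_peaks(contour, rel_threshold):
    """Indices of local maxima at or above rel_threshold * max(contour)."""
    floor = rel_threshold * contour.max()
    mid = contour[1:-1]
    is_peak = (mid > contour[:-2]) & (mid >= contour[2:]) & (mid >= floor)
    return np.flatnonzero(is_peak) + 1


def speech_rate(x, sr, band_weights, rel_threshold):
    """Detected peaks (taken as syllable nuclei) per second of signal."""
    contour = weighted_energy_contour(x, sr, band_weights)
    return len(pick_peaks(contour, rel_threshold)) / (len(x) / sr)


def toy_signal(sr=8000, duration=2.0, centers=(0.25, 0.75, 1.25, 1.75)):
    """Hypothetical test input: a 500 Hz tone with one Gaussian burst per 'syllable'."""
    t = np.arange(int(sr * duration)) / sr
    envelope = sum(np.exp(-0.5 * ((t - c) / 0.05) ** 2) for c in centers)
    return envelope * np.sin(2 * np.pi * 500 * t), sr


x, sr = toy_signal()
weights = [1.0, 1.0, 0.5, 0.2]  # illustrative values, not taken from the paper
rate = speech_rate(x, sr, weights, rel_threshold=0.1)
```

In this sketch, `band_weights` and `rel_threshold` play the role of the frequency-weighting coefficients and peak-picking threshold that the abstract says are tuned by a population-based search; here they are simply fixed by hand.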


