Improved Frequency Modulation Features for Multichannel Distant Speech Recognition

11/23/2018
by Isidoros Rodomagoulakis, et al.

Frequency modulation features capture the fine structure of speech formants and are beneficial and supplementary to traditional energy-based cepstral features. Improvements have been demonstrated mainly in GMM-HMM systems for small- and large-vocabulary tasks. Yet they have seen limited application in DNN-HMM systems and Distant Speech Recognition (DSR) tasks. Herein, we elaborate on their integration within state-of-the-art front-end schemes that include post-processing of MFCCs, resulting in discriminant and speaker-adapted features of large temporal contexts. We explore 1) multichannel demodulation schemes for multi-microphone setups, 2) richer descriptors of frequency modulations, and 3) feature transformation and combination via hierarchical deep networks. We present results for tandem and hybrid recognition with GMM and DNN acoustic models, respectively. The improved modulation features are combined efficiently with MFCCs, yielding modest and consistent improvements in multichannel distant speech recognition tasks in reverberant and noisy environments, where recognition rates are far from human performance.
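
To make the notion of demodulating a speech resonance more concrete, the sketch below estimates the instantaneous frequency and amplitude of a narrowband signal. The abstract does not say which demodulation algorithm the authors use; the Teager-Kaiser energy operator with the DESA-2 energy separation algorithm is chosen here only as an illustrative, widely known option. The function names (`teager`, `desa2`, `weighted_mean_frequency`) and the amplitude-weighted mean frequency descriptor are assumptions made for this example and are not taken from the paper.

```python
import math

def teager(x):
    """Discrete Teager-Kaiser energy: psi[x](n) = x(n)^2 - x(n-1) * x(n+1)."""
    return [x[n] ** 2 - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]

def desa2(x, fs):
    """Illustrative DESA-2 demodulation of a narrowband signal.

    Returns per-sample estimates of instantaneous frequency (Hz) and
    amplitude envelope. Valid for frequencies below fs / 4.
    """
    # Symmetric difference y(n) = x(n+1) - x(n-1); y[k] corresponds to n = k + 1.
    y = [x[n + 1] - x[n - 1] for n in range(1, len(x) - 1)]
    psi_x = teager(x)  # psi_x[k] corresponds to n = k + 1
    psi_y = teager(y)  # psi_y[j] corresponds to n = j + 2
    freqs, amps = [], []
    for j, py in enumerate(psi_y):
        px = psi_x[j + 1]  # align both energies on the same sample n = j + 2
        if px <= 0.0 or py <= 0.0:
            continue  # skip samples where the energy estimates are unusable
        c = max(-1.0, min(1.0, 1.0 - py / (2.0 * px)))
        omega = 0.5 * math.acos(c)  # radians per sample
        freqs.append(omega * fs / (2.0 * math.pi))
        amps.append(2.0 * px / math.sqrt(py))
    return freqs, amps

def weighted_mean_frequency(freqs, amps):
    """Amplitude-squared weighted mean of the instantaneous frequency (assumed descriptor)."""
    weights = [a * a for a in amps]
    return sum(w * f for w, f in zip(weights, freqs)) / sum(weights)

if __name__ == "__main__":
    fs = 16000.0
    # Synthetic test tone standing in for one band-passed speech resonance.
    tone = [0.5 * math.cos(2 * math.pi * 1000.0 * n / fs) for n in range(400)]
    f_inst, a_inst = desa2(tone, fs)
    print(weighted_mean_frequency(f_inst, a_inst))
```

In a real front end, the input would be one band of a filterbank applied to a speech frame rather than a synthetic tone, and the resulting per-band statistics would be stacked into a feature vector alongside MFCCs.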

