Combination of Time-domain, Frequency-domain, and Cepstral-domain Acoustic Features for Speech Commands Classification

03/30/2022
by Yikang Wang, et al.

Speech-related classification tasks commonly use frequency-domain acoustic features such as logarithmic Mel-filter bank coefficients (FBANK) and cepstral-domain acoustic features such as Mel-frequency cepstral coefficients (MFCC). However, time-domain features are more effective in some sound classification tasks that contain non-vocal or weakly speech-related sounds. We previously proposed bit sequence representation (BSR), a time-domain binary acoustic feature derived from the raw waveform. Compared with MFCC, BSR performed better in environmental sound detection and achieved comparable accuracy in limited-vocabulary speech recognition tasks. In this paper, we propose an improved BSR feature, BSR-float16, which represents floating-point values more precisely. Experiments on Google's Speech Commands dataset demonstrate that time-domain, frequency-domain, and cepstral-domain features are complementary. We therefore use a simple back-end score fusion method to improve the final classification accuracy. The fused results also show better noise robustness.
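The abstract does not specify how BSR-float16 encodes samples or how the scores are fused, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes that a time-domain binary feature can be obtained by expanding each waveform sample into the 16 bits of its IEEE 754 half-precision encoding, and that back-end fusion is a weighted average of per-class scores from separately trained systems. The function names `float16_bits` and `fuse_scores`, the bit ordering, and the equal default weights are all hypothetical choices made for illustration.

```python
import numpy as np


def float16_bits(waveform):
    """Expand each sample into the 16 bits of its float16 encoding.

    Assumption: this is one plausible reading of a "float16" binary
    time-domain feature; the paper's exact encoding may differ.
    Returns an array of shape (num_samples, 16), most significant bit first
    (sign bit, 5 exponent bits, 10 fraction bits).
    """
    half = np.asarray(waveform, dtype=np.float16)
    codes = half.view(np.uint16)                      # reinterpret the raw bits
    shifts = np.arange(15, -1, -1, dtype=np.uint16)   # bit 15 down to bit 0
    bits = (codes[:, None] >> shifts) & 1
    return bits.astype(np.uint8)


def fuse_scores(score_list, weights=None):
    """Weighted average of per-class scores from several systems.

    score_list: sequence of arrays, each of shape (num_utterances, num_classes),
        e.g. posteriors from hypothetical BSR, FBANK, and MFCC classifiers.
    weights: optional per-system weights; equal weights are assumed if omitted.
    Returns (fused_scores, predicted_labels).
    """
    scores = np.stack([np.asarray(s, dtype=np.float64) for s in score_list])
    if weights is None:
        weights = np.full(len(score_list), 1.0 / len(score_list))
    weights = np.asarray(weights, dtype=np.float64)
    fused = np.tensordot(weights, scores, axes=1)     # sum over the system axis
    return fused, fused.argmax(axis=-1)


# Usage with made-up numbers (not results from the paper).
bits = float16_bits([1.0, -2.0])
bsr_scores = [[0.6, 0.4]]
fbank_scores = [[0.2, 0.8]]
mfcc_scores = [[0.3, 0.7]]
fused, labels = fuse_scores([bsr_scores, fbank_scores, mfcc_scores])
```

Averaging scores at the back end keeps the three front ends independent, so each system can be trained and tuned separately. This is the usual motivation for score-level fusion when features are expected to be complementary.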


