Harmonic-aligned Frame Mask Based on Non-stationary Gabor Transform with Application to Content-dependent Speaker Comparison

04/23/2019
by Feng Huang, et al.

We propose a harmonic-aligned frame mask for speech signals based on the non-stationary Gabor transform (NSGT). A frame mask operates on the transform coefficients of a signal and thereby converts the signal into a counterpart signal; the mask therefore characterizes the difference between the two signals. In preceding studies, frame masks based on the regular Gabor transform were applied to the analysis of single-note instrumental sounds. This study extends the frame mask approach to speech signals. In voiced speech, the fundamental frequency usually changes continuously over time. We employ an NSGT whose frequency resolution is pitch-dependent, and therefore time-varying, to attain harmonic alignment in the transform domain and hence obtain harmonic-aligned frame masks for speech signals. We propose to apply the harmonic-aligned frame mask to content-dependent speaker comparison. Frame masks computed from voiced signals of the same vowel uttered by different speakers were used as similarity measures to compare and distinguish speaker identities (SID). Results obtained with deep neural networks demonstrate that the proposed frame mask is valid for representing speaker characteristics and shows potential for SID applications in limited-data scenarios.
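
To make the notion of a frame mask concrete, the following is a minimal sketch and not the authors' method. It assumes a plain fixed-resolution STFT (a regular Gabor transform) instead of the pitch-dependent NSGT, so no harmonic alignment takes place, and it estimates the mask as a regularized elementwise ratio of coefficients, which is only one possible choice. The names `stft`, `frame_mask`, and `harmonic_tone`, and all parameter values, are hypothetical.

```python
import numpy as np

def stft(x, win_len=256, hop=64):
    """Fixed-resolution Gabor (STFT) coefficients with a Hann window.
    Assumption: stands in for the NSGT, which would vary resolution with pitch."""
    window = np.hanning(win_len)
    n_frames = 1 + (len(x) - win_len) // hop
    frames = np.stack([x[i * hop : i * hop + win_len] * window
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)  # shape: (n_frames, win_len // 2 + 1)

def frame_mask(c_source, c_target, reg=1e-8):
    """Regularized elementwise mask m such that m * c_source ~= c_target.
    Assumption: a simple ratio estimate, chosen for illustration only."""
    return np.conj(c_source) * c_target / (np.abs(c_source) ** 2 + reg)

def harmonic_tone(f0, amps, t):
    """Sum of harmonics of f0 with the given amplitudes (a toy 'voiced' signal)."""
    x = np.zeros_like(t)
    for k, a in enumerate(amps):
        x += a * np.sin(2 * np.pi * f0 * (k + 1) * t)
    return x

fs = 8000                      # sampling rate in Hz (illustrative)
t = np.arange(fs) / fs         # one second of samples

# Two toy signals with the same pitch but different harmonic envelopes,
# loosely mimicking the same vowel produced by two different speakers.
source = harmonic_tone(120.0, [1.0, 0.5, 0.25], t)
target = harmonic_tone(120.0, [0.6, 0.8, 0.10], t)

c_s = stft(source)
c_t = stft(target)
mask = frame_mask(c_s, c_t)

# Applying the mask to the source coefficients should approximate the target's.
rel_err = np.linalg.norm(mask * c_s - c_t) / np.linalg.norm(c_t)
print("mask shape:", mask.shape)
print("relative coefficient error:", rel_err)
```

In this toy setting the mask magnitude near each harmonic reflects the amplitude ratio between the two signals, which is the sense in which a mask can serve as a difference or similarity measure. With real speech the pitch moves over time, which is where a pitch-dependent transform such as the NSGT would come in.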

research
08/20/2020

Blind Mask to Improve Intelligibility of Non-Stationary Noisy Speech

This letter proposes a novel blind acoustic mask (BAM) designed to adapt...
research
12/15/2021

Speech frame implementation for speech analysis and recognition

Distinctive features of the created speech frame are: the ability to tak...
research
04/17/2019

Deep Filtering: Signal Extraction Using Complex Time-Frequency Filters

Signal extraction from a single-channel mixture with additional undesire...
research
08/30/2022

HPPNet: Modeling the Harmonic Structure and Pitch Invariance in Piano Transcription

While neural network models are making significant progress in piano tra...
research
04/27/2018

Deep Speech Denoising with Vector Space Projections

We propose an algorithm to denoise speakers from a single microphone in ...
research
03/26/2023

Time-domain Speech Enhancement Assisted by Multi-resolution Frequency Encoder and Decoder

Time-domain speech enhancement (SE) has recently been intensively invest...
research
08/07/2020

Applying Speech Tempo-Derived Features, BoAW and Fisher Vectors to Detect Elderly Emotion and Speech in Surgical Masks

The 2020 INTERSPEECH Computational Paralinguistics Challenge (ComParE) c...
