Temporal Convolution Network Based Onset Detection and Query by Humming System Design

05/09/2023
by Yu Cheng Hung, et al.

Onsets are a key cue for segmenting audio into individual notes. In this paper, we ensemble multiple temporal convolution network (TCN) based models and use a spectrogram restricted to a limited frequency range to achieve more robust onset detection. Unlike the onset detection in existing query-by-humming (QBH) systems, which works only in clean conditions, our proposed combination of onset detection and speech enhancement can prevent noise from corrupting the onset detection function (ODF). Whereas a CNN model exploits the spatial features of the spectrogram, the TCN model exploits both its spatial and temporal features. To make QBH usable in noisy scenarios, we apply TCN-based speech enhancement as a preprocessor for the QBH system. Simulations show that the combination of TCN-based speech enhancement and onset detection enables the QBH system to work in both noisy and clean conditions with a short response time.
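
The abstract names the ingredients (a frequency-restricted spectrogram, an ensemble of TCN models, and an onset detection function) without showing how they connect. The Python sketch below is one possible illustration under stated assumptions, not the authors' implementation. The band limits, the plain-mean ensemble, the threshold and peak-picking rule, and the hypothetical `toy_model_odf` are all assumptions: `toy_model_odf` replaces a trained TCN with a fixed dilated causal difference filter, since the dilated causal convolution is the basic TCN operation. Speech enhancement is not modeled.

```python
import math
from typing import List, Sequence

Spectrogram = List[List[float]]  # frames x frequency bins (magnitudes)


def restrict_band(spec: Spectrogram, sr: int, n_fft: int,
                  f_lo: float, f_hi: float) -> Spectrogram:
    """Keep only bins whose centre frequency k * sr / n_fft lies in [f_lo, f_hi]."""
    bin_hz = sr / n_fft
    lo, hi = math.ceil(f_lo / bin_hz), math.floor(f_hi / bin_hz)
    return [frame[lo:hi + 1] for frame in spec]


def dilated_causal_conv(x: Sequence[float], w: Sequence[float],
                        dilation: int) -> List[float]:
    """Core TCN operation: y[t] = sum_k w[k] * x[t - k*dilation], zero-padded on the left."""
    return [sum(wk * x[t - k * dilation]
                for k, wk in enumerate(w) if t - k * dilation >= 0)
            for t in range(len(x))]


def toy_model_odf(band: Spectrogram, dilation: int) -> List[float]:
    """HYPOTHETICAL stand-in for one trained TCN: a fixed [1, -1] dilated causal
    filter on log band energy, rectified so only energy rises remain."""
    energy = [math.log1p(sum(frame)) for frame in band]
    return [max(0.0, v) for v in dilated_causal_conv(energy, [1.0, -1.0], dilation)]


def ensemble(odfs: Sequence[Sequence[float]]) -> List[float]:
    """Combine the per-frame outputs of several models with a plain mean (assumed rule)."""
    return [sum(vals) / len(vals) for vals in zip(*odfs)]


def pick_peaks(odf: Sequence[float], threshold: float) -> List[int]:
    """Return frames that are local maxima of the ODF and reach the threshold."""
    peaks = []
    for t, v in enumerate(odf):
        left = odf[t - 1] if t > 0 else float("-inf")
        right = odf[t + 1] if t + 1 < len(odf) else float("-inf")
        if v >= threshold and v > left and v >= right:
            peaks.append(t)
    return peaks


def detect_onsets(spec: Spectrogram, sr: int, n_fft: int, f_lo: float,
                  f_hi: float, dilations=(1, 2), threshold=0.5) -> List[int]:
    """Band-restrict, run one toy "model" per dilation, average, then peak-pick."""
    band = restrict_band(spec, sr, n_fft, f_lo, f_hi)
    odfs = [toy_model_odf(band, d) for d in dilations]
    return pick_peaks(ensemble(odfs), threshold)


# Synthetic example: 9 bins spaced 500 Hz apart (sr = 8000, n_fft = 16).
SIL = [0.0] * 9
NOTE = [0.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0]  # energy in 500 to 1500 Hz
NOISE = [0.0] * 8 + [5.0]                             # burst at 4000 Hz
SPEC = [SIL, SIL, SIL, NOTE, NOTE, NOTE, SIL, NOISE, NOTE, NOTE, NOTE, SIL]

print(detect_onsets(SPEC, 8000, 16, 500, 1500))  # restricted band -> [3, 8]
print(detect_onsets(SPEC, 8000, 16, 0, 4000))    # full band -> [3, 7]
```

In this synthetic case, restricting the spectrogram to 500 to 1500 Hz ignores the out-of-band noise burst and recovers both note onsets (frames 3 and 8), while the full-band version reports the burst at frame 7 in place of the second onset. A trained TCN would learn its filter weights rather than use the fixed `[1, -1]` kernel assumed here.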
